Re: pandas DF Dstream to Spark DF

Bryan Cutler Mon, 10 Apr 2017 10:14:22 -0700

Hi Yogesh,

It would be easier to help if you included your code and the exact error
messages that occur.  If you are creating a Spark DataFrame from a Pandas
DataFrame, Spark does not read a schema from the Pandas DataFrame; it infers
one from the data.  This might be the cause of your issue if the schema is
not inferred correctly.  You can try specifying the schema manually, for
example:


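import pandas
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, DoubleType)

# Explicit schema: one StructField(name, type, nullable) per column
schema = StructType([
    StructField("str_t", StringType(), True),
    StructField("int_t", IntegerType(), True),
    StructField("double_t", DoubleType(), True)])

pandas_df = pandas.DataFrame(data={...})
# spark is an existing SparkSession
spark_df = spark.createDataFrame(pandas_df, schema=schema)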

This step might be eliminated by using Apache Arrow, see SPARK-13534 for
related work.
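
As for why the types might not come out as expected in the first place, it
can help to look at the Pandas side before handing the data to Spark.  The
sketch below is only an illustration: the sample values and the helper
suspicious_columns are made up for this example and are not part of Pandas
or Spark.  It relies on the fact that Pandas stores an integer column
containing a missing value as float64, which may no longer line up with an
IntegerType field.

```python
import pandas

# Hypothetical sample data; the names mirror the schema example above,
# the values are invented for illustration.
pandas_df = pandas.DataFrame(data={
    "str_t": ["a", None, "c"],     # strings with a missing value
    "int_t": [1, None, 3],         # integers with a missing value
    "double_t": [1.0, 2.5, 3.0],   # plain floats, no missing values
})


def suspicious_columns(df):
    """Return the names of columns that have an object dtype or contain nulls.

    These are the columns worth checking by hand against the intended
    Spark schema.
    """
    flagged = []
    for name in df.columns:
        col = df[name]
        if col.dtype == object or col.isnull().any():
            flagged.append(name)
    return flagged


# Show what Pandas actually stored for each column.
print(pandas_df.dtypes)
# Show which columns deserve a closer look.
print(suspicious_columns(pandas_df))
```

With this sample data the check flags str_t and int_t, and int_t shows up
as float64 rather than an integer type.  Filling or dropping the missing
values, or casting the column before calling createDataFrame, are possible
ways to deal with that, depending on what the data should mean.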

On Sun, Apr 9, 2017 at 10:19 PM, Yogesh Vyas <informy...@gmail.com> wrote:

> Hi,
>
> I am writing a PySpark streaming job in which I am returning a pandas
> DataFrame as a DStream. Now I want to save this DStream DataFrame to a
> Parquet file. How can I do that?
>
> I am trying to convert it to a Spark DataFrame, but I am getting multiple
> errors. Please suggest how to do that.
>
> Regards,
> Yogesh
>
