Re: How to specify column type when saving DataFrame as parquet file?

2015-08-14 Thread Raghavendra Pandey
I think you can try the DataFrame creation API that takes an RDD[Row] and a StructType...
On Aug 11, 2015 4:28 PM, Jyun-Fan Tsai jft...@appier.com wrote: Hi all, I'm using Spark 1.4.1. I create a DataFrame from a JSON file. There is a column C whose values are all null in the JSON file. I found that

Re: How to specify column type when saving DataFrame as parquet file?

2015-08-14 Thread Francis Lau
Jyun Fan,

Here is how I have been doing it. I found that I needed to define the schema when loading the JSON file first.

Francis

import datetime
from pyspark.sql.types import *

# Define schema
upSchema = StructType([
    StructField("field 1", StringType(), True),
    StructField("field 2", LongType(),

How to specify column type when saving DataFrame as parquet file?

2015-08-11 Thread Jyun-Fan Tsai
Hi all, I'm using Spark 1.4.1. I create a DataFrame from a JSON file. There is a column C whose values are all null in the JSON file. I found that the data type of column C in the created DataFrame is string. However, I would like to specify the column as Long when saving it as a Parquet file. What
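One possible way to get a Long column at write time is to cast it just before saving. The sketch below is not taken from the thread: `cast_exprs` and `save_with_long_column` are hypothetical helper names, and it assumes `df` is a DataFrame whose `selectExpr` method accepts `CAST(... AS BIGINT)` expressions. Column names that themselves contain backticks are not handled.

```python
def cast_exprs(columns, target, sql_type="BIGINT"):
    """Build one SQL expression per column, casting only `target`."""
    exprs = []
    for name in columns:
        quoted = "`{}`".format(name)
        if name == target:
            # Cast the column and keep its original name.
            exprs.append("CAST({0} AS {1}) AS {0}".format(quoted, sql_type))
        else:
            exprs.append(quoted)
    return exprs


def save_with_long_column(df, target, parquet_path):
    """Cast `target` to a 64-bit integer column, then write Parquet."""
    df.selectExpr(*cast_exprs(df.columns, target)).write.parquet(parquet_path)
```

For the DataFrame in the question, the call would be `save_with_long_column(df, "C", path)`, where `path` is whatever output location is being used.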