Schema - DataTypes.NullType

2018-01-29 Thread Jean Georges Perrin
Hi Sparkians, can someone tell me what the purpose of DataTypes.NullType is, especially when building a schema? Thanks, jg - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
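A hedged illustration, not an authoritative answer: NullType is the data type Spark assigns to values that carry no type information, such as `lit(null)`, or a field that is null in every row seen during schema inference. When PySpark's inference merges NullType with a concrete type, it keeps the concrete one. The plain-Python model below imitates only that merging rule. It is not Spark code, and the string type names and the functions `infer_type`, `merge_types` and `infer_schema` are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical, simplified model of per-column type inference (not PySpark's code).
def infer_type(value: Any) -> str:
    if value is None:
        return "null"  # plays the role of NullType: no type information
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def merge_types(a: str, b: str) -> str:
    # "null" is the identity for merging: it yields to any concrete type.
    if a == "null":
        return b
    if b == "null":
        return a
    if a != b:
        raise TypeError(f"cannot merge {a} and {b}")
    return a

def infer_schema(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    schema: Dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            schema[name] = merge_types(schema.get(name, "null"), infer_type(value))
    return schema
```

For example, `infer_schema([{"id": 1, "note": None}, {"id": 2, "note": "hi"}])` settles on `{"id": "long", "note": "string"}`, while a column that is only ever `None` stays `"null"`, which is the situation where a real NullType column appears.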

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Jean Georges Perrin
You can try to create new columns with the nested value.

> On Jan 29, 2018, at 15:26, Arnav kumar wrote:
>
> Hello Experts,
>
> I would need your advice in resolving the below issue when I am trying to
> retrieve the data from a dataframe.
>
> Can you please let me
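One reading of "create new columns with the nested value", assuming the nested value lives in a struct column: in Spark that is typically something like `df.withColumn("city", col("address.city"))`, where `address` and `city` are hypothetical names. The plain-Python sketch below models the same idea on rows held as dicts (it is not Spark code, and `with_nested_column` is a hypothetical helper): follow a dotted path into each row and copy the value into a new top-level field, producing `None` where the path is missing, much as Spark yields null for a null struct.

```python
from typing import Any, Dict, List

def with_nested_column(rows: List[Dict[str, Any]], new_name: str, path: str) -> List[Dict[str, Any]]:
    """Return copies of rows with the value at a dotted path copied into new_name."""
    result = []
    for row in rows:
        value: Any = row
        for key in path.split("."):
            # Step one level deeper; a missing or non-struct level yields None.
            value = value.get(key) if isinstance(value, dict) else None
        new_row = dict(row)  # leave the input rows untouched
        new_row[new_name] = value
        result.append(new_row)
    return result
```

Once the nested value is a plain top-level column, it can be selected, filtered, or cast like any other column.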

Re: Type Casting Error in Spark Data Frame

2018-01-29 Thread Patrick McCarthy
You can't select from an array like that. Instead, try using 'lateral view explode' in the query for that element, or, before the SQL stage, (py)spark.sql.functions.explode.

On Mon, Jan 29, 2018 at 4:26 PM, Arnav kumar wrote:
> Hello Experts,
>
> I would need your advice in
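To make the suggestion concrete: `explode` turns one row holding an array into one row per array element. In PySpark that looks like `df.select("id", explode("tags").alias("tag"))`, and in SQL like `SELECT id, tag FROM t LATERAL VIEW explode(tags) x AS tag`, where `t`, `id`, `tags` and `tag` are hypothetical names. The plain-Python model below (not Spark code) shows the row semantics, including the fact that plain `explode` drops rows whose array is null or empty, while the outer variants (`explode_outer`, `LATERAL VIEW OUTER`) keep them with a null element. As a simplification, the model writes each element back under the array column's name.

```python
from typing import Any, Dict, List

def explode(rows: List[Dict[str, Any]], column: str, outer: bool = False) -> List[Dict[str, Any]]:
    """Model of explode: emit one copy of each row per element of rows[column]."""
    result = []
    for row in rows:
        items = row.get(column)
        if not items:  # null or empty array
            if outer:
                # Outer variant: keep the row, with a null element.
                kept = dict(row)
                kept[column] = None
                result.append(kept)
            continue  # plain explode produces no output row
        for item in items:
            new_row = dict(row)
            new_row[column] = item
            result.append(new_row)
    return result
```

After exploding, each element is an ordinary scalar column, so it can be selected or cast directly instead of being indexed out of the array.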

Type Casting Error in Spark Data Frame

2018-01-29 Thread Arnav kumar
Hello Experts, I need your advice on resolving the issue below, which occurs when I try to retrieve data from a DataFrame. Can you please let me know where I am going wrong? Code:
// create the dataframe by parsing the json
// Message Helper describes the JSON Struct
//data out is the

Spark Streaming checkpoint

2018-01-29 Thread KhajaAsmath Mohammed
Hi, I have written a Spark Streaming job that uses checkpointing. I stopped the streaming job for 5 days and then restarted it today. I have encountered a weird issue: it shows zero records for every cycle up to the current date. Is this causing data loss? Thanks, Asmath

Re: Reading Hive RCFiles?

2018-01-29 Thread Michael Segel
Just to follow up: I was able to create an RDD from the file; however, diving into the RDD is a bit weird, and I’m working through it. My test file seems to be one block … 3K rows. So when I tried to get the first column of the first row, I ended up getting all of the rows for the first column

Re: Reverse MinMaxScaler in SparkML

2018-01-29 Thread Nick Pentreath
I think this would be an interesting and useful addition, though the API needs some thought. One approach is to have an "inverseTransform" method similar to sklearn. The other approach is to "formalize" something like StringIndexerModel -> IndexToString. Here, the inverse transformer is a
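As a rough sketch of what an sklearn-style "inverseTransform" would have to compute: Spark's MinMaxScaler documentation gives the forward map as Rescaled(e_i) = (e_i - E_min) / (E_max - E_min) * (max - min) + min, with Rescaled(e_i) = 0.5 * (max + min) when E_max == E_min. Solving for e_i gives the inverse. The plain-Python class below is hypothetical, not the Spark ML API; its `original_min`/`original_max` fields only mirror the per-feature statistics a fitted model holds, and it assumes `out_min < out_max`.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MinMaxRescaler:
    """Hypothetical stand-in for a fitted min-max scaling model (not the Spark ML class)."""
    original_min: List[float]  # per-feature minimum seen during fitting (E_min)
    original_max: List[float]  # per-feature maximum seen during fitting (E_max)
    out_min: float = 0.0       # lower bound of the target range ("min")
    out_max: float = 1.0       # upper bound of the target range ("max")

    def transform(self, values: List[float]) -> List[float]:
        result = []
        for x, e_min, e_max in zip(values, self.original_min, self.original_max):
            if e_max == e_min:
                # Constant feature: every input maps to the middle of the target range.
                result.append(0.5 * (self.out_max + self.out_min))
            else:
                scale = (self.out_max - self.out_min) / (e_max - e_min)
                result.append((x - e_min) * scale + self.out_min)
        return result

    def inverse_transform(self, values: List[float]) -> List[float]:
        result = []
        for r, e_min, e_max in zip(values, self.original_min, self.original_max):
            if e_max == e_min:
                # The forward map collapsed this feature; E_min is the only value it ever saw.
                result.append(e_min)
            else:
                scale = (e_max - e_min) / (self.out_max - self.out_min)
                result.append((r - self.out_min) * scale + e_min)
        return result
```

The constant-feature branch shows one API question the inverse raises: the forward map loses information there, so the best an inverse can do is return the single fitted value.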