Re: to_json not working with selectExpr

2017-07-16 Thread Burak Yavuz
Hi Matthew, Which Spark version are you using? The expression `to_json` was added in 2.2 with this commit: https://github.com/apache/spark/commit/0cdcf9114527a2c359c25e46fd6556b3855bfb28 Best, Burak On Sun, Jul 16, 2017 at 6:24 PM, Matthew cao wrote: > Hi all, > I just
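Since the reply ties `to_json` as an expression to Spark 2.2, a quick version check before calling `selectExpr` can rule out the most likely cause. The sketch below is only an illustration: the helper name `supports_to_json_expr` is hypothetical, and the 2.2 threshold is taken from the message above rather than verified independently.

```python
def supports_to_json_expr(spark_version: str) -> bool:
    """Return True if the version string is 2.2 or newer.

    Hypothetical helper; the 2.2 cutoff comes from the reply above.
    """
    # Only the major and minor components matter for this check.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (2, 2)


print(supports_to_json_expr("2.1.1"))
print(supports_to_json_expr("2.2.0"))
```

In a live session the argument would typically be the session's own version string, for example `spark.version`, assuming `spark` is an existing SparkSession.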

to_json not working with selectExpr

2017-07-16 Thread Matthew cao
Hi all, I just read the Databricks blog here: https://docs.databricks.com/_static/notebooks/complex-nested-structured.html When I try to follow the example about the to_json and selectExpr part, it gives an error:

Re: splitting columns into new columns

2017-07-16 Thread ayan guha
You are looking for the explode function. On Mon, 17 Jul 2017 at 4:25 am, nayan sharma wrote: > I’ve a Dataframe where in some columns there are multiple values, always > separated by ^ > > phone|contact| > ERN~58XX7~^EPN~5X551~|C~MXXX~MSO~^CAxxE~~3XXX5| > >

splitting columns into new columns

2017-07-16 Thread nayan sharma
I have a DataFrame in which some columns contain multiple values, always separated by ^
phone|contact|
ERN~58XX7~^EPN~5X551~|C~MXXX~MSO~^CAxxE~~3XXX5|
phone1|phone2|contact1|contact2|
ERN~5XXX7|EPN~5891551~|C~MXXXH~MSO~|CAxxE~~3XXX5|
How can this be achieved using loop
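The transformation asked for here can be sketched on a single row in plain Python. This is not Spark code: the function name `split_to_columns` and the dict-based row are assumptions made for illustration, and the input values are the ones from the question.

```python
def split_to_columns(row: dict, sep: str = "^") -> dict:
    """Split each value on `sep` and number the pieces: phone -> phone1, phone2, ..."""
    out = {}
    for name, value in row.items():
        # enumerate from 1 so the new names match phone1, phone2, ...
        for i, piece in enumerate(value.split(sep), start=1):
            out[f"{name}{i}"] = piece
    return out


# Hypothetical single row using the values quoted in the question.
row = {"phone": "ERN~58XX7~^EPN~5X551~", "contact": "C~MXXX~MSO~^CAxxE~~3XXX5"}
result = split_to_columns(row)
print(result)
```

In Spark itself, `explode` turns the pieces into separate rows. For numbered columns, one alternative is `pyspark.sql.functions.split` followed by `getItem`. Its pattern argument is a regular expression, so `^` would need escaping there.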

Re: Querying on Deeply Nested JSON Structures

2017-07-16 Thread Burak Yavuz
Have you checked out this blog post? https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html It shows tools and tips on how to work with nested data. You can access data through `field1.field2.field3` and such with JSON. Best, Burak On Sat,
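The dotted-path idea can be illustrated outside Spark with a parsed JSON document. The sketch below is a plain-Python analogy, not the Spark column API: `get_path` is a hypothetical helper, and the sample document is invented to mirror the `field1.field2.field3` placeholder from the message.

```python
import json
from functools import reduce


def get_path(record: dict, path: str):
    """Walk a nested dict one key at a time, following a dotted path."""
    return reduce(lambda node, key: node[key], path.split("."), record)


# Invented sample document matching the placeholder field names.
doc = json.loads('{"field1": {"field2": {"field3": 42}}}')
value = get_path(doc, "field1.field2.field3")
print(value)  # prints 42
```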

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-16 Thread Yuval.Itzchakov
Yes, you do. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-mapWithState-need-checkpointing-to-be-specified-in-Spark-Streaming-tp28858p28862.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: underlying checkpoint

2017-07-16 Thread Mendelson, Assaf
Actually, show is an action. The issue is that unless you have some aggregations, show will only go over some of the dataframe, not all of it, and therefore the caching won’t occur (similar to what happens with cache). You need an action that has to go over the entire dataframe (which count
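The point about partial versus full traversal can be pictured with a toy model. The class below is not Spark and makes no claim about Spark internals. `ToyLazyFrame` and its partition layout are assumptions, chosen only to show why an action that stops early leaves part of the data unmaterialized.

```python
class ToyLazyFrame:
    """Toy model: a partition is materialized only when an action reads it."""

    def __init__(self, partitions):
        self._partitions = partitions
        self.materialized = set()  # indices of partitions read so far

    def _read(self, index):
        self.materialized.add(index)
        return self._partitions[index]

    def show(self, n=2):
        # Stops as soon as enough rows are collected, so later partitions stay untouched.
        rows = []
        for i in range(len(self._partitions)):
            rows.extend(self._read(i))
            if len(rows) >= n:
                break
        return rows[:n]

    def count(self):
        # Has to visit every partition to produce a total.
        return sum(len(self._read(i)) for i in range(len(self._partitions)))


frame = ToyLazyFrame([[1, 2, 3], [4, 5], [6]])
frame.show()
after_show = set(frame.materialized)   # only the first partition was read
total = frame.count()
after_count = set(frame.materialized)  # every partition was read
```

In this model `show` touches one partition while `count` touches all three, which mirrors the reasoning in the message.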