Re: Joining an RDD to a DataFrame

2016-05-13 Thread Xinh Huynh
Hi Cyril,
In the case where there are no documents, it looks like there is a typo in "addresses" (check the number of "d"s):
| scala> df.select(explode(df("addresses.id")).as("aid"), df("id")) <== addresses
| org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among
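
To make the failure mode concrete, here is a minimal spark-shell sketch; df stands for any DataFrame with a nested addresses array of structs, as in the original question further down, and the misspelling shown is only hypothetical:

import org.apache.spark.sql.functions.explode

// A nested path that does not match the schema fails as soon as the column is resolved:
//   df("adresses.id")   // AnalysisException: Cannot resolve column name ...
// Spelled exactly as the schema spells it, the same select works:
df.select(explode(df("addresses.id")).as("aid"), df("id"))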

Re: Joining an RDD to a DataFrame

2016-05-12 Thread Cyril Scetbon
Nobody has the answer? Another thing I've seen is that if I have no documents at all:
scala> df.select(explode(df("addresses.id")).as("aid")).collect
res27: Array[org.apache.spark.sql.Row] = Array()
Then
scala> df.select(explode(df("addresses.id")).as("aid"), df("id"))

Re: Joining an RDD to a DataFrame

2016-05-08 Thread Cyril Scetbon
Hi Ashish,
The issue is not related to converting an RDD to a DF. I did that; I was just asking whether I should do it differently. The issue concerns the exception thrown when using array_contains with a sql.Column instead of a value. I found another way to do it using explode, as follows:
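
Cyril's snippet is cut off before his code, so what follows is only a sketch of what an explode-based rewrite could look like; the column names are taken from the other messages in this thread, df stands for the E/S DataFrame with the nested addresses array, and the element type of the streaming RDD is assumed to be a plain string id:

import sqlContext.implicits._
import org.apache.spark.sql.functions.explode

// Ids produced by the streaming job, as a one-column DataFrame (assumed shape).
val idsDF = sc.parallelize(Seq("a1", "a3")).toDF("aid")

// Flattening the nested array turns the membership test into an ordinary equi-join,
// instead of calling array_contains(<array column>, <other column>), which threw:
val flat = df.select(df("id"), explode(df("addresses.id")).as("aid"))
val enriched = idsDF.join(flat, "aid")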

Re: Joining an RDD to a DataFrame

2016-05-08 Thread Ashish Dubey
Is there any reason you don't want to convert this? I don't think a join between an RDD and a DF is supported.
On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote:
> Hi,
>
> I have an RDD built during a Spark Streaming job and I'd like to join it to
> a DataFrame (E/S input) to
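
A minimal sketch of the conversion Ashish is suggesting, using the Spark 1.x shell; the element type of the streaming RDD is not given in the thread, so plain string ids are assumed:

import sqlContext.implicits._                 // enables rdd.toDF in the Spark 1.x shell
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stand-in for the RDD built by the streaming job.
val rdd = sc.parallelize(Seq("a1", "a3"))

// 1) Via the implicits, for simple element types or case classes:
val rddDF = rdd.toDF("aid")

// 2) Via an explicit schema, when more control over the types is needed:
val schema = StructType(Seq(StructField("aid", StringType, nullable = false)))
val rddDF2 = sqlContext.createDataFrame(rdd.map(Row(_)), schema)

// Once both sides are DataFrames, a regular join is possible.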

Joining an RDD to a DataFrame

2016-05-08 Thread Cyril Scetbon
Hi,
I have an RDD built during a Spark Streaming job and I'd like to join it to a DataFrame (E/S input) to enrich it. It seems that I can't join the RDD and the DF without first converting the RDD to a DF (tell me if I'm wrong). Here are the schemas of both DFs:
scala> df
res32:
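
The schema output is cut off above, so here is a toy reconstruction of the two inputs, modelling only the columns that appear elsewhere in this thread (id and addresses.id); the real df is read from E/S rather than built in memory:

// Hypothetical stand-ins for the two sides of the join.
case class Address(id: String)
case class Doc(id: String, addresses: Seq[Address])

val df = sqlContext.createDataFrame(Seq(
  Doc("d1", Seq(Address("a1"), Address("a2"))),
  Doc("d2", Seq(Address("a3")))
))
df.printSchema()   // id: string, addresses: array<struct<id: string>>

// The RDD built inside the streaming job, reduced here to the ids used for enrichment:
val rdd = sc.parallelize(Seq("a1", "a3"))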