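You probably want to explode the array so that you get one row per element:

    df.select(explode(df("links")).alias("link"))

explode comes from org.apache.spark.sql.functions. Once each row holds a single struct, link.desc and link.id can be selected as ordinary string columns rather than as lists.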
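
To get all the way to an RDD[(String, String)], something along these lines should work. This is only a sketch: it assumes the Spark 1.4-era SQLContext API and a local master, and the sample record, the object name ExplodeLinks, and the names exploded and pairs are made up for illustration. Only the links/desc/id schema comes from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.explode

object ExplodeLinks {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("explode-links").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical sample record matching the schema in the question.
    val json = sc.parallelize(Seq(
      """{"links":[{"desc":"one","id":"1"},{"desc":"two","id":"2"},{"desc":"three","id":"3"}]}"""
    ))
    val df = sqlContext.read.json(json)

    // One row per array element; each row holds a single struct column named "link".
    val exploded = df.select(explode(df("links")).alias("link"))

    // Promote the struct fields to top-level columns, then map each Row to a tuple.
    val pairs: RDD[(String, String)] = exploded
      .select("link.desc", "link.id")
      .rdd
      .map(row => (row.getString(0), row.getString(1)))

    pairs.collect().foreach(println)
    sc.stop()
  }
}
```

Note that getString returns null for a null field, so if desc or id can be missing in your data you may want to filter those rows or wrap the values in Option first.
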
On Tue, Jul 7, 2015 at 10:29 AM, Naveen Madhire <vmadh...@umail.iu.edu> wrote:

> Hi All,
>
> I am working with dataframes and have been struggling with this thing; any
> pointers would be helpful.
>
> I have a JSON file with a schema like this:
>
> links: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- desc: string (nullable = true)
>  |    |    |-- id: string (nullable = true)
>
> I want to fetch id and desc as an RDD like this: RDD[(String,String)]
>
> I am using dataframes: df.select("links.desc","links.id").rdd
>
> The above dataframe is returning an RDD like this:
> RDD[(List(String),List(String))]
>
> So the JSON links:[{"one","1"},{"two","2"},{"three","3"}] should return an
> RDD[(one,1),(two,2),(three,3)]
>
> Can anyone tell me how the dataframe select should be modified?