Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-07-18 Thread Naveen Madhire
I am facing the same issue. I tried this but got a compilation error for the $ in the explode function, so I had to change it to the following to make it work: df.select(explode(new Column("entities.user_mentions")).as("mention")) On Wed, Jun 24, 2015 at 2:48 PM, Michael Armbrust
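
A compilation error on $ is commonly a sign that the implicits of the SQLContext instance are not in scope, since the $"..." syntax comes from `import sqlContext.implicits._`. The sketch below shows three ways of naming the same nested column. The object and method names are hypothetical, and `df` is assumed to have the `entities.user_mentions` array from the original question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper object; the names are illustrative only.
object MentionColumn {
  // $"..." only compiles once the implicits of a SQLContext instance are imported.
  def withDollar(sqlContext: SQLContext, df: DataFrame): DataFrame = {
    import sqlContext.implicits._
    df.select(explode($"entities.user_mentions").as("mention"))
  }

  // The workaround from the message above: build the Column explicitly.
  def withColumnCtor(df: DataFrame): DataFrame =
    df.select(explode(new Column("entities.user_mentions")).as("mention"))

  // functions.col needs no implicits either.
  def withCol(df: DataFrame): DataFrame =
    df.select(explode(col("entities.user_mentions")).as("mention"))
}
```

All three variants should produce the same plan; which one to use is mostly a matter of whether a SQLContext is conveniently at hand where the query is written.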

How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Gustavo Arjones
Hi All, I am using the new Apache Spark version 1.4.0 DataFrames API to extract information from Twitter's Status JSON, mostly focused on the Entities Object https://dev.twitter.com/overview/api/entities - the relevant part to this question is shown below: { ... ... "entities": {

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Yin Huai
The function accepted by explode is f: Row => TraversableOnce[A]. Seems user_mentions is an array of structs. So, can you change your pattern matching to the following? case Row(rows: Seq[_]) => rows.asInstanceOf[Seq[Row]].map(elem => ...) On Wed, Jun 24, 2015 at 5:27 AM, Gustavo Arjones
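
The pattern match suggested above could be filled in roughly as follows. This is only a sketch: the `Mention` case class and the `ExplodeWithFunction` object are hypothetical, the `screen_name` field is taken from Twitter's user_mentions entity, and reading it by name assumes the nested rows carry their schema (if they do not, a positional getter would be needed instead).

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col

// Hypothetical target type; explode requires A to be a Product such as a case class.
case class Mention(screenName: String)

object ExplodeWithFunction {
  // Returns the input columns plus one screenName column, one row per mention.
  def mentions(df: DataFrame): DataFrame =
    df.explode[Mention](col("entities.user_mentions")) {
      // user_mentions is an array of structs, so it arrives as a Seq of Rows.
      case Row(rows: Seq[_]) =>
        rows.asInstanceOf[Seq[Row]].map(elem => Mention(elem.getAs[String]("screen_name")))
      // Assumption: a null or missing array should yield no output rows.
      case _ => Seq.empty[Mention]
    }
}
```

The fallback case is a design choice made here, not something from the thread; without it a status that has no user_mentions value would raise a MatchError.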

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-06-24 Thread Michael Armbrust
Starting in Spark 1.4 there is also an explode that you can use directly from the select clause (much like in HiveQL): import org.apache.spark.sql.functions._ df.select(explode($"entities.user_mentions").as("mention")) Unlike standard HiveQL, you can also include other attributes in the select or
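
To make the select-clause form concrete, here is a minimal end-to-end sketch that also carries another attribute alongside the exploded column. The sample record, the object name and the local master setting are assumptions for illustration; a real Twitter status has many more fields.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.explode

object ExplodeInSelect {
  // Hypothetical sample record, trimmed to the fields used here.
  val sample: String =
    """{"id": 1, "entities": {"user_mentions": [{"screen_name": "alice"}, {"screen_name": "bob"}]}}"""

  // One output row per mention, with the status id kept next to it.
  def mentions(sqlContext: SQLContext): DataFrame = {
    import sqlContext.implicits._ // enables the $"..." column syntax
    val df = sqlContext.read.json(sqlContext.sparkContext.parallelize(Seq(sample)))
    df.select($"id", explode($"entities.user_mentions").as("mention"))
      .select($"id", $"mention.screen_name")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("explode-in-select").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try mentions(new SQLContext(sc)).show()
    finally sc.stop()
  }
}
```

Selecting `mention.screen_name` in a second step keeps the explode itself simple; the struct fields of each mention become reachable once the array has been flattened into rows.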