Renaming nested columns in dataframe

2016-05-16 Thread Prashant Bhardwaj
Hi
How can I rename nested columns in a dataframe through the Scala API? For a schema like the following:

|-- site: struct (nullable = false)
|    |-- site_id: string (nullable = true)
|    |-- site_name: string (nullable = true)
|    |-- site_domain: string (nullable = true)
|    |-- site_cat: …
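A minimal sketch of one common answer, not taken from this preview: rebuild the struct with aliased children, since struct() takes each field's name from its alias. The DataFrame name `df` and the new field names are illustrative assumptions.

import org.apache.spark.sql.functions.struct

// Assumes `df` has the `site` struct shown in the schema above.
val renamed = df.withColumn("site", struct(
  df("site.site_id").as("id"),          // site_id -> id (hypothetical new name)
  df("site.site_name").as("name"),
  df("site.site_domain").as("domain"),
  df("site.site_cat").as("category")
))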

Re: Creating Nested dataframe from flat data.

2016-05-13 Thread Prashant Bhardwaj
> df.withColumn("D", struct($"a", $"b", $"c")).show()
>
> +---+---+---+-------+
> |  A|  B|  C|      D|
> +---+---+---+-------+
> |  a|  b|  c|[a,b,c]|
> +---+---+---+-------+
>
> You can repeat to get the inner nesting.
>
> Xinh
>
> On Fri, May 13, 2016 at 4:51 AM, Prashant Bha…
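To illustrate "repeat to get the inner nesting", a hedged sketch (column and struct names are made up; assumes a `sqlContext` is in scope for the $ syntax):

import org.apache.spark.sql.functions.struct
import sqlContext.implicits._

// `inner` becomes a struct field nested inside the `outer` struct.
val twoLevel = df.withColumn("outer",
  struct(struct($"a", $"b").as("inner"), $"c"))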

Creating Nested dataframe from flat data.

2016-05-13 Thread Prashant Bhardwaj
Hi
Let's say I have a flat dataframe with 6 columns, like:

{
  "a": "somevalue",
  "b": "somevalue",
  "c": "somevalue",
  "d": "somevalue",
  "e": "somevalue",
  "f": "somevalue"
}

Now I want to convert this dataframe to contain nested columns, like:

{
  "nested_obj1": {
    "a": "somevalue",
    "b": "somevalue"
  },
  …
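A minimal sketch of the requested transformation, using struct() as the reply above suggests. Only nested_obj1's contents are given in the question, so the other groupings below are assumptions:

import org.apache.spark.sql.functions.struct

val nested = df.select(
  struct(df("a"), df("b")).as("nested_obj1"),
  struct(df("c"), df("d")).as("nested_obj2"),  // assumed grouping
  struct(df("e"), df("f")).as("nested_obj3")   // assumed grouping
)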

Re: Filtering records based on empty value of column in SparkSql

2015-12-09 Thread Prashant Bhardwaj
On Wed, Dec 9, 2015 at 7:43 PM, Prashant Bhardwaj <prashant2006s...@gmail.com> wrote:
>> Hi
>> I have two columns in my json which can have null, empty a…

Filtering records based on empty value of column in SparkSql

2015-12-09 Thread Prashant Bhardwaj
Hi
I have two columns in my json which can have null, empty, and non-empty strings as values. I know how to filter records that have a non-null value using the following:

val req_logs = sqlContext.read.json(filePath)
val req_logs_with_dpid = req_logs.filter("req_info.dpid is not null or …
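The preview cuts off mid-expression. As a hedged, generic sketch (the empty-string test is an assumption, not the poster's original line), excluding both null and empty values in SQL-expression form could look like:

val req_logs = sqlContext.read.json(filePath)
// Keep only rows where dpid is present and non-empty.
val req_logs_with_dpid = req_logs.filter(
  "req_info.dpid is not null and req_info.dpid != ''")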

Re: Filtering records based on empty value of column in SparkSql

2015-12-09 Thread Prashant Bhardwaj
Anyway, I got it. I have to use !== instead of ===. Thanks BTW.

On Wed, Dec 9, 2015 at 9:39 PM, Prashant Bhardwaj <prashant2006s...@gmail.com> wrote:
> I have to do the opposite of what you're doing. I have to filter non-empty records.
>
> On Wed, Dec 9, 2015 at 9:33 PM, Gokula…
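For context, a minimal Spark 1.x sketch of that fix; !== was the Column inequality operator at the time (Spark 2.x renamed it to =!=). The column name follows the earlier messages; everything else is illustrative:

val req_logs = sqlContext.read.json(filePath)
// Non-empty means: not null AND not the empty string.
val non_empty = req_logs.filter(
  req_logs("req_info.dpid").isNotNull && (req_logs("req_info.dpid") !== ""))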

Re: Filtering records based on empty value of column in SparkSql

2015-12-09 Thread Prashant Bhardwaj
> [115,Aster,,30]
> [116,Harrison,,20]
>
> Total No. of Records with AGE <= 15: 2
> [110,Harrison,Male,15]
> [113,Harrison,,15]
>
> Thanks & Regards,
> Gokula Krishnan (Gokul)
>
> On Wed, Dec 9, 2015 at 8:24 AM, Prashant Bhardwaj <…

Spark and Kafka Integration

2015-12-07 Thread Prashant Bhardwaj
Hi
Some background: we have a Kafka cluster with ~45 topics. Some of the topics contain logs in JSON format and some in PSV (pipe-separated value) format. Now I want to consume these logs using Spark Streaming and store them in Parquet format in HDFS. Now my question is:
1. Can we create a…
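A minimal end-to-end sketch using the Spark 1.x / Kafka 0.8 APIs current at the thread's date: consume one JSON topic with the direct stream and append each micro-batch to Parquet on HDFS. The broker list, topic name, batch interval, and output path are all hypothetical; a PSV topic would additionally need an explicit schema and a split on '|'.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaJsonToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaJsonToParquet")
    val ssc = new StreamingContext(conf, Seconds(60))  // hypothetical batch interval

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // hypothetical
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("json_topic"))  // hypothetical topic

    // Each Kafka record is a (key, value) pair; the JSON line is the value.
    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        // Infer the schema from the JSON lines and append this batch as Parquet.
        sqlContext.read.json(rdd)
          .write.mode("append")
          .parquet("hdfs:///logs/json_topic")  // hypothetical output path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}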