Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
I think this could be the reason: DataFrame sorts the columns of each record lexicographically if we do a *select **. So, if we wish to maintain a specific column ordering while processing, we should use *select col1, col2...* instead of *select **. However, this is just what I feel. Let's wait
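
As an illustration of the explicit-projection idea above, here is a minimal sketch that selects columns in a caller-chosen order before collecting. The object name `ColumnOrder` and the helper `collectInOrder` are hypothetical and not part of Spark; only `DataFrame.select(String, String*)` and `DataFrame.collect()` are library calls. It does not confirm or refute the guess about lexicographic sorting; it only shows how to pin the order yourself.

```scala
import org.apache.spark.sql.{DataFrame, Row}

object ColumnOrder {
  // Hypothetical helper: project the columns in the order given by the caller,
  // then bring the rows to the driver. Each Row's fields follow `columns`.
  def collectInOrder(df: DataFrame, columns: Seq[String]): Array[Row] = {
    require(columns.nonEmpty, "at least one column name is required")
    df.select(columns.head, columns.tail: _*).collect()
  }
}
```

With this helper, `ColumnOrder.collectInOrder(df, Seq("col1", "col2"))` would return rows whose first field is `col1` and second is `col2`, assuming both columns exist in `df`.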

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
Cool. Here is how it goes... I am reading Avro objects from a Kafka topic as a DStream, converting them into a DataFrame so that I can filter out records based on some conditions, and finally doing some aggregations on these filtered records. During the process I also need to tag each record based

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Sainath Palla
Hi Tariq, Can you tell me in brief what kind of operation you have to do? I can try helping you out with that. In general, if you are trying to use any group operations, you can use window operations. On Wed, Mar 2, 2016 at 6:40 PM, Mohammad Tariq wrote: > Hi Sainath, > > Thank

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
Hi Sainath, Thank you for the prompt response! Could you please elaborate on your answer a bit? I'm sorry, I didn't quite get this. What kind of operations can I perform using SQLContext? It just helps us with things like DF creation, schema application, etc., IMHO. Tariq,

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Sainath Palla
Instead of collecting the data frame, you can try using a sqlContext on the data frame. But it depends on what kind of operations you are trying to perform. On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq wrote: > Hi list, > > *Scenario :* > I am creating a DStream by reading

Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
Hi list, *Scenario :* I am creating a DStream by reading an Avro object from a Kafka topic and then converting it into a DataFrame to perform some operations on the data. I call DataFrame.collect() and perform the intended operation on each Row of Array[Row] returned by DataFrame.collect().
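
For the per-row processing described in the scenario, one way to avoid depending on field positions is to read each collected Row by column name. This is a minimal sketch, assuming the rows come from `DataFrame.collect()` and therefore carry their schema. `RowAccess` and `rowsAsMaps` are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row}

object RowAccess {
  // Hypothetical helper: turn every collected Row into a column-name -> value map,
  // so downstream code looks fields up by name rather than by index.
  def rowsAsMaps(df: DataFrame): Array[Map[String, Any]] = {
    val names = df.columns
    df.collect().map { row: Row =>
      names.map(name => name -> row.getAs[Any](name)).toMap
    }
  }
}
```

Looking values up by name keeps the per-row logic independent of whatever order the columns happen to arrive in.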