Instead of collecting the DataFrame, you can try running your queries against it through a sqlContext. Whether that works for you depends on what kind of operations you are trying to perform.
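A rough sketch of what I mean, assuming Spark 1.x. The helper name processBatch, the table name "events", and the column names "id" and "name" (with their types) are placeholders I made up, not your actual Avro schema. If you do need collect(), reading fields by name instead of by index should not depend on the column order:

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: `df` is the DataFrame built from one micro-batch.
// "id" and "name" are placeholder columns; substitute your own fields and types.
def processBatch(df: DataFrame): Unit = {
  // Option 1: select the columns in an explicit order before collecting,
  // then read each field by name rather than by position.
  val ordered = df.select("id", "name")
  ordered.collect().foreach { row: Row =>
    val id = row.getAs[Long]("id")
    val name = row.getAs[String]("name")
    println(s"$id -> $name")
  }

  // Option 2: skip collect() and express the operation in SQL,
  // using the sqlContext the DataFrame already carries.
  df.registerTempTable("events")
  df.sqlContext
    .sql("SELECT name, COUNT(*) AS n FROM events GROUP BY name")
    .show()
}
```

Option 2 keeps the work on the executors, which usually matters once the batches get large. On Spark 2.0 and later, createOrReplaceTempView replaces registerTempTable.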
On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Hi list,
>
> *Scenario :*
> I am creating a DStream by reading an Avro object from a Kafka topic and
> then converting it into a DataFrame to perform some operations on the data.
> I call DataFrame.collect() and perform the intended operation on each Row
> of Array[Row] returned by DataFrame.collect().
>
> *Problem :*
> Calling DataFrame.collect() changes the schema of the underlying record,
> thus making it impossible to get the columns by index (as the order gets
> changed).
>
> *Query :*
> Is it the way DataFrame.collect() behaves or am I doing something wrong
> here? In former case is there any way I can maintain the schema while
> getting each Row?
>
> Any pointers/suggestions would be really helpful. Many thanks!
>
> Tariq, Mohammad
> about.me/mti