Instead of collecting the data frame, you can try using a sqlContext on the
data frame. But it depends on what kind of operations are you trying to
perform.

On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq <donta...@gmail.com> wrote:

> Hi list,
>
> *Scenario :*
> I am creating a DStream by reading an Avro object from a Kafka topic and
> then converting it into a DataFrame to perform some operations on the data.
> I call DataFrame.collect() and perform the intended operation on each Row
> of Array[Row] returned by DataFrame.collect().
>
> *Problem : *
> Calling DataFrame.collect() changes the schema of the underlying record,
> thus making it impossible to get the columns by index(as the order gets
> changed).
>
> *Query :*
> Is it the way DataFrame.collect() behaves or am I doing something wrong
> here? In former case is there any way I can maintain the schema while
> getting each Row?
>
> Any pointers/suggestions would be really helpful. Many thanks!
>
>
> [image: http://]
>
> Tariq, Mohammad
> about.me/mti
> [image: http://]
> <http://about.me/mti>
>
>

Reply via email to