Hi list,

*Scenario:*
I am creating a DStream by reading Avro objects from a Kafka topic and
then converting them into a DataFrame to perform some operations on the data.
I call DataFrame.collect() and perform the intended operation on each Row
of the Array[Row] that DataFrame.collect() returns.

*Problem:*
Calling DataFrame.collect() changes the schema of the underlying record:
the column order changes, which makes it impossible to get the columns
by index.

*Query:*
Is this how DataFrame.collect() is expected to behave, or am I doing
something wrong here? In the former case, is there any way I can preserve
the schema while getting each Row?
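
To make the question concrete, here is a minimal sketch of index-based
access next to two alternatives that do not rely on column position. This
is not my actual job: the column names (id, name), the sample data, and the
local SparkSession (which assumes Spark 2.x or later) are made up for
illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object CollectByName {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumes Spark 2.x or later).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("collect-by-name")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the DataFrame built from the Avro records.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Option 1: pin the column order explicitly before collecting,
    // so positional access matches the order given in select().
    val ordered: Array[Row] = df.select("id", "name").collect()
    ordered.foreach { row =>
      val id = row.getInt(0)      // position 0 is "id" because of the select above
      val name = row.getString(1) // position 1 is "name"
      println(s"$id $name")
    }

    // Option 2: look fields up by name, which does not depend on order.
    df.collect().foreach { row =>
      val id = row.getAs[Int]("id")
      val name = row.getAs[String]("name")
      // fieldIndex resolves a column's position from the Row's own schema.
      val namePos = row.fieldIndex("name")
      println(s"$id $name (name is at position $namePos)")
    }

    spark.stop()
  }
}
```

If name-based lookup is the recommended approach, I can switch to it, but I
would still like to understand why the order differs in my case.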

Any pointers/suggestions would be really helpful. Many thanks!



Tariq, Mohammad
about.me/mti