https://issues.apache.org/jira/browse/SPARK-17668
On Mon, Sep 26, 2016 at 3:40 PM, Koert Kuipers wrote:
> ok will create jira
>
ok will create jira
On Mon, Sep 26, 2016 at 3:27 PM, Michael Armbrust
wrote:
> I agree this should work. We just haven't finished killing the old
> reflection-based conversion logic now that we have more powerful/efficient
> encoders. Please open a JIRA.
I agree this should work. We just haven't finished killing the old
reflection-based conversion logic now that we have more powerful/efficient
encoders. Please open a JIRA.
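A sketch of what Michael is describing: Spark 2.x encoders capture a case class's schema statically and handle the row-to-object conversion, replacing the old reflection-based logic. This assumes a local Spark 2.x session on the classpath; the `Event` class is illustrative, not from the thread.

```scala
// Sketch: encoders derive schema and serialization for a case class,
// so Datasets can move between Rows and typed objects without the old
// reflection-based conversion. Requires spark-sql on the classpath.
import org.apache.spark.sql.{Encoders, SparkSession}

case class Event(id: Long, name: String)

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("encoder-sketch").getOrCreate()
    import spark.implicits._

    // The encoder knows the schema statically from the case class:
    println(Encoders.product[Event].schema)

    // Datasets use the encoder to convert rows to objects and back:
    val ds = Seq(Event(1L, "a"), Event(2L, "b")).toDS()
    ds.map(e => e.copy(name = e.name.toUpperCase)).show()

    spark.stop()
  }
}
```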
On Sun, Sep 25, 2016 at 2:41 PM, Koert Kuipers wrote:
> after having gotten used to have case classes
Case classes are serializable by default (they extend Java's Serializable
interface).
I am not using RDD or Dataset because I need to transform only one column out
of 200 or so.
Dataset has the mechanisms to convert rows to case classes as needed (and to
make sure the result is consistent with the schema). Why should DataFrames
force usage of Row for complex types?
> after having gotten used to having case classes represent complex structures
> in Datasets, I am surprised to find out that when I work in DataFrames with
> udfs no such magic exists, and I have to fall back to manipulating Row
> objects, which is error prone
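Koert's serializability point can be checked without Spark at all: a case class picks up `java.io.Serializable` automatically, so a plain Java serialization round-trip works with no extra code. The `Point` class below is illustrative.

```scala
// A case class is Serializable out of the box: no explicit
// "extends Serializable" is needed for a Java serialization round-trip.
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

case class Point(x: Int, y: Int)

object SerializableCheck {
  // Serialize to bytes and read the object back.
  def roundTrip[A <: Serializable](a: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(a)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[A]
  }

  def main(args: Array[String]): Unit = {
    val p = Point(1, 2)
    assert(p.isInstanceOf[Serializable])  // conforms without any declaration
    assert(roundTrip(p) == p)             // case-class equality survives the trip
    println("ok")
  }
}
```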
Hi Koert,
these case classes you are talking about should be serializable to be
efficient (via Kryo or just plain Java serialization).
A DataFrame is not simply a collection of Rows (which are serializable by
default); it also contains a schema with a type for each column.
after having gotten used to having case classes represent complex structures
in Datasets, I am surprised to find out that when I work in DataFrames with
udfs no such magic exists, and I have to fall back to manipulating Row
objects, which is error prone and somewhat ugly.
for example:
case class
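Koert's example is cut off in the archive. A hypothetical sketch of the pattern he is describing might look like the following; the column and class names (`Address`, `normalizeZip`) are invented for illustration, and it assumes a local Spark 2.x session.

```scala
// Sketch of the complaint (hypothetical names): a struct column maps
// cleanly onto a case class in a Dataset, but inside a DataFrame UDF
// the same struct arrives as an untyped Row, so fields have to be
// pulled out by name with manual casts. Requires spark-sql.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

case class Address(street: String, zip: String)

object RowUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("row-udf-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("koert", Address("main st", "123456")))
      .toDF("name", "address")

    // The UDF cannot take an Address directly; it has to accept a Row
    // and extract each field by hand -- the error-prone part:
    val normalizeZip = udf { (a: Row) =>
      Address(a.getAs[String]("street"), a.getAs[String]("zip").take(5))
    }

    df.withColumn("address", normalizeZip(col("address"))).show()
    spark.stop()
  }
}
```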