Re: udf forces usage of Row for complex types?

2016-09-26 Thread Koert Kuipers
https://issues.apache.org/jira/browse/SPARK-17668
On Mon, Sep 26, 2016 at 3:40 PM, Koert Kuipers wrote:
> ok will create jira
>
> On Mon, Sep 26, 2016 at 3:27 PM, Michael Armbrust wrote:
>> I agree this should work. We just haven't finished

Re: udf forces usage of Row for complex types?

2016-09-26 Thread Koert Kuipers
ok will create jira
On Mon, Sep 26, 2016 at 3:27 PM, Michael Armbrust wrote:
> I agree this should work. We just haven't finished killing the old reflection based conversion logic now that we have more powerful/efficient encoders. Please open a JIRA.
>
> On Sun,

Re: udf forces usage of Row for complex types?

2016-09-26 Thread Michael Armbrust
I agree this should work. We just haven't finished killing the old reflection based conversion logic now that we have more powerful/efficient encoders. Please open a JIRA.
On Sun, Sep 25, 2016 at 2:41 PM, Koert Kuipers wrote:
> after having gotten used to have case classes

Re: udf forces usage of Row for complex types?

2016-09-26 Thread Koert Kuipers
Case classes are serializable by default (they extend Java's Serializable trait). I am not using RDD or Dataset because I need to transform one column out of 200 or so. Dataset has the mechanisms to convert rows to case classes as needed (and make sure it's consistent with the schema). Why would
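The serializability point in the message above can be checked without Spark. The following is a minimal sketch, assuming a hypothetical case class `Point` and a hypothetical helper `roundTrip` (neither appears in the thread); it only demonstrates plain Java serialization of a case class and says nothing about how Spark's encoders or UDF conversion work.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical case class, used only for illustration.
final case class Point(x: Double, y: Double)

object CaseClassSerializationSketch {
  // Write a value with plain Java serialization and read it back.
  // The type bound compiles for Point because case classes extend Serializable.
  def roundTrip[A <: java.io.Serializable](value: A): A = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value) finally out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = Point(1.0, 2.0)
    val copy = roundTrip(original)
    // Case classes get structural equality, so the deserialized copy compares equal.
    assert(copy == original)
    println(copy)
  }
}
```

This works only when every field of the case class is itself serializable; a case class holding a non-serializable field would fail at `writeObject`.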

RE: udf forces usage of Row for complex types?

2016-09-26 Thread ming.he
It should be UserDefinedType. You can refer to https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
From: Koert Kuipers [mailto:ko...@tresata.com]
Sent: Monday, September 26, 2016 5:42 AM
To: user@spark.apache.org
Subject: udf

Re: udf forces usage of Row for complex types?

2016-09-25 Thread Bedrytski Aliaksandr
Hi Koert, the case classes you are talking about should be serializable to be efficient (e.g. with Kryo or plain Java serialization). A DataFrame is not simply a collection of Rows (which are serializable by default); it also contains a schema with a different type for each column. This way any