Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Muthu Jayakumar
Thanks Michael. Let me test it with a recent master branch. Also, for every mapping step, do I have to create a new case class? I cannot use a Tuple, as I have ~130 columns to process. Earlier I had used a Seq[Any] (actually an Array[Any], to optimize serialization), but processed it using RDD
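The question above is about one case class per transformation stage. A minimal sketch of that pattern, with field names invented for illustration (the thread does not show the real 130-column schema); the Spark call that would apply it appears only in a comment:

```scala
// One case class per shape the data takes, instead of Seq[Any]/Array[Any].
// Field names here are hypothetical.
case class RawRow(id: Long, amountCents: Long)
case class EnrichedRow(id: Long, amountDollars: Double)

// The pure transformation a Dataset map step would apply,
// e.g. ds.as[RawRow].map(enrich) in Spark.
def enrich(r: RawRow): EnrichedRow =
  EnrichedRow(r.id, r.amountCents / 100.0)

println(enrich(RawRow(1L, 250L)))  // EnrichedRow(1,2.5)
```

The upside over Array[Any] is that each stage is statically typed; the cost, as the question notes, is one class per stage.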

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
> df1.as[TestCaseClass].map(_.toMyMap).show() // fails

This looks like a bug. What is the error? It might be fixed in branch-1.6/master if you can test there.

> Please advise on what I may be missing here?
>
> Also for join, may I suggest to have a custom encoder / transformation to
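The thread references TestCaseClass and toMyMap without showing their definitions, so this is a speculative reconstruction of the failing pattern (fields and the map's value type are assumptions):

```scala
// Reconstruction of the pattern from the thread; the actual fields of
// TestCaseClass are not shown in these messages.
case class TestCaseClass(name: String, score: Double) {
  def toMyMap: Map[String, String] = Map(name -> score.toString)
}

// In Spark this is the step that failed:
//   df1.as[TestCaseClass].map(_.toMyMap).show()
// The map step itself is a plain method call:
println(TestCaseClass("a", 1.0).toMyMap)  // Map(a -> 1.0)
```

The failure would then be in encoding the resulting Map back into a Dataset, not in the method itself.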

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Muthu Jayakumar
I tried to rerun the same code with the current snapshot versions of 1.6 and 2.0 from https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.11/, but I still see an exception around the same line. Here is the exception below. Filed an issue against the same

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
Awesome, thanks for opening the JIRA! We'll take a look.

On Tue, Jan 12, 2016 at 1:53 PM, Muthu Jayakumar wrote:
> I tried to rerun the same code with current snapshot version of 1.6 and
> 2.0 from

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Michael Armbrust
> Also, while extracting a value into a Dataset using the as[U] method, how could
> I specify a custom encoder/translation to a case class (where I don't have
> the same column-name mapping or same data-type mapping)?

There is no public API yet for defining your own encoders. You change the column
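The answer above points at renaming/casting columns before as[U] rather than a custom encoder. A hedged sketch of that translation, with the Spark form in a comment (the column names are invented; only the plain-Scala equivalent is shown runnable):

```scala
// Without a public custom-encoder API, one workaround is to rename and cast
// columns before calling as[U]; in Spark that is roughly
//   df.select($"user_id".cast("long").as("id"), $"user_name".as("name")).as[Person]
// Column names here are hypothetical.
case class Person(id: Long, name: String)

// The same name/type translation expressed as a plain function over a generic row:
def translate(row: Map[String, Any]): Person =
  Person(row("user_id").toString.toLong, row("user_name").toString)

println(translate(Map("user_id" -> "42", "user_name" -> "muthu")))  // Person(42,muthu)
```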

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Arkadiusz Bicz
Hi, There is some documentation at https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Aggregator.html and you can also check out the tests in DatasetSuite in the Spark sources. BR, Arkadiusz Bicz On Mon, Jan 11, 2016 at 5:37 AM, Muthu Jayakumar
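The linked example builds on Spark's typed Aggregator, whose contract is zero/reduce/merge/finish. A plain-Scala mirror of that shape (not Spark's actual class, which additionally requires buffer and output encoders):

```scala
// Plain-Scala mirror of the Aggregator contract used in the linked example:
// zero (initial buffer), reduce (fold one element in), merge (combine
// partial buffers), finish (buffer -> result).
class SumAgg {
  def zero: Long = 0L
  def reduce(b: Long, a: Long): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
}

val agg = new SumAgg
// Two "partitions" folded independently, then merged, as Spark would do:
val partial1 = Seq(1L, 2L).foldLeft(agg.zero)(agg.reduce)
val partial2 = Seq(3L, 4L).foldLeft(agg.zero)(agg.reduce)
println(agg.finish(agg.merge(partial1, partial2)))  // 10
```

The merge step is what makes this a distributed-friendly replacement for a UDAF: partial buffers from different partitions combine associatively.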

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Muthu Jayakumar
Hello Michael, Thank you for the suggestion. This should do the trick for column names. But how can I transform a column's value type? Do I have to use a UDF? If I use a UDF, then my other question pertains to the map step in Dataset, where I am running into an error when I
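On the value-type question: a UDF is just an ordinary function handed to the SQL engine. A sketch with the conversion logic as plain Scala and the registration shown only in a comment (the exact conversion Muthu needed is not shown in the thread, so this one is illustrative):

```scala
// An ordinary conversion function (here: string -> long with a fallback).
def toLongOrZero(s: String): Long =
  try s.trim.toLong catch { case _: NumberFormatException => 0L }

// In Spark 1.6 this would be wrapped as a UDF, roughly:
//   val toLongUdf = org.apache.spark.sql.functions.udf(toLongOrZero _)
//   df.withColumn("id", toLongUdf($"rawId"))
println(toLongOrZero(" 42 "))  // 42
println(toLongOrZero("oops"))  // 0
```

With a typed Dataset, the same function can be applied directly inside a map step instead of being registered as a UDF.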

Spark 1.6 udf/udaf alternatives in dataset?

2016-01-10 Thread Muthu Jayakumar
Hello there, While looking at the features of Dataset, it seems to provide an alternative to UDFs and UDAFs. Any documentation or sample code snippets would be helpful for rewriting existing UDFs as Dataset mapping steps. Also, while extracting a value into a Dataset using as[U]
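As a sketch of the UDAF side of the original question: a typed grouped aggregation replaces a UDAF with a plain per-group function. In Spark 1.6 the grouped step would be roughly ds.groupBy(_.key).mapGroups(avgByKey); only the per-group function is shown runnable here, and the names are illustrative:

```scala
// Hypothetical record type standing in for the real schema.
case class Reading(key: String, value: Double)

// Per-group function a mapGroups call would apply: average per key.
def avgByKey(key: String, rows: Iterator[Reading]): (String, Double) = {
  val vs = rows.map(_.value).toSeq
  (key, vs.sum / vs.size)
}

println(avgByKey("a", Iterator(Reading("a", 1.0), Reading("a", 3.0))))  // (a,2.0)
```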