Re: Filter applied on merged Parquet schemas with new column fails.

2015-10-28 Thread Cheng Lian
Hey Hyukjin, Sorry that I missed the JIRA ticket. Thanks for bringing this issue up here, and for your detailed investigation. From my side, I think this is a bug in Parquet. Parquet was designed to support schema evolution. When scanning a Parquet file, if a column exists in the requested schema but

Re: Pickle Spark DataFrame

2015-10-28 Thread Reynold Xin
What are you trying to accomplish by pickling a Spark DataFrame? If your dataset is large, it doesn't make much sense to pickle it. If your dataset is small, maybe it's best to just pickle a Pandas dataframe. On Tue, Oct 27, 2015 at 9:47 PM, agg212 wrote: > Hi, I'd like to
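For the small-dataset case mentioned above, a minimal sketch of pickling a Pandas dataframe might look like the following. The sample data and the `roundtrip` helper are hypothetical and only illustrate the idea; in PySpark, a small Spark DataFrame would typically be collected first with `toPandas()`, which is not shown here.

```python
import pickle

import pandas as pd


def roundtrip(df: pd.DataFrame) -> pd.DataFrame:
    """Serialize a pandas DataFrame to bytes with pickle and load it back.

    Hypothetical helper for illustration only.
    """
    payload = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(payload)


# Hypothetical small dataset, standing in for a collected Spark DataFrame.
small = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.25, 1.0]})
restored = roundtrip(small)
```

This only makes sense when the data fits comfortably in driver memory, which is the condition the reply describes.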

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
I tried adding the aggregate functions in the registry and they work, other than mean, for which Ted has forwarded some code changes. I will try out those changes and update the status here. On Wed, Oct 28, 2015 at 9:03 AM, Shagun Sodhani wrote: > Yup avg works good.

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
Since there is already Average, the simplest change is the following: $ git diff sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Wouldn't it be: +expression[Max]("avg"), On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu wrote: > Since there is already Average, the simplest change is the following: > > $ git diff > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala >

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Also are the other aggregate functions to be treated as bugs or not? On Wed, Oct 28, 2015 at 4:08 PM, Shagun Sodhani wrote: > Wouldnt it be: > > +expression[Max]("avg"), > > On Wed, Oct 28, 2015 at 4:06 PM, Ted Yu wrote: > >> Since there is

Re: Exception when using some aggregate operators

2015-10-28 Thread Reynold Xin
I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)". We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL. On Wed,
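The alias idea discussed in this thread can be sketched with a toy registry. This is not Spark's `FunctionRegistry`; the `ToyFunctionRegistry` class and its methods are hypothetical, and the sketch only assumes that SQL function names are resolved by a case-insensitive name lookup.

```python
from statistics import fmean
from typing import Callable, Dict, Iterable

Aggregate = Callable[[Iterable[float]], float]


class ToyFunctionRegistry:
    """Hypothetical, simplified stand-in for a SQL function registry."""

    def __init__(self) -> None:
        self._functions: Dict[str, Aggregate] = {}

    def register(self, name: str, fn: Aggregate) -> None:
        # Names are stored lower-cased so lookups are case-insensitive.
        self._functions[name.lower()] = fn

    def lookup(self, name: str) -> Aggregate:
        try:
            return self._functions[name.lower()]
        except KeyError:
            raise LookupError(f"undefined function {name}") from None


def average(values: Iterable[float]) -> float:
    """Arithmetic mean of the input values."""
    return fmean(values)


registry = ToyFunctionRegistry()
registry.register("avg", average)   # the SQL-standard name
registry.register("mean", average)  # alias pointing at the same implementation
```

In this sketch a name such as `countDistinct` is simply never registered, so looking it up fails, which mirrors the point that DataFrame API function names are not automatically valid SQL function names.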

Re: Exception when using some aggregate operators

2015-10-28 Thread Ted Yu
Created SPARK-11371 with a patch. Will create PR soon. On Wed, Oct 28, 2015 at 3:42 AM, Reynold Xin wrote: > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)",

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
@Reynold I seem to be missing something. Aren't the functions listed here to be treated as SQL operators as well? I do see that these are mentioned as Functions available for DataFrame

Re: Exception when using some aggregate operators

2015-10-28 Thread Reynold Xin
No, those are just functions for the DataFrame programming API. On Wed, Oct 28, 2015 at 11:49 AM, Shagun Sodhani wrote: > @Reynold I seem to be missing something. Aren't the functions listed here >

Re: Exception when using some aggregate operators

2015-10-28 Thread Shagun Sodhani
Ohh great! Thanks for the clarification. On Wed, Oct 28, 2015 at 4:21 PM, Reynold Xin wrote: > No those are just functions for the DataFrame programming API. > > On Wed, Oct 28, 2015 at 11:49 AM, Shagun Sodhani > wrote: > >> @Reynold I seem to be