Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-18 Thread Olivier Girardot
wrote: Thank you, should I open a JIRA for this issue? -- From: Olivier Girardot [mailto:ssab...@gmail.com] Sent: Tuesday, May 12, 2015 5:12 AM To: Reynold Xin Cc: Haopu Wang; user Subject: Re: [SparkSQL 1.4.0] groupBy columns are always nullable

RE: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-14 Thread Haopu Wang
Thank you, should I open a JIRA for this issue? From: Olivier Girardot [mailto:ssab...@gmail.com] Sent: Tuesday, May 12, 2015 5:12 AM To: Reynold Xin Cc: Haopu Wang; user Subject: Re: [SparkSQL 1.4.0] groupBy columns are always nullable? I'll look

[SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Haopu Wang
I am trying to get the result schema of aggregate functions using the DataFrame API. However, I find that the result fields for groupBy columns are always nullable, even when the source field is not nullable. I want to know if this is by design. Thank you! Below is simple code that shows the issue. == import

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Olivier Girardot
Hi Haopu, actually here `key` is nullable because this is your input's schema:
scala> result.printSchema
root
 |-- key: string (nullable = true)
 |-- SUM(value): long (nullable = true)
scala> df.printSchema
root
 |-- key: string (nullable = true)
 |-- value: long (nullable = false)
I tried it with a

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Reynold Xin
Thanks for catching this. I didn't read carefully enough. It'd make sense to have the udaf result be non-nullable, if the exprs are indeed non-nullable. On Mon, May 11, 2015 at 1:32 PM, Olivier Girardot ssab...@gmail.com wrote: Hi Haopu, actually here `key` is nullable because this is your
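
The rule suggested above (a result field is non-nullable when its input expression is non-nullable) can be sketched in plain Scala. This is only an illustrative model under stated assumptions: `Field`, `NullabilitySketch`, and `groupedSchema` are hypothetical names invented for this sketch, not Spark classes, and the code does not reflect how Spark computes schemas internally.

```scala
// Hypothetical stand-in for a schema field; NOT Spark's StructField.
final case class Field(name: String, dataType: String, nullable: Boolean)

object NullabilitySketch {
  // Models the suggested behaviour for groupBy(keys).agg(sum(summed)):
  // each grouping column keeps the nullability of its source field, and
  // the SUM column is nullable only if its input field is nullable.
  def groupedSchema(source: Seq[Field], keys: Seq[String], summed: String): Seq[Field] = {
    val byName = source.map(f => f.name -> f).toMap
    val keyFields = keys.map(byName) // throws NoSuchElementException on an unknown key
    val input = byName(summed)
    keyFields :+ Field(s"SUM(${input.name})", input.dataType, input.nullable)
  }
}
```

With a source schema in which both `key` and `value` are non-nullable, `NullabilitySketch.groupedSchema(source, Seq("key"), "value")` yields two non-nullable fields; if `key` is nullable in the source, it stays nullable in the result.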

Re: [SparkSQL 1.4.0] groupBy columns are always nullable?

2015-05-11 Thread Reynold Xin
Not by design. Would you be interested in submitting a pull request? On Mon, May 11, 2015 at 1:48 AM, Haopu Wang hw...@qilinsoft.com wrote: I am trying to get the result schema of aggregate functions using the DataFrame API. However, I find that the result fields for groupBy columns are always nullable, even