Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Maciej Bryński
+1 At last :) 2016-09-26 19:56 GMT+02:00 Sameer Agarwal : > +1 (non-binding) > > On Mon, Sep 26, 2016 at 9:54 AM, Davies Liu wrote: > >> +1 (non-binding) >> >> On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley >> wrote: >> > +1

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Holden Karau
I'm seeing some test failures with Python 3 that could definitely be environmental (going to rebuild my virtual env and double check), I'm just wondering if other people are also running the Python tests on this release or if everyone is focused on the Scala tests? On Mon, Sep 26, 2016 at 11:48

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Sameer Agarwal
+1 (non-binding) On Mon, Sep 26, 2016 at 9:54 AM, Davies Liu wrote: > +1 (non-binding) > > On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley > wrote: > > +1 > > > > On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee > wrote: > >> > >>

Sliding Window Memory use

2016-09-26 Thread Jeremy Davis
Hi, I posted this to users, but didn’t get any responses. I just wanted to highlight what seems like excessive memory use when using sliding windows. I have attached a test case where starting with certainly less than 1MB of data I can OOM a 10G heap. Regards, -JD -- import

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 (non-binding) On Sun, Sep 25, 2016 at 23:20 Jeff Zhang wrote: > +1 > > On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu < > shixi...@databricks.com> wrote: > >> +1 >> >> On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee wrote: >> >>> +1 >>> >>> >>> On

Re: ArrayType support in Spark SQL

2016-09-26 Thread Takeshi Yamamuro
Hi, Since `Literal#default` can handle array types, it seems there is no strong reason not to support the type in `Literal#apply`, that is, `functions.lit`. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L119 //

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-26 Thread Marcelo Vanzin
The part I don't understand is: why do you care so much about the mesos profile? The same code exists in branch-2.0, it just doesn't need a separate profile to be enabled (it's part of core). As Sean said, the change in master was purely organizational, there's no added or lost functionality. On

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Davies Liu
+1 (non-binding) On Mon, Sep 26, 2016 at 9:36 AM, Joseph Bradley wrote: > +1 > > On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee wrote: >> >> +1 (non-binding) >> On Sun, Sep 25, 2016 at 23:20 Jeff Zhang wrote: >>> >>> +1 >>> >>> On

StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-09-26 Thread Holden Karau
Hi Spark Developers, After some discussion on SPARK-16407 (and on the PR ) we’ve decided to jump back to the developer list (SPARK-16407 itself comes

Re: ArrayType support in Spark SQL

2016-09-26 Thread Reynold Xin
Seems fair & easy to support. Can somebody open a JIRA ticket and patch? On Mon, Sep 26, 2016 at 9:05 AM, Takeshi Yamamuro wrote: > Hi, > > Since `Literal#default` can handle array types, it seems there is no > strong reason > for unsupporting the type in

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Joseph Bradley
+1 On Mon, Sep 26, 2016 at 7:47 AM, Denny Lee wrote: > +1 (non-binding) > On Sun, Sep 25, 2016 at 23:20 Jeff Zhang wrote: > >> +1 >> >> On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu < >> shixi...@databricks.com> wrote: >> >>> +1 >>> >>> On Sun,

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Shixiong(Ryan) Zhu
+1 On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee wrote: > +1 > > > On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot Westerflier < > hvanhov...@databricks.com> wrote: > >> +1 (non-binding) >> >> On Sun, Sep 25, 2016 at 2:05 PM, Ricardo Almeida < >>

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Jeff Zhang
+1 On Mon, Sep 26, 2016 at 2:03 PM, Shixiong(Ryan) Zhu wrote: > +1 > > On Sun, Sep 25, 2016 at 10:43 PM, Pete Lee wrote: > >> +1 >> >> >> On Sun, Sep 25, 2016 at 3:26 PM, Herman van Hövell tot Westerflier < >> hvanhov...@databricks.com> wrote:

Re: Deep Equals support on Maptype

2016-09-26 Thread Takeshi Yamamuro
Hi, Have you checked this JIRA? https://issues.apache.org/jira/browse/SPARK-9415 // maropu On Mon, Sep 26, 2016 at 7:09 PM, Lakshmi Rajagopalan wrote: > Hi, > > We wanted to extend the existing '===' on Column to support deep equals > on Maps. > > > Currently it checks for

Re: Deep Equals support on Maptype

2016-09-26 Thread Lakshmi Rajagopalan
If optimization is the problem, can we use precomputed hashes? On Mon, Sep 26, 2016 at 4:50 PM, Lakshmi Rajagopalan wrote: > Can you please help me understand why the MapType shouldn't be part of > equality tests? Practically, if we are using json line formats, the ideal >
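The precomputed-hash idea raised in this message can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark code: `HashedMap` is a hypothetical name, values are assumed to be hashable, and the digest is combined with addition so that entry order does not affect it. A mismatching digest rejects quickly; a matching digest still needs a full comparison because hashes can collide.

```python
from typing import Any, Dict, Hashable


class HashedMap:
    """Hypothetical wrapper that precomputes an order-insensitive digest."""

    def __init__(self, entries: Dict[Hashable, Any]) -> None:
        self.entries = dict(entries)
        # Addition is commutative, so the digest is the same for any entry order.
        # Assumption: every value is hashable.
        total = sum(hash((k, v)) for k, v in self.entries.items())
        self.digest = total & 0xFFFFFFFFFFFFFFFF

    def deep_equals(self, other: "HashedMap") -> bool:
        # Fast path: different digests mean the maps cannot be equal.
        if self.digest != other.digest:
            return False
        # Slow path: equal digests may be a collision, so confirm entry by entry.
        return self.entries == other.entries
```

For example, `HashedMap({"a": 1, "b": 2}).deep_equals(HashedMap({"b": 2, "a": 1}))` is true even though the entries were supplied in a different order. Whether such a digest could be maintained cheaply enough inside Spark is a separate question that this sketch does not address.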

Re: Deep Equals support on Maptype

2016-09-26 Thread Takeshi Yamamuro
yea, for all I know, there is no reasonable way to implement fast and efficient equality checks on ArrayBasedMapData (See also: https://github.com/apache/spark/pull/13847). On Mon, Sep 26, 2016 at 9:04 PM, Lakshmi Rajagopalan wrote: > If optimization is the problem, can we

Re: Deep Equals support on Maptype

2016-09-26 Thread Lakshmi Rajagopalan
Ok, but at least we can have a separate binary comparison called DeepEqualTo as an Expression which at least makes the performance issues explicit and also have a way to achieve the equality on maps. In our case, the maps are very small. And this restriction completely reduces the expressibility

Re: Deep Equals support on Maptype

2016-09-26 Thread Lakshmi Rajagopalan
Can you please help me understand why the MapType shouldn't be part of equality tests? Practically, if we are using json line formats, the ideal equals is that every key should map to exactly the same value in both maps, which also holds true in Aesthetic case where a MapType can be thought of as a
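The equality described in this message (every key maps to exactly the same value in both maps, regardless of entry order) can be sketched in plain Python. This is only an illustration of the semantics, not Spark code: the parallel key and value lists are an assumed representation, `deep_equals` is a hypothetical helper name, and keys are assumed to be unique within each map.

```python
from typing import Any, Hashable, Sequence


def deep_equals(
    keys_a: Sequence[Hashable],
    values_a: Sequence[Any],
    keys_b: Sequence[Hashable],
    values_b: Sequence[Any],
) -> bool:
    """Order-insensitive equality for maps given as parallel key/value lists."""
    map_a = dict(zip(keys_a, values_a))
    map_b = dict(zip(keys_b, values_b))
    # Assumption: keys are unique within each map; reject input that violates it.
    if len(map_a) != len(keys_a) or len(map_b) != len(keys_b):
        raise ValueError("duplicate keys are not supported in this sketch")
    # dict equality compares values recursively, so nested maps and lists
    # are compared by content rather than by identity.
    return map_a == map_b
```

Building the lookup tables is the extra work that a positional, element-by-element comparison of the two arrays avoids, which is one way to see where the performance concern in this thread comes from.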

Re: StructuredStreaming Custom Sinks (motivated by Structured Streaming Machine Learning)

2016-09-26 Thread Shivaram Venkataraman
Disclaimer - I am not very closely involved with Structured Streaming design / development, so this is just my two cents from looking at the discussion in the linked JIRAs and PRs. It seems to me there are a couple of issues being conflated here: (a) is the question of how to specify or add more

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Hyukjin Kwon
+1 (non-binding) 2016-09-27 13:22 GMT+09:00 Denny Lee : > +1 on testing with Python2. > > > On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar > wrote: > >> I do run both Python and Scala. But via iPython/Python2 with my own test >> code. Not running the

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Holden Karau
+1 (non-binding) PySpark Core, ML, MLlib, SQL tests pass w/Python3 on Ubuntu 14.04 - some intermittent weird failures when running the streaming suite, but they seem to be a flaky test issue & not a real issue, which makes sense given how some of the Python streaming tests are structured. On Mon, Sep

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Jean-Baptiste Onofré
+1 (non binding) Regards JB On 09/27/2016 07:51 AM, Hyukjin Kwon wrote: +1 (non-binding) 2016-09-27 13:22 GMT+09:00 Denny Lee: +1 on testing with Python2. On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar

Re: renaming "minor release" to "feature release"

2016-09-26 Thread Reynold Xin
Yup that's a good point. I think we can easily explain that in the extended description. I will update the wiki page to reflect that. On Fri, Jul 29, 2016 at 7:52 AM, Mark Hamstra wrote: > One issue worth at least considering is that our minor releases usually do > not

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Yanbo Liang
+1 On Mon, Sep 26, 2016 at 4:53 PM, akchin wrote: > +1 (non-bind) > -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Psparkr > CentOS 7.2 / openjdk version "1.8.0_101" > > > > > - > IBM Spark Technology Center > -- > View this message in context: http://apache-spark- >

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Denny Lee
+1 on testing with Python2. On Mon, Sep 26, 2016 at 3:13 PM Krishna Sankar wrote: > I do run both Python and Scala. But via iPython/Python2 with my own test > code. Not running the tests from the distribution. > Cheers > > > On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau

Re: Sliding Window Memory use

2016-09-26 Thread Reynold Xin
I ran it on Databricks community edition which was a local[8] cluster with 6GB of RAM. It ran fine. That said, looking at the plan, we can definitely simplify this quite a bit. We had a new Window physical execution node for each window expression, when we could have collapsed all of them into a

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread Krishna Sankar
I do run both Python and Scala. But via iPython/Python2 with my own test code. Not running the tests from the distribution. Cheers On Mon, Sep 26, 2016 at 11:59 AM, Holden Karau wrote: > I'm seeing some test failures with Python 3 that could definitely be > environmental

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-26 Thread akchin
+1 (non-bind) -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Psparkr CentOS 7.2 / openjdk version "1.8.0_101" - IBM Spark Technology Center -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-2-0-1-RC3-tp19044p19093.html