Re: [VOTE] Release Apache Spark 1.1.0 (RC3)

2014-09-03 Thread Patrick Wendell
I'm cancelling this release in favor of RC4. Happy voting! On Tue, Sep 2, 2014 at 9:55 PM, Patrick Wendell pwend...@gmail.com wrote: Thanks everyone for voting on this. Two minor issues (one a blocker) were found that warrant cutting a new RC. For those who voted +1 on this release,

[VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=2f9b2bd7844ee8393dc9c319f4fefedf95f5e460 The release files, including signatures, digests, etc.

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
I'll kick it off with a +1 On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.1.0! The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd):

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Reynold Xin
+1 Tested locally on Mac OS X with local-cluster mode. On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: I'll kick it off with a +1 On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Michael Armbrust
+1 On Wed, Sep 3, 2014 at 12:29 AM, Reynold Xin r...@databricks.com wrote: +1 Tested locally on Mac OS X with local-cluster mode. On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: I'll kick it off with a +1 On Wed, Sep 3, 2014 at 12:24 AM, Patrick

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Xiangrui Meng
+1. Tested some MLlib example code. For default changes, maybe it is useful to mention the default broadcast factory changed to torrent. On Wed, Sep 3, 2014 at 12:34 AM, Michael Armbrust mich...@databricks.com wrote: +1 On Wed, Sep 3, 2014 at 12:29 AM, Reynold Xin r...@databricks.com wrote:

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Andrew Or
+1 Tested on Yarn and Windows. Also verified that standalone cluster mode is now fixed. 2014-09-03 1:25 GMT-07:00 Xiangrui Meng men...@gmail.com: +1. Tested some MLlib example code. For default changes, maybe it is useful to mention the default broadcast factory changed to torrent. On

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Sean Owen
+1 signatures still fine, tests still pass. On Mac OS X I get the following failure but I think it's spurious. Only mentioning it to see if anyone else sees it. It doesn't happen on Linux. [error] Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed:

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Nicholas Chammas
On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell pwend...@gmail.com wrote: == What default changes should I be aware of? == 1. The default value of spark.io.compression.codec is now snappy -- Old behavior can be restored by switching to lzf 2. PySpark now performs external spilling during
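
The codec default mentioned above can be overridden per job. As a minimal sketch, assuming the job is launched with spark-submit and its `--conf key=value` option, the helper below builds the extra arguments that restore lzf. The helper name `conf_args` and the script name `my_job.py` are hypothetical and used only for illustration.

```python
def conf_args(overrides):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(overrides.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Restore the pre-1.1.0 compression codec named in the message above.
legacy_defaults = {"spark.io.compression.codec": "lzf"}

# my_job.py is a placeholder for the application being submitted.
command = ["spark-submit"] + conf_args(legacy_defaults) + ["my_job.py"]
print(" ".join(command))
```

The same property could instead be set in the application's configuration; the command-line form is shown only because it leaves the job code unchanged.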

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Matthew Farrellee
+1 built from sha w/ make-distribution.sh tested basic examples (0 data) w/ local on fedora 20 (openjdk 1.7, python 2.7.5) tested detection and log processing (25GB data) w/ mesos (0.19.0) nfs on rhel 7 (openjdk 1.7, python 2.7.5) On 09/03/2014 03:24 AM, Patrick Wendell wrote: Please vote

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Patrick Wendell
Hey Nick, Yeah we'll put those in the release notes. On Wed, Sep 3, 2014 at 7:23 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Wed, Sep 3, 2014 at 3:24 AM, Patrick Wendell pwend...@gmail.com wrote: == What default changes should I be aware of? == 1. The default value of

spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Nicholas Chammas
Spawned by this discussion https://github.com/apache/spark/pull/1120#issuecomment-54305831. See these 2 lines in spark_ec2.py: - spark_ec2 L42 https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L42 - spark_ec2 L566

Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Shivaram Venkataraman
The spark-ec2 repository isn't a part of Mesos. Back in the day, Spark used to be hosted in the Mesos github organization as well and so we put scripts that were used by Spark under the same organization. FWIW I don't think these scripts belong in the Spark repository. They are helper scripts

Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Matthew Farrellee
that's not a bad idea. it would also break the circular dep in versions that results in spark X's ec2 script installing spark X-1 by default. best, matt On 09/03/2014 01:17 PM, Shivaram Venkataraman wrote: The spark-ec2 repository isn't a part of Mesos. Back in the day, Spark used to be

Is breeze thread safe in Spark?

2014-09-03 Thread Ulanov, Alexander
Hi, Is the Breeze library called in a thread-safe way from Spark MLlib code when native libs for BLAS and LAPACK are used? Might it be an issue when running Spark locally? Best regards, Alexander

Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Shivaram Venkataraman
Actually the circular dependency doesn't depend on the spark-ec2 scripts -- The scripts contain download links to many Spark versions and you can configure which one should be used. Shivaram On Wed, Sep 3, 2014 at 10:22 AM, Matthew Farrellee m...@redhat.com wrote: that's not a bad idea. it

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Marcelo Vanzin
+1 (non-binding) - checked checksums of a few packages - ran few jobs against yarn client/cluster using hadoop2.3 package - played with spark-shell in yarn-client mode On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as
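
For readers who want to repeat the checksum step mentioned above, here is a minimal sketch using only the Python standard library. The digest algorithm is an assumption (the thread does not say which digests were published), and the function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large release archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha512"):
    """Compare against a published digest, ignoring whitespace and letter case."""
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```

This only checks integrity of the download. Verifying the release signatures is a separate step and is not covered by this sketch.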

Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Matthew Farrellee
oh, i see pwendell did a patch to the release branch to make the release version == --spark-version default best, matt On 09/03/2014 01:30 PM, Shivaram Venkataraman wrote: Actually the circular dependency doesn't depend on the spark-ec2 scripts -- The scripts contain download links to

Re: Ask something about spark

2014-09-03 Thread Matthew Farrellee
reynold, would you folks be willing to put some creative commons license information on the site and its content? best, matt On 09/02/2014 06:32 PM, Reynold Xin wrote: I think in general that is fine. It would be great if your slides come with proper attribution. On Tue, Sep 2, 2014 at

Re: Ask something about spark

2014-09-03 Thread Reynold Xin
I am not sure if I can just go ahead and update the website with a Creative Commons license. IIRC, ASF websites are also under the Apache 2.0 license. Might need somebody from legal to chime in. On Wed, Sep 3, 2014 at 11:15 AM, Matthew Farrellee m...@redhat.com wrote: reynold, would you folks be

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Josh Rosen
+1.  Tested on Windows and EC2.  Confirmed that the EC2 pvm-hvm switch fixed the SPARK-3358 regression. On September 3, 2014 at 10:33:45 AM, Marcelo Vanzin (van...@cloudera.com) wrote: +1 (non-binding) - checked checksums of a few packages - ran few jobs against yarn client/cluster using

Re: Ask something about spark

2014-09-03 Thread Matthew Farrellee
CC or Apache, it'd be helpful to have it listed in the footer of pages best, matt On 09/03/2014 02:23 PM, Reynold Xin wrote: I am not sure if I can just go ahead and update the website with a Creative Commons license. IIRC, ASF websites are also under the Apache 2.0 license. Might need somebody from

Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
David, Can you confirm that += is not thread safe but + is? I'm assuming + allocates a new object for the write, while += doesn't. Thanks! RJ On Wed, Sep 3, 2014 at 2:50 PM, David Hall d...@cs.berkeley.edu wrote: In general, in Breeze we allocate separate work arrays for each call to

Re: Is breeze thread safe in Spark?

2014-09-03 Thread Evan R. Sparks
Additionally, at the higher level, MLlib allocates separate Breeze Vectors/Matrices on a per-executor basis. The only place I can think of where data structures might be overwritten concurrently is in a .aggregate() call, and these calls happen sequentially. RJ - Do you have a JIRA reference for

Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
Never filed a JIRA -- I actually forgot about it. Let me file one now. On Wed, Sep 3, 2014 at 2:58 PM, Evan R. Sparks evan.spa...@gmail.com wrote: Additionally, at the higher level, MLlib allocates separate Breeze Vectors/Matrices on a per-executor basis. The only place I can think of

Re: Is breeze thread safe in Spark?

2014-09-03 Thread David Hall
mutating operations are not thread safe. Operations that don't mutate should be thread safe. I can't speak to what Evan said, but I would guess that the way they're using += should be safe. On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote: David, Can you confirm that +=
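
The distinction drawn in this thread between mutating and non-mutating operations can be sketched in a few lines. This is an illustration in Python with plain lists, not Breeze or Spark code; the names `add`, `add_in_place` and `parallel_sum` are hypothetical. Each thread mutates only its own accumulator, and the one shared accumulator is updated under a lock, which is an assumption about how one might make an in-place update safe rather than a description of how MLlib does it.

```python
import threading


def add(a, b):
    """Non-mutating, like `+`: allocates and returns a new list."""
    return [x + y for x, y in zip(a, b)]


def add_in_place(acc, b):
    """Mutating, like `+=`: writes into acc, so callers must not share acc unguarded."""
    for i, y in enumerate(b):
        acc[i] += y


def parallel_sum(vectors, n_threads=4):
    """Sum equal-length vectors using thread-local partials and a guarded merge."""
    total = [0.0] * len(vectors[0])
    lock = threading.Lock()

    def worker(chunk):
        partial = [0.0] * len(total)   # private to this thread, safe to mutate
        for vector in chunk:
            add_in_place(partial, vector)
        with lock:                     # the shared accumulator is mutated by one thread at a time
            add_in_place(total, partial)

    chunks = [vectors[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total
```

Replacing every in-place update with the allocating form avoids the need for the lock on the operands, at the cost of one new vector per addition.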

Re: Is breeze thread safe in Spark?

2014-09-03 Thread RJ Nowling
Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-3384 Even if the current implementation uses += in a thread safe manner, it can be easy to make the mistake of accidentally using += in a parallelized context. I suggest changing all instances of += to +. I would encourage others to

Re: Is breeze thread safe in Spark?

2014-09-03 Thread Xiangrui Meng
RJ, could you provide a code example that can reproduce the bug you observed in local testing? Breeze's += is not thread-safe. But in a Spark job, calls to a resultHandler are synchronized:

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Cheng Lian
+1. Tested locally on OSX 10.9, built with Hadoop 2.4.1 - Checked Datanucleus jar files - Tested Spark SQL Thrift server and CLI under local mode and standalone cluster against MySQL backed metastore On Wed, Sep 3, 2014 at 11:25 AM, Josh Rosen rosenvi...@gmail.com wrote: +1. Tested on

Re: Is breeze thread safe in Spark?

2014-09-03 Thread Ulanov, Alexander
What about the allocation of a new Breeze vector? Can it happen unsafely within Spark (across several threads)? Best regards, Alexander On 03.09.2014, at 23:17, Xiangrui Meng men...@gmail.com wrote: RJ, could you provide a code example that can reproduce the bug you observed in local testing?

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Mubarak Seyed
+1 (non-binding) Tested locally on Mac OS X with local-cluster mode. On Wed, Sep 3, 2014 at 12:23 PM, Cheng Lian lian.cs@gmail.com wrote: +1. Tested locally on OSX 10.9, built with Hadoop 2.4.1 - Checked Datanucleus jar files - Tested Spark SQL Thrift server and CLI under local mode

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Nan Zhu
+1 tested thrift server with our in-house application, everything works fine -- Nan Zhu On Wednesday, September 3, 2014 at 4:43 PM, Matei Zaharia wrote: +1 Matei On September 3, 2014 at 12:24:32 PM, Cheng Lian (lian.cs@gmail.com) wrote: +1.

Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Denny Lee
+1 on OSX Yosemite, built with Hadoop 2.4.1, Hive 0.12 testing SparkSQL, Thrift, MySQL metastore On Wed, Sep 3, 2014 at 4:02 PM, Jeremy Freeman freeman.jer...@gmail.com wrote: +1

memory size for caching RDD

2014-09-03 Thread 牛兆捷
Dear all: Spark uses memory to cache RDDs and the memory size is specified by spark.storage.memoryFraction. Once the executor starts, does Spark support adjusting/resizing the memory size of this part dynamically? Thanks. -- *Regards,* *Zhaojie*

Re: memory size for caching RDD

2014-09-03 Thread Patrick Wendell
Changing this is not supported; it is immutable, similar to other Spark configuration settings. On Wed, Sep 3, 2014 at 8:13 PM, 牛兆捷 nzjem...@gmail.com wrote: Dear all: Spark uses memory to cache RDDs and the memory size is specified by spark.storage.memoryFraction. Once the executor starts,