Re: Welcoming three new committers

2015-02-03 Thread Joseph Bradley
Thanks to everyone in the community for past collaborations, and I look
forward to continuing in the future!
Joseph

On Tue, Feb 3, 2015 at 6:23 PM, Shixiong Zhu zsxw...@gmail.com wrote:

 Congrats guys!

 Best Regards,
 Shixiong Zhu

 2015-02-04 6:34 GMT+08:00 Matei Zaharia matei.zaha...@gmail.com:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





ASF Git / GitHub sync is down

2015-02-03 Thread Reynold Xin
Haven't synced anything for the last 4 hours. Seems like this little piece
of infrastructure always stops working around our own code freeze time ...


Re: Welcoming three new committers

2015-02-03 Thread Manish Amde
Congratulations Cheng, Joseph and Sean.

On Tuesday, February 3, 2015, Zhan Zhang zzh...@hortonworks.com wrote:

 Congratulations!

 On Feb 3, 2015, at 2:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Welcoming three new committers

2015-02-03 Thread Ye Xianjin
Congratulations!

-- 
Ye Xianjin
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, February 4, 2015 at 6:34 AM, Matei Zaharia wrote:

 Hi all,
 
 The PMC recently voted to add three new committers: Cheng Lian, Joseph 
 Bradley and Sean Owen. All three have been major contributors to Spark in the 
 past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many 
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 
 




Re: Welcoming three new committers

2015-02-03 Thread Zhan Zhang
Congratulations!

On Feb 3, 2015, at 2:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

 Hi all,
 
 The PMC recently voted to add three new committers: Cheng Lian, Joseph 
 Bradley and Sean Owen. All three have been major contributors to Spark in the 
 past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many 
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: ASF Git / GitHub sync is down

2015-02-03 Thread Reynold Xin
I filed an INFRA ticket: https://issues.apache.org/jira/browse/INFRA-9115



I wish the ASF would reconsider requests like this in order to handle downtime
gracefully: https://issues.apache.org/jira/browse/INFRA-8738

On Tue, Feb 3, 2015 at 9:09 PM, Reynold Xin r...@databricks.com wrote:

 Haven't synced anything for the last 4 hours. Seems like this little
 piece of infrastructure always stops working around our own code freeze
 time ...




Re: Welcoming three new committers

2015-02-03 Thread Shixiong Zhu
Congrats guys!

Best Regards,
Shixiong Zhu

2015-02-04 6:34 GMT+08:00 Matei Zaharia matei.zaha...@gmail.com:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Welcoming three new committers

2015-02-03 Thread prabeesh k
Congratulations!

On 4 February 2015 at 02:34, Matei Zaharia matei.zaha...@gmail.com wrote:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: SparkSubmit.scala and stderr

2015-02-03 Thread Marcelo Vanzin
Hi Jay,

On Tue, Feb 3, 2015 at 6:28 AM, jayhutfles jayhutf...@gmail.com wrote:
 // Exposed for testing
 private[spark] var printStream: PrintStream = System.err

 But as the comment states that it's for testing, maybe I'm
 misunderstanding its intent...

The comment is there to tell someone reading the code that this field
is a `var` and not private only because test code (SparkSubmitSuite in
this case) needs to modify it; otherwise it wouldn't exist or would be
private. It's similar in spirit to this annotation:

http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/annotations/VisibleForTesting.html

(Which I'd probably have used in this case, but is not really common
in Spark code.)

-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
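The pattern Marcelo describes - a field widened to a `var` solely so a test can substitute it - can be sketched in isolation. `Submitter` and its method below are hypothetical stand-ins, not Spark's actual SparkSubmit:

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Hypothetical stand-in for SparkSubmit: verbose output goes to a swappable stream.
object Submitter {
  // Exposed for testing: production code leaves this pointing at stderr,
  // while a test can point it at an in-memory buffer instead.
  var printStream: PrintStream = System.err

  def printVerbose(msg: String): Unit = printStream.println(msg)
}

object TestHarness {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    Submitter.printStream = new PrintStream(buf) // swap in the capturing stream
    Submitter.printVerbose("Using properties file: none")
    assert(buf.toString.trim == "Using properties file: none")
    println("captured: " + buf.toString.trim)
  }
}
```

This is why the field is a `var` rather than a `val`: the test needs assignment, not just access.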



Re: SparkSubmit.scala and stderr

2015-02-03 Thread Reynold Xin
We can use ScalaTest's privateMethodTester also instead of exposing that.

On Tue, Feb 3, 2015 at 2:18 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Hi Jay,

 On Tue, Feb 3, 2015 at 6:28 AM, jayhutfles jayhutf...@gmail.com wrote:
  // Exposed for testing
  private[spark] var printStream: PrintStream = System.err

  But as the comment states that it's for testing, maybe I'm
  misunderstanding its intent...

 The comment is there to tell someone reading the code that this field
 is a `var` and not private just because test code (SparkSubmitSuite in
 this case) needs to modify it, otherwise it wouldn't exist or would be
 private. Similar in spirit to this annotation:


 http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/annotations/VisibleForTesting.html

 (Which I'd probably have used in this case, but is not really common
 in Spark code.)

 --
 Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
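ScalaTest's PrivateMethodTester, which Reynold mentions, is reflection under the hood. A dependency-free sketch of the same idea (the `Calculator` class is hypothetical; note the real PrivateMethodTester targets private methods, whereas SparkSubmit's hook is a `var` field):

```scala
// Minimal reflection-based sketch of what ScalaTest's PrivateMethodTester does:
// call a private method without widening its visibility.
class Calculator {
  private def double(x: Int): Int = x * 2
}

object PrivateAccessDemo {
  def main(args: Array[String]): Unit = {
    val calc = new Calculator
    // A plain `private def` compiles to a private JVM method we can look up.
    val m = classOf[Calculator].getDeclaredMethod("double", classOf[Int])
    m.setAccessible(true) // bypass the `private` modifier
    val result = m.invoke(calc, Int.box(21)).asInstanceOf[Int]
    assert(result == 42)
    println(s"double(21) = $result")
  }
}
```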




Re: Jenkins install reference

2015-02-03 Thread shane knapp
here's the wiki describing the system setup:
https://cwiki.apache.org/confluence/display/SPARK/Spark+QA+Infrastructure

we have 1 master and 8 worker nodes, with 12 executors per worker (we'd be
better off with more, smaller worker nodes, however).

you don't need to install sbt -- it's in the build/ directory.

the pull request builder builds in parallel, but the master builds require
specific ports to be reserved and each build effectively locks down a
worker until it's done.  since we have 8 worker nodes, it's not *that* big
of a deal...

shane

On Tue, Feb 3, 2015 at 4:36 AM, scwf wangf...@huawei.com wrote:

 My questions are:
 1. How can we set up Jenkins to build multiple PRs in parallel, or does one
 machine only support building one PR at a time?
 2. Do we need to install sbt on the CI machine, since the script
 dev/run-tests fetches the sbt jar automatically?

 - Fei



 On 2015/2/3 15:53, scwf wrote:

  Hi, all
we want to set up a CI environment for Spark in our team. Is there
  any reference on how to install Jenkins for Spark?
Thanks

 Fei


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org






 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: [spark-sql] JsonRDD

2015-02-03 Thread Daniil Osipov
Thanks Reynold,

Case sensitivity issues are definitely orthogonal. I'll submit a bug or PR.

Is there a way to rename the object to eliminate the confusion? I'm not sure
how locked down the API is at this time, but it seems like a potential
stumbling block for developers.

On Mon, Feb 2, 2015 at 4:30 PM, Reynold Xin r...@databricks.com wrote:

 It's bad naming - JsonRDD is actually not an RDD. It is just a set of util
 methods.

 The case sensitivity issues seem orthogonal, and would be great to be able
 to control that with a flag.


 On Mon, Feb 2, 2015 at 4:16 PM, Daniil Osipov daniil.osi...@shazam.com
 wrote:

 Hey Spark developers,

 Is there a good reason for JsonRDD being a Scala object as opposed to
 class? Seems most other RDDs are classes, and can be extended.

 The reason I'm asking is that there is a problem with Hive interoperability
 with JSON DataFrames: jsonFile generates a case-sensitive schema, while
 Hive expects case-insensitive names and fails with an exception during
 saveAsTable if there are two columns with the same name in different case.

 I'm trying to resolve the problem, but that requires me to extend JsonRDD,
 which I can't do. Other RDDs are subclass-friendly; why is JsonRDD
 different?

 Dan
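Daniil's question turns on a Scala language rule: an `object` is a singleton value, not a type you can extend, so it cannot appear in an `extends` clause. A minimal sketch with hypothetical names (not Spark's actual JsonRDD):

```scala
// A singleton object: its members are callable, but it cannot be subclassed.
object JsonUtilObject {
  def normalize(name: String): String = name.toLowerCase
}

// A class with the same member is open for extension and override.
class JsonUtilClass {
  def normalize(name: String): String = name.toLowerCase
}

class CaseInsensitiveJsonUtil extends JsonUtilClass {
  override def normalize(name: String): String = super.normalize(name).trim
}

// class Broken extends JsonUtilObject {} // does not compile: not a type

object ExtensionDemo {
  def main(args: Array[String]): Unit = {
    val util = new CaseInsensitiveJsonUtil
    assert(util.normalize(" ColumnName ") == "columnname")
    assert(JsonUtilObject.normalize("ColumnName") == "columnname")
    println("object members callable; class subclassed successfully")
  }
}
```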





Re: Accessing indices and values in SparseVector

2015-02-03 Thread Sean Owen
When you are describing an error, you should say what the error is.
Here I'm pretty sure it says there is no such member of Vector, right?
You explicitly made the type of sv2 Vector rather than SparseVector, and
the trait does not have an indices member. No, it's not a problem, and
I think the compiler tells you what's happening in this case.

On Tue, Feb 3, 2015 at 6:17 AM, Manoj Kumar
manojkumarsivaraj...@gmail.com wrote:
 Hello,

 This is related to one of the issues that I'm working on. I am not sure if
 this is expected behavior or not.

 This works fine.
 val sv2 = new SparseVector(3, Array(0, 2), Array(1.1, 3.0))
 sv2.indices

 But when I do this
 val sv2: Vector = Vectors.sparse(3, Array(0, 2), Array(1.1, 3.0))
 sv2.indices

 It raises an error.

 If agreed that this is not expected, I can send a Pull Request.
 Thanks.



 --
 Godspeed,
 Manoj Kumar,
 http://manojbits.wordpress.com
 http://goog_1017110195
 http://github.com/MechCoder

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
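Sean's point - `indices` lives on the concrete SparseVector class, not on the Vector trait - can be reproduced without Spark on the classpath. These are simplified stand-ins for MLlib's classes, not the real ones:

```scala
// Simplified stand-ins for MLlib's Vector trait and SparseVector class.
trait Vector { def size: Int }

class SparseVector(val size: Int, val indices: Array[Int], val values: Array[Double])
  extends Vector

object SparseVectorDemo {
  def main(args: Array[String]): Unit = {
    val sv = new SparseVector(3, Array(0, 2), Array(1.1, 3.0))
    sv.indices // compiles: the static type here is SparseVector

    val v: Vector = sv
    // v.indices // does not compile: Vector has no `indices` member

    // Recover the concrete type (and its members) with a pattern match:
    val recovered = v match {
      case s: SparseVector => s.indices.mkString(",")
      case _               => "dense"
    }
    assert(recovered == "0,2")
    println(recovered)
  }
}
```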



Re: Accessing indices and values in SparseVector

2015-02-03 Thread Manoj Kumar
Alright, thanks for the quick clarification.


Accessing indices and values in SparseVector

2015-02-03 Thread Manoj Kumar
Hello,

This is related to one of the issues that I'm working on. I am not sure if
this is expected behavior or not.

This works fine.
val sv2 = new SparseVector(3, Array(0, 2), Array(1.1, 3.0))
sv2.indices

But when I do this
val sv2: Vector = Vectors.sparse(3, Array(0, 2), Array(1.1, 3.0))
sv2.indices

It raises an error.

If agreed that this is not expected, I can send a Pull Request.
Thanks.



-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://goog_1017110195
http://github.com/MechCoder


Re: Can spark provide an option to start reduce stage early?

2015-02-03 Thread Kay Ousterhout
There's a JIRA tracking this here:
https://issues.apache.org/jira/browse/SPARK-2387

On Mon, Feb 2, 2015 at 9:48 PM, Xuelin Cao xuelincao2...@gmail.com wrote:

 In Hadoop MR, there is an option, *mapred.reduce.slowstart.completed.maps*,

 which can be used to start the reduce stage when X% of mappers are completed. By
 doing this, the data shuffling process is able to run in parallel with the map
 process.

 In a large multi-tenancy cluster, this option is usually turned off. But in
 some cases, turning on the option could accelerate some high-priority jobs.

 Will Spark provide a similar option?
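For reference, the Hadoop-side knob looks like this. This is a sketch that assumes hadoop-common on the classpath; the property name is the one from the message above, and the 0.80 value is illustrative:

```scala
import org.apache.hadoop.conf.Configuration

object SlowStartDemo {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Start the reduce stage once 80% of map tasks have completed,
    // so shuffle transfer overlaps the tail of the map stage.
    conf.set("mapred.reduce.slowstart.completed.maps", "0.80")
    println(conf.get("mapred.reduce.slowstart.completed.maps"))
  }
}
```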



Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-03 Thread Dirceu Semighini Filho
Hi Patrick,
I work in an Startup and we want make one of our projects as open source.
This project is based on Spark, and it will help users to instantiate spark
clusters in a cloud environment.
But for that project we need to use the repl, hive and thrift-server.
Can the decision of not publishing this libraries be changed in this
release?

Kind Regards,
Dirceu

2015-02-03 10:18 GMT-02:00 Sean Owen so...@cloudera.com:

 +1

 The signatures are still fine.
 Building for Hadoop 2.6 with YARN works; tests pass, except that
 MQTTStreamSuite, which we established is a test problem and already
 fixed in master.

 On Tue, Feb 3, 2015 at 12:34 AM, Krishna Sankar ksanka...@gmail.com
 wrote:
  +1 (non-binding, of course)
 
  1. Compiled OSX 10.10 (Yosemite) OK Total time: 11:13 min
   mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
  -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
  2. Tested pyspark, mllib - running as well as comparing results with 1.1.x
  and 1.2.0
  2.1. statistics (min,max,mean,Pearson,Spearman) OK
  2.2. Linear/Ridge/Laso Regression OK
  2.3. Decision Tree, Naive Bayes OK
  2.4. KMeans OK
 Center And Scale OK
 Fixed : org.apache.spark.SparkException in zip !
  2.5. rdd operations OK
State of the Union Texts - MapReduce, Filter,sortByKey (word count)
  2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
 Model evaluation/optimization (rank, numIter, lmbda) with
 itertools
  OK
  3. Scala - MLLib
  3.1. statistics (min,max,mean,Pearson,Spearman) OK
  3.2. LinearRegressionWIthSGD OK
  3.3. Decision Tree OK
  3.4. KMeans OK
  3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
 
  Cheers
  k/
 
 
  On Mon, Feb 2, 2015 at 8:57 PM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark version
  1.2.1!
 
  The tag to be voted on is v1.2.1-rc3 (commit b6eaf77):
 
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97
 
  The release files, including signatures, digests, etc. can be found at:
  http://people.apache.org/~pwendell/spark-1.2.1-rc3/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1065/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-1.2.1-rc3-docs/
 
  Changes from rc2:
  A single patch fixing a windows issue.
 
  Please vote on releasing this package as Apache Spark 1.2.1!
 
  The vote is open until Friday, February 06, at 05:00 UTC and passes
  if a majority of at least 3 +1 PMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 1.2.1
  [ ] -1 Do not release this package because ...
 
  For a list of fixes in this release, see http://s.apache.org/Mpn.
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-03 Thread Nicholas Chammas
I believe this was changed for 1.2.1. Here are the relevant JIRA issues
https://issues.apache.org/jira/browse/SPARK-5289?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.2.1%20AND%20text%20~%20%22publish%22%20order%20by%20priority
.

On Tue Feb 03 2015 at 10:43:59 AM Dirceu Semighini Filho 
dirceu.semigh...@gmail.com wrote:

 Hi Patrick,
 I work in an Startup and we want make one of our projects as open source.
 This project is based on Spark, and it will help users to instantiate spark
 clusters in a cloud environment.
 But for that project we need to use the repl, hive and thrift-server.
 Can the decision of not publishing this libraries be changed in this
 release?

 Kind Regards,
 Dirceu

 2015-02-03 10:18 GMT-02:00 Sean Owen so...@cloudera.com:

  +1
 
  The signatures are still fine.
  Building for Hadoop 2.6 with YARN works; tests pass, except that
  MQTTStreamSuite, which we established is a test problem and already
  fixed in master.
 
  On Tue, Feb 3, 2015 at 12:34 AM, Krishna Sankar ksanka...@gmail.com
  wrote:
   +1 (non-binding, of course)
  
   1. Compiled OSX 10.10 (Yosemite) OK Total time: 11:13 min
mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
   -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
   2. Tested pyspark, mllib - running as well as comparing results with
  1.1.x and 1.2.0
   2.1. statistics (min,max,mean,Pearson,Spearman) OK
   2.2. Linear/Ridge/Laso Regression OK
   2.3. Decision Tree, Naive Bayes OK
   2.4. KMeans OK
  Center And Scale OK
  Fixed : org.apache.spark.SparkException in zip !
   2.5. rdd operations OK
 State of the Union Texts - MapReduce, Filter,sortByKey (word
 count)
   2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
  Model evaluation/optimization (rank, numIter, lmbda) with
  itertools
   OK
   3. Scala - MLLib
   3.1. statistics (min,max,mean,Pearson,Spearman) OK
   3.2. LinearRegressionWIthSGD OK
   3.3. Decision Tree OK
   3.4. KMeans OK
   3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
  
   Cheers
   k/
  
  
   On Mon, Feb 2, 2015 at 8:57 PM, Patrick Wendell pwend...@gmail.com
  wrote:
  
   Please vote on releasing the following candidate as Apache Spark
 version
   1.2.1!
  
   The tag to be voted on is v1.2.1-rc3 (commit b6eaf77):
  
  
  https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
 b6eaf77d4332bfb0a698849b1f5f917d20d70e97
  
   The release files, including signatures, digests, etc. can be found
 at:
   http://people.apache.org/~pwendell/spark-1.2.1-rc3/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/
 orgapachespark-1065/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~pwendell/spark-1.2.1-rc3-docs/
  
   Changes from rc2:
   A single patch fixing a windows issue.
  
   Please vote on releasing this package as Apache Spark 1.2.1!
  
   The vote is open until Friday, February 06, at 05:00 UTC and passes
   if a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 1.2.1
   [ ] -1 Do not release this package because ...
  
   For a list of fixes in this release, see http://s.apache.org/Mpn.
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
  
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 



SparkSubmit.scala and stderr

2015-02-03 Thread jayhutfles
Hi all,

I just saw that the SparkSubmit.scala class has the following lines:

  object SparkSubmit {
...
// Exposed for testing
private[spark] var printStream: PrintStream = System.err
...
  }

This causes all verbose logging messages elsewhere in SparkSubmit to go to
stderr, not stdout.  Is this by design?  Verbose messages don't necessarily
reflect errors.  But as the comment states that it's for testing, maybe I'm
misunderstanding its intent...

Thanks in advance
  -Jay



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/SparkSubmit-scala-and-stderr-tp10417.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC3)

2015-02-03 Thread Chip Senkbeil
+1 Tested the REPL release against the Spark Kernel project
(compilation/testing/manual execution). Everything still checks out fine.

Signed,
Chip Senkbeil
IBM Emerging Technologies Software Engineer

On Tue Feb 03 2015 at 12:50:12 PM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I believe this was changed for 1.2.1. Here are the relevant JIRA issues
 https://issues.apache.org/jira/browse/SPARK-5289?jql=
 project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.2.1%
 20AND%20text%20~%20%22publish%22%20order%20by%20priority
 .

 On Tue Feb 03 2015 at 10:43:59 AM Dirceu Semighini Filho 
 dirceu.semigh...@gmail.com wrote:

  Hi Patrick,
  I work in an Startup and we want make one of our projects as open source.
  This project is based on Spark, and it will help users to instantiate
 spark
  clusters in a cloud environment.
  But for that project we need to use the repl, hive and thrift-server.
  Can the decision of not publishing this libraries be changed in this
  release?
 
  Kind Regards,
  Dirceu
 
  2015-02-03 10:18 GMT-02:00 Sean Owen so...@cloudera.com:
 
   +1
  
   The signatures are still fine.
   Building for Hadoop 2.6 with YARN works; tests pass, except that
   MQTTStreamSuite, which we established is a test problem and already
   fixed in master.
  
   On Tue, Feb 3, 2015 at 12:34 AM, Krishna Sankar ksanka...@gmail.com
   wrote:
+1 (non-binding, of course)
   
1. Compiled OSX 10.10 (Yosemite) OK Total time: 11:13 min
 mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
 2. Tested pyspark, mllib - running as well as comparing results with
   1.1.x and 1.2.0
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Laso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
   Fixed : org.apache.spark.SparkException in zip !
2.5. rdd operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word
  count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lmbda) with
   itertools
OK
3. Scala - MLLib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWIthSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
   
Cheers
k/
   
   
On Mon, Feb 2, 2015 at 8:57 PM, Patrick Wendell pwend...@gmail.com
   wrote:
   
Please vote on releasing the following candidate as Apache Spark
  version
1.2.1!
   
The tag to be voted on is v1.2.1-rc3 (commit b6eaf77):
   
   
   https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=
  b6eaf77d4332bfb0a698849b1f5f917d20d70e97
   
The release files, including signatures, digests, etc. can be found
  at:
http://people.apache.org/~pwendell/spark-1.2.1-rc3/
   
Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
   
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/
  orgapachespark-1065/
   
The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.1-rc3-docs/
   
Changes from rc2:
A single patch fixing a windows issue.
   
Please vote on releasing this package as Apache Spark 1.2.1!
   
The vote is open until Friday, February 06, at 05:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
   
[ ] +1 Release this package as Apache Spark 1.2.1
[ ] -1 Do not release this package because ...
   
For a list of fixes in this release, see http://s.apache.org/Mpn.
   
To learn more about Apache Spark, please see
http://spark.apache.org/
   

 -
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
   
   
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
   For additional commands, e-mail: dev-h...@spark.apache.org
  
  
 



[ANNOUNCE] branch-1.3 has been cut

2015-02-03 Thread Patrick Wendell
Hey All,

Just wanted to announce that we've cut the 1.3 branch which will
become the 1.3 release after community testing.

There are still some features that will go in (in higher level
libraries, and some stragglers in spark core), but overall this
indicates the end of major feature development for Spark 1.3 and a
transition into testing.

Within a few days I'll cut a snapshot package release for this so that
people can begin testing.

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-1.3

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Ted Yu
Congratulations, Cheng, Joseph and Sean.

On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Congratulations guys!

 On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, Joseph
  Bradley and Sean Owen. All three have been major contributors to Spark in
  the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and
 many
  pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 



Welcoming three new committers

2015-02-03 Thread Matei Zaharia
Hi all,

The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley 
and Sean Owen. All three have been major contributors to Spark in the past 
year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many pieces 
throughout Spark Core. Join me in welcoming them as committers!

Matei
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Nicholas Chammas
Congratulations guys!

On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia matei.zaha...@gmail.com
wrote:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Welcoming three new committers

2015-02-03 Thread Hari Shreedharan
Congrats Cheng, Joseph and Owen! Well done!




Thanks, Hari

On Tue, Feb 3, 2015 at 2:55 PM, Ted Yu yuzhih...@gmail.com wrote:

 Congratulations, Cheng, Joseph and Sean.
 On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:
 Congratulations guys!

 On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia matei.zaha...@gmail.com
 wrote:

  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, Joseph
  Bradley and Sean Owen. All three have been major contributors to Spark in
  the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and
 many
  pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


RE: Welcoming three new committers

2015-02-03 Thread Pritish Nawlakhe
Congrats and welcome back!!



Thank you!!

Regards
Pritish
Nirvana International Inc.

Big Data, Hadoop, Oracle EBS and IT Solutions
VA - SWaM, MD - MBE Certified Company
prit...@nirvana-international.com 
http://www.nirvana-international.com 
Twitter: @nirvanainternat 

-Original Message-
From: Hari Shreedharan [mailto:hshreedha...@cloudera.com] 
Sent: Tuesday, February 3, 2015 6:02 PM
To: Ted Yu
Cc: Nicholas Chammas; dev; Joseph Bradley; Cheng Lian; Matei Zaharia; Sean Owen
Subject: Re: Welcoming three new committers

Congrats Cheng, Joseph and Owen! Well done!




Thanks, Hari

On Tue, Feb 3, 2015 at 2:55 PM, Ted Yu yuzhih...@gmail.com wrote:

 Congratulations, Cheng, Joseph and Sean.
 On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
 wrote:
 Congratulations guys!

 On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia 
 matei.zaha...@gmail.com
 wrote:

  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, 
  Joseph Bradley and Sean Owen. All three have been major 
  contributors to Spark in the past year: Cheng on Spark SQL, Joseph 
  on MLlib, and Sean on ML and
 many
  pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Timothy Chen
Congrats all!

Tim


 On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe 
 prit...@nirvana-international.com wrote:
 
 Congrats and welcome back!!
 
 
 
 Thank you!!
 
 Regards
 Pritish
 Nirvana International Inc.
 
 Big Data, Hadoop, Oracle EBS and IT Solutions
 VA - SWaM, MD - MBE Certified Company
 prit...@nirvana-international.com 
 http://www.nirvana-international.com 
 Twitter: @nirvanainternat 
 
 -Original Message-
 From: Hari Shreedharan [mailto:hshreedha...@cloudera.com] 
 Sent: Tuesday, February 3, 2015 6:02 PM
 To: Ted Yu
 Cc: Nicholas Chammas; dev; Joseph Bradley; Cheng Lian; Matei Zaharia; Sean 
 Owen
 Subject: Re: Welcoming three new committers
 
 Congrats Cheng, Joseph and Owen! Well done!
 
 
 
 
 Thanks, Hari
 
 On Tue, Feb 3, 2015 at 2:55 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Congratulations, Cheng, Joseph and Sean.
 On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com
 wrote:
 Congratulations guys!
 
 On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia 
 matei.zaha...@gmail.com
 wrote:
 
 Hi all,
 
 The PMC recently voted to add three new committers: Cheng Lian, 
 Joseph Bradley and Sean Owen. All three have been major 
 contributors to Spark in the past year: Cheng on Spark SQL, Joseph 
 on MLlib, and Sean on ML and
 many
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
 Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Evan Chan
Congrats everyone!!!

On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen tnac...@gmail.com wrote:
 Congrats all!

 Tim


 On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe 
 prit...@nirvana-international.com wrote:

 Congrats and welcome back!!



 Thank you!!

 Regards
 Pritish
 Nirvana International Inc.

 Big Data, Hadoop, Oracle EBS and IT Solutions
 VA - SWaM, MD - MBE Certified Company
 prit...@nirvana-international.com
 http://www.nirvana-international.com
 Twitter: @nirvanainternat

 -Original Message-
 From: Hari Shreedharan [mailto:hshreedha...@cloudera.com]
 Sent: Tuesday, February 3, 2015 6:02 PM
 To: Ted Yu
 Cc: Nicholas Chammas; dev; Joseph Bradley; Cheng Lian; Matei Zaharia; Sean 
 Owen
 Subject: Re: Welcoming three new committers

 Congrats Cheng, Joseph and Owen! Well done!




 Thanks, Hari

 On Tue, Feb 3, 2015 at 2:55 PM, Ted Yu yuzhih...@gmail.com wrote:

 Congratulations, Cheng, Joseph and Sean.
 On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas
 nicholas.cham...@gmail.com
 wrote:
 Congratulations guys!

 On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia
 matei.zaha...@gmail.com
 wrote:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian,
 Joseph Bradley and Sean Owen. All three have been major
 contributors to Spark in the past year: Cheng on Spark SQL, Joseph
 on MLlib, and Sean on ML and
 many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 ---
 -- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For
 additional commands, e-mail: dev-h...@spark.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: SparkSubmit.scala and stderr

2015-02-03 Thread Evan Chan
Why not just use SLF4J?

On Tue, Feb 3, 2015 at 2:22 PM, Reynold Xin r...@databricks.com wrote:
 We can use ScalaTest's privateMethodTester also instead of exposing that.
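[Editor's note: for readers unfamiliar with it, ScalaTest's PrivateMethodTester invokes a private method reflectively, so the method under test need not be widened to private[spark]. A minimal sketch follows; `Widget` and `parse` are made-up names, not Spark code.]

```scala
import org.scalatest.{FunSuite, PrivateMethodTester}

class Widget {
  private def parse(s: String): Int = s.trim.toInt
}

// The suite mixes in PrivateMethodTester and calls the private method
// reflectively via invokePrivate, so Widget.parse can stay private.
class WidgetSuite extends FunSuite with PrivateMethodTester {
  test("parse trims its input") {
    val parse = PrivateMethod[Int]('parse)
    assert(((new Widget) invokePrivate parse(" 42 ")) === 42)
  }
}
```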

 On Tue, Feb 3, 2015 at 2:18 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Hi Jay,

 On Tue, Feb 3, 2015 at 6:28 AM, jayhutfles jayhutf...@gmail.com wrote:
  // Exposed for testing
  private[spark] var printStream: PrintStream = System.err

  But as the comment states that it's for testing, maybe I'm
  misunderstanding its intent...

 The comment is there to tell someone reading the code that this field
 is a `var` and not private just because test code (SparkSubmitSuite in
 this case) needs to modify it, otherwise it wouldn't exist or would be
 private. Similar in spirit to this annotation:


 http://guava-libraries.googlecode.com/svn/tags/release09/javadoc/com/google/common/annotations/VisibleForTesting.html

 (Which I'd probably have used in this case, but is not really common
 in Spark code.)

 --
 Marcelo

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: IDF for ml pipeline

2015-02-03 Thread masaki rikitoku
Thank you for your reply. I will do it.



—
Sent from Mailbox

On Tue, Feb 3, 2015 at 6:12 PM, Xiangrui Meng men...@gmail.com wrote:

 Yes, we need a wrapper under spark.ml. Feel free to create a JIRA for
 it. -Xiangrui
 On Mon, Feb 2, 2015 at 8:56 PM, masaki rikitoku rikima3...@gmail.com wrote:
 Hi all

 I am trying the ml pipeline for text classification now.

 Recently, I succeeded in executing the pipeline processing in the ml package,
 which consists of an original Japanese tokenizer, hashingTF, and
 logisticRegression.

 Then, I failed to execute the pipeline with the idf from the mllib package directly.

 To use the idf feature in the ml package,
 do I have to implement a wrapper for idf, like the one for hashingTF?

 best

 Masaki Rikitoku

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


Re: Welcoming three new committers

2015-02-03 Thread Corey Nolet
Congrats guys!

On Tue, Feb 3, 2015 at 7:01 PM, Evan Chan velvia.git...@gmail.com wrote:

 Congrats everyone!!!

 On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen tnac...@gmail.com wrote:
  Congrats all!
 
  Tim
 
 
  On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe 
 prit...@nirvana-international.com wrote:
 
  Congrats and welcome back!!
 
 
 
  Thank you!!
 
  Regards
  Pritish
  Nirvana International Inc.
 
  Big Data, Hadoop, Oracle EBS and IT Solutions
  VA - SWaM, MD - MBE Certified Company
  prit...@nirvana-international.com
  http://www.nirvana-international.com
  Twitter: @nirvanainternat
 
  -Original Message-
  From: Hari Shreedharan [mailto:hshreedha...@cloudera.com]
  Sent: Tuesday, February 3, 2015 6:02 PM
  To: Ted Yu
  Cc: Nicholas Chammas; dev; Joseph Bradley; Cheng Lian; Matei Zaharia;
 Sean Owen
  Subject: Re: Welcoming three new committers
 
  Congrats Cheng, Joseph and Owen! Well done!
 
 
 
 
  Thanks, Hari
 
  On Tue, Feb 3, 2015 at 2:55 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  Congratulations, Cheng, Joseph and Sean.
  On Tue, Feb 3, 2015 at 2:53 PM, Nicholas Chammas
  nicholas.cham...@gmail.com
  wrote:
  Congratulations guys!
 
  On Tue Feb 03 2015 at 2:36:12 PM Matei Zaharia
  matei.zaha...@gmail.com
  wrote:
 
  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian,
  Joseph Bradley and Sean Owen. All three have been major
  contributors to Spark in the past year: Cheng on Spark SQL, Joseph
  on MLlib, and Sean on ML and
  many
  pieces throughout Spark Core. Join me in welcoming them as
 committers!
 
  Matei
  ---
  -- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For
  additional commands, e-mail: dev-h...@spark.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Welcoming three new committers

2015-02-03 Thread Xuefeng Wu
Congratulations! Well done.

Yours respectfully, Xuefeng Wu 吴雪峰

 On Feb 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Hi all,
 
 The PMC recently voted to add three new committers: Cheng Lian, Joseph 
 Bradley and Sean Owen. All three have been major contributors to Spark in the 
 past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many 
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: IDF for ml pipeline

2015-02-03 Thread Xiangrui Meng
Yes, we need a wrapper under spark.ml. Feel free to create a JIRA for
it. -Xiangrui
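[Editor's note: the mllib API such a wrapper would delegate to already exists. A rough sketch of the delegation follows; `IDFWrapper` is an illustrative name, and the real spark.ml parameter/schema plumbing is elided.]

```scala
import org.apache.spark.mllib.feature.{IDF, IDFModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Illustrative stand-in for an ml pipeline stage: fit() learns document
// frequencies from the term-frequency vectors, transform() rescales them.
class IDFWrapper {
  def fit(tf: RDD[Vector]): IDFModel = new IDF().fit(tf)

  def transform(model: IDFModel, tf: RDD[Vector]): RDD[Vector] =
    model.transform(tf)
}
```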

On Mon, Feb 2, 2015 at 8:56 PM, masaki rikitoku rikima3...@gmail.com wrote:
 Hi all

 I am trying the ml pipeline for text classification now.

 Recently, I succeeded in executing the pipeline processing in the ml package,
 which consists of an original Japanese tokenizer, hashingTF, and
 logisticRegression.

 Then, I failed to execute the pipeline with the idf from the mllib package directly.

 To use the idf feature in the ml package,
 do I have to implement a wrapper for idf, like the one for hashingTF?

 best

 Masaki Rikitoku

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: SparkSubmit.scala and stderr

2015-02-03 Thread Sean Owen
Despite its name, stderr is frequently used as the destination for
anything that's not the output of the program, which includes log
messages. That way, for example, you can redirect the output of such a
program to capture its result without also capturing log or error
messages, which will still just print to the console.
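[Editor's note: a program following this convention keeps its result and its diagnostics on separate streams, so the shell can split them. A minimal illustration, not SparkSubmit itself:]

```scala
object OutVsErr {
  def main(args: Array[String]): Unit = {
    // Program output: this is what a shell redirect such as `> result.txt` captures.
    println("the actual result")
    // Verbose/diagnostic output: still prints to the console after the redirect.
    System.err.println("[verbose] submitting application...")
  }
}
```

Running this with stdout redirected to a file leaves only the stderr line on the terminal.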

On Tue, Feb 3, 2015 at 8:28 AM, jayhutfles jayhutf...@gmail.com wrote:
 Hi all,

 I just saw that the SparkSubmit.scala class has the following lines:

   object SparkSubmit {
 ...
 // Exposed for testing
 private[spark] var printStream: PrintStream = System.err
 ...
   }

 This causes all verbose logging messages elsewhere in SparkSubmit to go to
 stderr, not stdout.  Is this by design?  Verbose messages don't necessarily
 reflect errors.  But as the comment states that it's for testing, maybe I'm
 misunderstanding its intent...

 Thanks in advance
   -Jay



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/SparkSubmit-scala-and-stderr-tp10417.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Nan Zhu
Congratulations!

--  
Nan Zhu
http://codingcat.me


On Tuesday, February 3, 2015 at 8:08 PM, Xuefeng Wu wrote:

 Congratulations! Well done.
  
 Yours respectfully, Xuefeng Wu 吴雪峰
  
  On Feb 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com 
  (mailto:matei.zaha...@gmail.com) wrote:
   
  Hi all,
   
  The PMC recently voted to add three new committers: Cheng Lian, Joseph 
  Bradley and Sean Owen. All three have been major contributors to Spark in 
  the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many 
  pieces throughout Spark Core. Join me in welcoming them as committers!
   
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org 
  (mailto:dev-unsubscr...@spark.apache.org)
  For additional commands, e-mail: dev-h...@spark.apache.org 
  (mailto:dev-h...@spark.apache.org)
   
  
  
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org 
 (mailto:dev-unsubscr...@spark.apache.org)
 For additional commands, e-mail: dev-h...@spark.apache.org 
 (mailto:dev-h...@spark.apache.org)
  
  




Re: Welcoming three new committers

2015-02-03 Thread Mridul Muralidharan
Congratulations !
Keep up the good work :-)

Regards
Mridul


On Tuesday, February 3, 2015, Matei Zaharia matei.zaha...@gmail.com wrote:

 Hi all,

 The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!

 Matei
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: 2GB limit for partitions?

2015-02-03 Thread Mridul Muralidharan
That is fairly out of date (we used to run some of our jobs on it ... But
that is forked off 1.1 actually).

Regards
Mridul

On Tuesday, February 3, 2015, Imran Rashid iras...@cloudera.com wrote:

 Thanks for the explanations, makes sense.  For the record, it looks like this
 was worked on a while back (and maybe the work is even close to a
 solution?):

 https://issues.apache.org/jira/browse/SPARK-1476

 and perhaps an independent solution was worked on here?

 https://issues.apache.org/jira/browse/SPARK-1391


  On Tue, Feb 3, 2015 at 5:20 PM, Reynold Xin r...@databricks.com wrote:

  cc dev list
 
 
  How are you saving the data? There are two relevant 2GB limits:
 
  1. Caching
 
  2. Shuffle
 
 
  For caching, a partition is turned into a single block.
 
  For shuffle, each map partition is partitioned into R blocks, where R =
  number of reduce tasks. It is unlikely a shuffle block > 2G, although it
  can still happen.
 
  I think the 2nd problem is easier to fix than the 1st, because we can
  handle that in the network transport layer. It'd require us to divide the
  transfer of a very large block into multiple smaller blocks.
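[Editor's note: until blocks larger than 2GB can be split, the practical workaround is to keep each partition comfortably under the limit by raising the partition count. A rough sketch; the numbers and path are illustrative, and `sc` is an existing SparkContext.]

```scala
import org.apache.spark.storage.StorageLevel

// Aim for ~128 MB per partition so no single cached block approaches the
// 2 GB (Integer.MAX_VALUE) limit hit when a partition is memory-mapped.
val estimatedTotalBytes = 1e12.toLong            // assume ~1 TB of input
val targetPartitionBytes = 128L * 1024 * 1024
val numPartitions = (estimatedTotalBytes / targetPartitionBytes).toInt

val data = sc.textFile("hdfs:///some/input")     // hypothetical path
  .repartition(numPartitions)
  .persist(StorageLevel.MEMORY_AND_DISK)
```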
 
 
 
  On Tue, Feb 3, 2015 at 3:00 PM, Imran Rashid iras...@cloudera.com wrote:
 
  Michael,
 
  you are right, there is definitely some limit at 2GB.  Here is a trivial
  example to demonstrate it:
 
  import org.apache.spark.storage.StorageLevel
  val d = sc.parallelize(1 to 1e6.toInt, 1).map{i => new
  Array[Byte](5e3.toInt)}.persist(StorageLevel.DISK_ONLY)
  d.count()
 
  It gives the same error you are observing.  I was under the same
  impression as Sean about the limits only being on blocks, not
 partitions --
  but clearly that isn't the case here.
 
  I don't know the whole story yet, but I just wanted to at least let you
  know you aren't crazy :)
  At the very least this suggests that you might need to make smaller
  partitions for now.
 
  Imran
 
 
  On Tue, Feb 3, 2015 at 4:58 AM, Michael Albert 
  m_albert...@yahoo.com.invalid wrote:
 
  Greetings!
 
  Thanks for the response.
 
  Below is an example of the exception I saw.
  I'd rather not post code at the moment, so I realize it is completely
  unreasonable to ask for a diagnosis.
  However, I will say that adding a partitionBy() was the last change
  before this error was created.
 
 
  Thanks for your time and any thoughts you might have.
 
  Sincerely,
   Mike
 
 
 
  Exception in thread main org.apache.spark.SparkException: Job aborted
  due to stage failure: Task 4 in stage 5.0 failed 4 times, most recent
  failure: Lost task 4.3 in stage 5.0 (TID 6012,
  ip-10-171-0-31.ec2.internal): java.lang.RuntimeException:
  java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
  at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:828)
  at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
  at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
  at
 
 org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:517)
  at
 
 org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:307)
  at
 
 org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
  at
 
 org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
  at
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at
 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at
 
 scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at
 scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at
  scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
  at
 
 org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
 
 
--
   *From:* Sean Owen so...@cloudera.com
  *To:* Michael Albert m_albert...@yahoo.com
  *Cc:* u...@spark.apache.org; u...@spark.apache.org
  *Sent:* Monday, February 2, 2015 10:13 PM
  *Subject:* Re: 2GB limit for partitions?
 
  The limit is on blocks, not partitions. Partitions have many blocks.
 
  It sounds like you are creating very large values in memory, but I'm
  not sure given your description. You will run into problems if a
  single object is more than 2GB, of course. More of the stack trace
  might show what is mapping that much memory.
 
  If you simply want data into 1000 files it's a lot simpler. Just
  repartition into 1000 partitions and save the data. If you need more
  control over what goes into which partition, use a Partitioner, yes.
 
 
 
  On Mon, Feb 2, 2015 at 8:40 PM, Michael Albert
  m_albert...@yahoo.com.invalid wrote:
   Greetings!
  
   SPARK-1476 says that there is a 2G limit for blocks.
   Is 

Re: Welcoming three new committers

2015-02-03 Thread Chao Chen

Congratulations guys, well done!

On 2015-02-04, at 9:26 AM, Nan Zhu wrote:

Congratulations!

--
Nan Zhu
http://codingcat.me


On Tuesday, February 3, 2015 at 8:08 PM, Xuefeng Wu wrote:


Congratulations! Well done.
  
Yours respectfully, Xuefeng Wu 吴雪峰
  

On Feb 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com 
(mailto:matei.zaha...@gmail.com) wrote:
  
Hi all,
  
The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many pieces throughout Spark Core. Join me in welcoming them as committers!
  
Matei

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org 
(mailto:dev-unsubscr...@spark.apache.org)
For additional commands, e-mail: dev-h...@spark.apache.org 
(mailto:dev-h...@spark.apache.org)
  
  
  
-

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org 
(mailto:dev-unsubscr...@spark.apache.org)
For additional commands, e-mail: dev-h...@spark.apache.org 
(mailto:dev-h...@spark.apache.org)
  
  






-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Welcoming three new committers

2015-02-03 Thread Denny Lee
Awesome stuff - congratulations! :)

On Tue Feb 03 2015 at 5:34:06 PM Chao Chen crazy...@gmail.com wrote:

 Congratulations guys, well done!

  On 2015-02-04, at 9:26 AM, Nan Zhu wrote:
  Congratulations!
 
  --
  Nan Zhu
  http://codingcat.me
 
 
  On Tuesday, February 3, 2015 at 8:08 PM, Xuefeng Wu wrote:
 
  Congratulations! Well done.
 
  Yours respectfully, Xuefeng Wu 吴雪峰
 
  On Feb 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com
 (mailto:matei.zaha...@gmail.com) wrote:
 
  Hi all,
 
  The PMC recently voted to add three new committers: Cheng Lian, Joseph
 Bradley and Sean Owen. All three have been major contributors to Spark in
 the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many
 pieces throughout Spark Core. Join me in welcoming them as committers!
 
  Matei
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org (mailto:
 dev-unsubscr...@spark.apache.org)
  For additional commands, e-mail: dev-h...@spark.apache.org (mailto:
 dev-h...@spark.apache.org)
 
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org (mailto:
 dev-unsubscr...@spark.apache.org)
  For additional commands, e-mail: dev-h...@spark.apache.org (mailto:
 dev-h...@spark.apache.org)
 
 
 
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: Welcoming three new committers

2015-02-03 Thread Debasish Das
Congratulations !

Keep helping the community :-)

On Tue, Feb 3, 2015 at 5:34 PM, Denny Lee denny.g@gmail.com wrote:

 Awesome stuff - congratulations! :)

 On Tue Feb 03 2015 at 5:34:06 PM Chao Chen crazy...@gmail.com wrote:

  Congratulations guys, well done!
 
   On 2015-02-04, at 9:26 AM, Nan Zhu wrote:
   Congratulations!
  
   --
   Nan Zhu
   http://codingcat.me
  
  
   On Tuesday, February 3, 2015 at 8:08 PM, Xuefeng Wu wrote:
  
   Congratulations! Well done.
  
   Yours respectfully, Xuefeng Wu 吴雪峰
  
   On Feb 4, 2015, at 6:34 AM, Matei Zaharia matei.zaha...@gmail.com
  (mailto:matei.zaha...@gmail.com) wrote:
  
   Hi all,
  
   The PMC recently voted to add three new committers: Cheng Lian,
 Joseph
  Bradley and Sean Owen. All three have been major contributors to Spark in
  the past year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and
 many
  pieces throughout Spark Core. Join me in welcoming them as committers!
  
   Matei
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org (mailto:
  dev-unsubscr...@spark.apache.org)
   For additional commands, e-mail: dev-h...@spark.apache.org (mailto:
  dev-h...@spark.apache.org)
  
  
  
   -
   To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org (mailto:
  dev-unsubscr...@spark.apache.org)
   For additional commands, e-mail: dev-h...@spark.apache.org (mailto:
  dev-h...@spark.apache.org)
  
  
  
  
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org