Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Soumitra Kumar
+1 (non-binding)

For: https://issues.apache.org/jira/browse/SPARK-3660

. Docs OK
. Example code is good

-Soumitra.


On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin wrote:

> Hi Tom, are you using an sbt-built assembly by any chance? If so, take
> a look at SPARK-5808.
>
> I haven't had any problems with the maven-built assembly. Setting
> SPARK_HOME on the executors is a workaround if you want to use the sbt
> assembly.
>
> On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves wrote:
> > Trying to run pyspark on yarn in client mode with basic wordcount
> > example I see the following error when doing the collect:
> >
> > Error from python worker:
> >   /usr/bin/python: No module named sql
> > PYTHONPATH was:
> >   /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
> > java.io.EOFException
> >   at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >   at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
> >   at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
> >   at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
> >   at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
> >   at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >   at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
> >   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> >   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> >   at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:722)
> >
> > any ideas on this?
> > Tom
> >
> > On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> >
> >
> > Please vote on releasing the following candidate as Apache Spark
> > version 1.3.0!
> >
> > The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
> >
> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-1.3.0-rc1/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1069/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
> >
> > Please vote on releasing this package as Apache Spark 1.3.0!
> >
> > The vote is open until Saturday, February 21, at 08:03 UTC and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.3.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.apache.org/
> >
> > == How can I help test this release? ==
> > If you are a Spark user, you can help us test this release by
> > taking a Spark 1.2 workload and running on this release candidate,
> > then reporting any regressions.
> >
> > == What justifies a -1 vote for this release? ==
> > This vote is happening towards the end of the 1.3 QA period,
> > so -1 votes should only occur for significant regressions from 1.2.1.
> > Bugs already present in 1.2.X, minor regressions, or bugs related
> > to new features will not block this release.
> >
> > - Patrick
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
> >
> >
> >
>
>
>
> --
> Marcelo
>
>
>
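
The workaround quoted above (setting SPARK_HOME on the executors when using an sbt-built assembly) could be passed at submit time roughly as follows. This is only a sketch: `submit_command` is a hypothetical helper, and `/opt/spark` and `wordcount.py` are placeholder values that do not come from the thread. It assumes the executors have a Spark installation at that path and relies on the `spark.executorEnv.[Name]` configuration property to export the variable.

```python
# Hypothetical helper: builds a spark-submit command line that exports
# SPARK_HOME to every executor through the spark.executorEnv.* setting.
def submit_command(app, executor_spark_home, master="yarn-client"):
    return [
        "spark-submit",
        "--master", master,
        "--conf", "spark.executorEnv.SPARK_HOME=" + executor_spark_home,
        app,
    ]


# Placeholder values for illustration only (assumptions, not from the thread).
cmd = submit_command("wordcount.py", "/opt/spark")
print(" ".join(cmd))
```

The same property can instead be placed in spark-defaults.conf if it should apply to every job.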


SPARK-3660 : Initial RDD for updateStateByKey transformation

2014-09-23 Thread Soumitra Kumar
Hello fellow developers,

Thanks, TD, for the relevant pointers.

I have created an issue:
https://issues.apache.org/jira/browse/SPARK-3660

Copying the description from JIRA:
"
How can the state for the updateStateByKey transformation be initialized?

I have word counts from a previous spark-submit run and want to load them in
the next spark-submit job, so that counting continues from those totals.

One proposal is to add the following argument to the updateStateByKey methods:
initial: Option[RDD[(K, S)]] = None

This will maintain backward compatibility as well.

I have working code as well.

This thread started on spark-user list at:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-initialize-updateStateByKey-operation-td14772.html
"

Please let me know whether I should add the parameter
"initial: Option[RDD[(K, S)]] = None" to all updateStateByKey methods or
create new ones.

Thanks,
-Soumitra.
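
To make the question above concrete, here is a small Spark-free model in plain Python of what seeding the state could mean. It is a sketch of the intended semantics only, not the Spark API: `update_state_by_key`, `add_counts`, and the sample data are hypothetical names invented for illustration, and the `initial` keyword merely stands in for the proposed `initial` argument.

```python
# Plain-Python model of the proposed semantics; this is NOT the Spark API.
# All names and sample data here are hypothetical, for illustration only.

def add_counts(new_values, running_count):
    """Update function: add this batch's counts to the previous state, if any."""
    return sum(new_values) + (running_count or 0)


def update_state_by_key(batches, update_func, initial=None):
    """Fold each batch of (key, value) pairs into per-key state.

    `initial` stands in for the proposed argument: None (the default)
    starts from empty state, which is the current behaviour.
    """
    state = dict(initial or [])
    for batch in batches:
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Update every key that has prior state or new values in this batch.
        for key in set(state) | set(grouped):
            new_state = update_func(grouped.get(key, []), state.get(key))
            if new_state is None:
                state.pop(key, None)  # returning None drops the key
            else:
                state[key] = new_state
    return state


previous_run = [("spark", 10), ("yarn", 3)]        # counts saved by an earlier job
batches = [[("spark", 1), ("vote", 1)], [("spark", 1)]]

seeded = update_state_by_key(batches, add_counts, initial=previous_run)
unseeded = update_state_by_key(batches, add_counts)
print(seeded)
print(unseeded)
```

With the seed, "spark" continues from 10 and "yarn" is carried forward even though it never appears in the new batches; without it, counting starts from zero, as it does today.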




Re: SPARK-3660 : Initial RDD for updateStateByKey transformation

2014-10-05 Thread Soumitra Kumar
Hello,

I have submitted a pull request ("Adding support of initial value for state
update.", #2665). Please review it and let me know what you think.

Excited to submit my first pull request.

-Soumitra.
