Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

Re: Which committers care about Kafka?

2014-12-19 Thread Dibyendu Bhattacharya
Hi, Thanks to Jerry for mentioning the Kafka Spout for Trident. Storm Trident achieves its exactly-once guarantee by processing tuples in batches and assigning the same transaction-id to a given batch. A replay of a given batch with a transaction-id will have the exact same set of tuples and
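The transactional-batch idea described here can be sketched in a few lines of plain Python (this is an illustration of the concept, not Storm/Trident code; `TransactionalStore` and `apply_batch` are hypothetical names): the state store records the transaction-id of the last batch it applied, so replaying a batch with the same id, which by Trident's contract contains the same tuples, becomes a no-op.

```python
# Minimal sketch (plain Python, not Storm/Trident code) of exactly-once
# updates via transactional batches: commit the txid atomically with the
# state, and skip any batch whose txid was already applied.
class TransactionalStore:
    def __init__(self):
        self.count = 0          # example state: a running tuple count
        self.last_txid = None   # txid of the last batch applied

    def apply_batch(self, txid, tuples):
        if txid == self.last_txid:
            return  # replayed batch: already applied, skip it
        self.count += len(tuples)
        self.last_txid = txid   # committed together with the state

store = TransactionalStore()
store.apply_batch(1, ["a", "b"])
store.apply_batch(1, ["a", "b"])  # replay of batch 1 is ignored
store.apply_batch(2, ["c"])
print(store.count)  # 3
```

The key point, echoed later in this thread, is that this only works if the state and the transaction-id are committed atomically; otherwise a failure between the two writes reintroduces duplicates.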

Re: Announcing Spark 1.2!

2014-12-19 Thread Shixiong Zhu
Congrats! A little question about this release: which commit is this release based on? v1.2.0 and v1.2.0-rc2 point to different commits in https://github.com/apache/spark/releases Best Regards, Shixiong Zhu 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com: I'm happy to

Re: Announcing Spark 1.2!

2014-12-19 Thread Sean Owen
Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get updated. I assume it's going to be 1.2.0-rc2 plus a few commits related to the release process. On Fri, Dec 19, 2014 at 9:50 AM, Shixiong Zhu zsxw...@gmail.com wrote: Congrats! A little question about this release: Which commit

Re: Re: Announcing Spark 1.2!

2014-12-19 Thread wyphao.2007
In the http://spark.apache.org/downloads.html page, we can't download the newest Spark release. At 2014-12-19 17:55:29, Sean Owen so...@cloudera.com wrote: Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get updated. I assume it's going to be 1.2.0-rc2 plus a few commits

Re: Re: Announcing Spark 1.2!

2014-12-19 Thread Sean Owen
I can download it. Make sure you refresh the page, maybe, so that it shows the 1.2.0 download as an option. On Fri, Dec 19, 2014 at 11:16 AM, wyphao.2007 wyphao.2...@163.com wrote: In the http://spark.apache.org/downloads.html page, we can't download the newest Spark release. At

spark-yarn_2.10 1.2.0 artifacts

2014-12-19 Thread David McWhorter
Hi all, Thanks for your work on spark! I am trying to locate spark-yarn jars for the new 1.2.0 release. The jars for spark-core, etc, are on maven central, but the spark-yarn jars are missing. Confusingly and perhaps relatedly, I also can't seem to get the spark-yarn artifact to install

Re: spark-yarn_2.10 1.2.0 artifacts

2014-12-19 Thread Sean Owen
I believe spark-yarn does not exist from 1.2 onwards. Have a look at spark-network-yarn for where some of that functionality went. On Fri, Dec 19, 2014 at 5:09 PM, David McWhorter mcwhor...@ccri.com wrote: Hi all, Thanks for your work on spark! I am trying to locate spark-yarn jars for the new

Re: Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
Thanks for pointing out the tag issue. I've updated all links to point to the correct tag (from the vote thread): a428c446e23e628b746e0626cc02b7b3cadf588e On Fri, Dec 19, 2014 at 1:55 AM, Sean Owen so...@cloudera.com wrote: Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get

Re: Confirming race condition in DagScheduler (NoSuchElementException)

2014-12-19 Thread thlee
any comments? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Confirming-race-condition-in-DagScheduler-NoSuchElementException-tp9798p9855.html

Spark master OOMs with exception stack trace stored in JobProgressListener (SPARK-4906)

2014-12-19 Thread Mingyu Kim
Hi, I just filed a bug, SPARK-4906 (https://issues.apache.org/jira/browse/SPARK-4906), regarding Spark master OOMs. If I understand correctly, the UI state for all running applications is kept in memory, retained by JobProgressListener, and when there are a lot of exception stack traces, this UI

Re: Which committers care about Kafka?

2014-12-19 Thread Hari Shreedharan
Hi Dibyendu, Thanks for the details on the implementation. But I still do not believe that it guarantees no duplicates: what they achieve is that the same batch is processed exactly the same way every time (though it may be processed more than once), so it depends on the operation being idempotent. I

Spark Dev

2014-12-19 Thread Harikrishna Kamepalli
I am interested in contributing to Spark.

Re: Spark Dev

2014-12-19 Thread Sandy Ryza
Hi Harikrishna, A good place to start is taking a look at the wiki page on contributing: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark -Sandy On Fri, Dec 19, 2014 at 2:43 PM, Harikrishna Kamepalli harikrishna.kamepa...@gmail.com wrote: i am interested to contribute

Re: Which committers care about Kafka?

2014-12-19 Thread Sean McNamara
Please feel free to correct me if I'm wrong, but I think exactly-once Spark Streaming semantics can easily be achieved using updateStateByKey. Make the key going into updateStateByKey a hash of the event, or pluck off some uuid from the message. The updateFunc would only emit the message
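The dedup-by-key idea sketched in this message can be illustrated outside Spark with plain Python (an assumption-laden sketch, not Spark Streaming's actual `updateStateByKey` API; `event_key` and `update_func` are hypothetical names): key each event by a content hash, and have an update function that emits an event only the first time its key is seen.

```python
# Sketch of exactly-once-style dedup: hash each event to a key, keep a
# per-key state, and only emit events whose key has no prior state.
# This mimics updateStateByKey's (new values, old state) -> new state shape.
import hashlib

def event_key(event: str) -> str:
    # Content hash as the dedup key; a message uuid would work too.
    return hashlib.sha256(event.encode()).hexdigest()

def update_func(new_events, seen):
    # Emit only if this key had no prior state (first occurrence).
    if seen is None and new_events:
        return new_events[0], True
    return (seen or new_events[0]), False  # duplicate: suppress

state = {}
emitted = []
for event in ["msg-1", "msg-2", "msg-1"]:   # "msg-1" arrives twice
    k = event_key(event)
    value, emit = update_func([event], state.get(k))
    state[k] = value
    if emit:
        emitted.append(event)
print(emitted)  # ['msg-1', 'msg-2']
```

One practical caveat with this approach in a real stream is that the seen-key state grows without bound unless it is aged out, which is part of why the thread keeps returning to idempotent sinks instead.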

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Andy Konwinski
Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator, so maybe it'll work now. However, I also sent two emails about this through the Nabble interface (in this same thread) yesterday, and they don't appear to have made it through, so I'm not sure whether it actually

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Ted Yu
Andy: I saw two emails from you from yesterday. See this thread: http://search-hadoop.com/m/JW1q5opRsY1 Cheers On Fri, Dec 19, 2014 at 12:51 PM, Andy Konwinski andykonwin...@gmail.com wrote: Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator so

Re: Which committers care about Kafka?

2014-12-19 Thread Cody Koeninger
The problems you guys are discussing come from trying to store state in Spark, so don't do that. Spark isn't a distributed database. Just map Kafka partitions directly to RDDs, let user code specify the range of offsets explicitly, and let them be in charge of committing offsets. Using the

Re: Which committers care about Kafka?

2014-12-19 Thread Hari Shreedharan
Can you explain your basic algorithm for the once-only delivery? It is quite a bit of very Kafka-specific code that would take more time to read than I can currently afford. If you can explain your algorithm a bit, it might help. Thanks, Hari On Fri, Dec 19, 2014 at 1:48 PM, Cody Koeninger

Re: Which committers care about Kafka?

2014-12-19 Thread Cody Koeninger
That KafkaRDD code is dead simple. Given a user-specified map (topic1, partition0) -> (startingOffset, endingOffset), (topic1, partition1) -> (startingOffset, endingOffset), ... turn each one of those entries into a partition of an RDD, using the simple consumer. That's it. No recovery logic, no
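The mapping described here can be sketched in a few lines of Python (an illustration of the partition-planning idea only; `OffsetRange` and `plan_partitions` are hypothetical names, not the actual KafkaRDD code): each (topic, partition) -> (startingOffset, endingOffset) entry becomes exactly one deterministic "RDD partition" that a simple consumer would read.

```python
# Sketch of the KafkaRDD partition plan: one RDD partition per Kafka
# (topic, partition) offset range, fully determined by user-supplied
# offsets, so a recompute re-reads exactly the same messages.
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    start: int   # inclusive starting offset
    end: int     # exclusive ending offset

def plan_partitions(offsets):
    # offsets: {(topic, partition): (startingOffset, endingOffset)}
    return [OffsetRange(t, p, s, e)
            for (t, p), (s, e) in sorted(offsets.items())]

offsets = {
    ("topic1", 0): (100, 200),
    ("topic1", 1): (340, 400),
}
parts = plan_partitions(offsets)
print(len(parts), parts[0].end - parts[0].start)  # 2 100
```

Because the offset ranges are fixed up front, replaying a partition is deterministic, which is what lets user code own offset commits and build idempotent or transactional stores on top, as the following replies note.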

Re: Which committers care about Kafka?

2014-12-19 Thread Koert Kuipers
Yup, we at Tresata do the idempotent store the same way. Very simple approach. On Fri, Dec 19, 2014 at 5:32 PM, Cody Koeninger c...@koeninger.org wrote: That KafkaRDD code is dead simple. Given a user specified map (topic1, partition0) -> (startingOffset, endingOffset), (topic1, partition1)

EndpointWriter : Dropping message failure ReliableDeliverySupervisor errors...

2014-12-19 Thread jay vyas
Hi Spark. I'm trying to understand the Akka debug messages when networking doesn't work properly. Any hints would be great on this. SIMPLE TESTS I RAN - I tried a ping, which works. - I tried a telnet to the 7077 port of the master, from the slave, which also works. LOGS 1) On the master I see this WARN log