Re: 2GB limit for partitions?

2015-02-04 Thread Mridul Muralidharan
That work is from more than a year ago and is no longer maintained, since we do not use it in-house now. Also note that there have been quite a lot of changes in Spark ... including some which break assumptions made in the patch, so its value is very low - having said that, do feel free to work

Re: 1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?

2015-02-04 Thread Patrick Wendell
Hi Markus, That won't be included in 1.2.1 most likely because the release votes have already started, and at that point we don't hold the release except for major regression issues from 1.2.0. However, if this goes through we can backport it into the 1.2 branch and it will end up in a future

Spark Cluster vs Spark on YARN jar loading

2015-02-04 Thread Sergey Belousov
Hi All, We have a fat jar that uses asynchbase and throws the following exception. ERROR [Executor task launch worker-2-EventThread:ClientCnxn$EventThread@610] - Caught unexpected throwable java.lang.IllegalAccessError: class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass

Re: multi-line comment style

2015-02-04 Thread Reynold Xin
We should update the style doc to reflect what we have in most places (which I think is //). On Wed, Feb 4, 2015 at 2:09 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: FWIW I like the multi-line // over /* */ from a purely style standpoint. The Google Java style guide[1] has

Re: multi-line comment style

2015-02-04 Thread Patrick Wendell
Personally I have no opinion, but agree it would be nice to standardize. - Patrick On Wed, Feb 4, 2015 at 1:58 PM, Sean Owen so...@cloudera.com wrote: One thing Marcelo pointed out to me is that the // style does not interfere with commenting out blocks of code with /* */, which is a small

Re: multi-line comment style

2015-02-04 Thread Shivaram Venkataraman
FWIW I like the multi-line // over /* */ from a purely style standpoint. The Google Java style guide[1] has some comment about code formatting tools working better with /* */, but there don't seem to be any strong arguments for one over the other that I can find. Thanks, Shivaram [1]

ZMQ and python streaming

2015-02-04 Thread Sasha Kacanski
Hi, is it possible to integrate ZeroMQ with pyspark.streaming to receive messages over a TCP socket? I can't seem to find a working example of a ZeroMQ implementation. Regards, -- Aleksandar Kacanski

Re: Welcoming three new committers

2015-02-04 Thread Jian Zhou
Congratulations! On Wed Feb 04 2015 at 7:26:00 AM Nick Pentreath nick.pentre...@gmail.com wrote: Congrats and welcome Sean, Joseph and Cheng! On Wed, Feb 4, 2015 at 2:10 PM, Sean Owen so...@cloudera.com wrote: Thanks all, I appreciate the vote of trust. I'll do my best to help keep JIRA

Broken record a bit here: building spark on intellij with sbt

2015-02-04 Thread Stephen Boesch
For building in IntelliJ with sbt my mileage has varied widely: it had built as late as Monday (after the 1.3.0 release) - and with zero 'special' steps: just import as sbt project. However, I cannot presently repeat the process. The wiki page has the latest instructions on how to build with

When will Spark Streaming supports Kafka-simple consumer API?

2015-02-04 Thread Xuelin Cao
Hi, In our environment, Kafka can only be used with the simple consumer API, as the Storm spout does. Also, I found suggestions that the Kafka connector of Spark should not be used in production http://markmail.org/message/2lb776ta5sq6lgtw because it is based on the high-level

Re: Welcoming three new committers

2015-02-04 Thread scwf
Congratulations! On 2015/2/4 20:25, Nick Pentreath wrote: Congrats and welcome Sean, Joseph and Cheng! On Wed, Feb 4, 2015 at 2:10 PM, Sean Owen so...@cloudera.com wrote: Thanks all, I appreciate the vote of trust. I'll do my best to help keep JIRA and commits moving along, and am ramping

Re: Spark Summit CFP - Tracks guidelines

2015-02-04 Thread Kay Ousterhout
Did you see the longer descriptions under the Learn More link? Developer This track will present technical deep dive content across a wide range of advanced/basic topics. Data Science This track will focus on the practice of data science using Spark. Sessions should cover innovative techniques,

Re: 1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?

2015-02-04 Thread M. Dale
On 02/04/2015 02:04 PM, Josh Rosen wrote: It looks like you replied just to me; mind CC’ing the mailing list, too? On February 4, 2015 at 11:02:34 AM, M. Dale (medal...@yahoo.com) wrote: Josh, That was a bug that was present earlier. It was marked as fixed in

Spark Summit CFP - Tracks guidelines

2015-02-04 Thread Evan Chan
Hey guys, Is there any guidance on what the different tracks for Spark Summit West mean? There are some new ones, like Third Party Apps, which seems like it would be similar to the Use Cases. Any further guidance would be great. thanks, Evan

Re: Welcoming three new committers

2015-02-04 Thread Nick Pentreath
Congrats and welcome Sean, Joseph and Cheng! On Wed, Feb 4, 2015 at 2:10 PM, Sean Owen so...@cloudera.com wrote: Thanks all, I appreciate the vote of trust. I'll do my best to help keep JIRA and commits moving along, and am ramping up carefully this week. Now get back to work reviewing

Re: 2GB limit for partitions?

2015-02-04 Thread Imran Rashid
Hi Mridul, do you think you'll keep working on this, or should this get picked up by others? Looks like there was a lot of work put into LargeByteBuffer, seems promising. thanks, Imran On Tue, Feb 3, 2015 at 7:32 PM, Mridul Muralidharan mri...@gmail.com wrote: That is fairly out of date (we

Hive window functions in 1.2+

2015-02-04 Thread Al Thompson
Hi All: We want to use Hive window functions on Spark on time series data. I take it from viewing issues SPARK-1442 and SPARK-4226 that work on window function support is ongoing and remains unresolved. Are there good workarounds for working with windows on time series stored on Spark? Does

Re: Welcoming three new committers

2015-02-04 Thread Sean Owen
Thanks all, I appreciate the vote of trust. I'll do my best to help keep JIRA and commits moving along, and am ramping up carefully this week. Now get back to work reviewing things! On Tue, Feb 3, 2015 at 4:34 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to

1.2.1-rc3 - Avro input format for Hadoop 2 broken/fix?

2015-02-04 Thread M. Dale
SPARK-3039 ("Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API") was reopened and prevents v1.2.1-rc3 from using the Avro input format for Hadoop 2 API/instances (it includes the hadoop1 avro-mapred library files). What are the chances of getting the fix outlined here

Re: Broken record a bit here: building spark on intellij with sbt

2015-02-04 Thread Akhil Das
Here's the sbt version https://docs.sigmoidanalytics.com/index.php/Step_by_Step_instructions_on_how_to_build_Spark_App_with_IntelliJ_IDEA Thanks Best Regards On Thu, Feb 5, 2015 at 8:55 AM, Stephen Boesch java...@gmail.com wrote: For building in intellij with sbt my mileage has varied widely:

Re: When will Spark Streaming supports Kafka-simple consumer API?

2015-02-04 Thread Tathagata Das
1. There is already a third-party low-level Kafka receiver - http://spark-packages.org/package/5 2. There is a new experimental Kafka stream that will be available in the Spark 1.3 release. This is based on the low-level API, and might suffice for your purpose. JIRA -