Re: Structured Streaming with Kafka sources/sinks

2016-08-15 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-15406. I'm not working on it (yet?); I never got an answer to the question of who was planning to work on it.

Structured Streaming with Kafka sources/sinks

2016-08-15 Thread Guo, Chenzhao
Hi all, I'm trying to write Structured Streaming test code and will deal with Kafka source. Currently Spark 2.0 doesn't support Kafka sources/sinks. I found some Databricks slides saying that Kafka sources/sinks will be implemented in Spark 2.0, so is there anybody working on this? And when
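For context, the JIRA referenced in Cody's reply (SPARK-15406) did eventually deliver a Kafka source for Structured Streaming, starting in Spark 2.1. Below is a minimal sketch of how that source is consumed, assuming Spark 2.1+ with the spark-sql-kafka-0-10 package on the classpath; the broker address and topic name are placeholders, not values from this thread:

```scala
import org.apache.spark.sql.SparkSession

object KafkaSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-source-sketch")
      .getOrCreate()

    // Read from Kafka as a streaming Dataset.
    // Requires the spark-sql-kafka-0-10 artifact (added under SPARK-15406).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "test-topic")                   // placeholder topic
      .load()

    // Kafka keys/values arrive as binary; cast them for inspection.
    val query = stream
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

This is only a sketch of the eventual API for readers of the archive; at the time of this thread, Spark 2.0 had no such source, which is exactly what the message above is asking about.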

Re: Spark 2.0.1 / 2.1.0 on Maven

2016-08-15 Thread Jacek Laskowski
Thanks Sean. That reflects my sentiments so well! Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, Aug 15, 2016 at 1:08 AM, Sean Owen

Number of tasks on executors become negative after executor failures

2016-08-15 Thread Rachana Srivastava
Summary: I am running Spark 1.5 on CDH 5.5.1. Under extreme load, I intermittently get this connection-failure exception, and later see negative task counts for executors in the Spark UI. Exception: TRACE: org.apache.hadoop.hbase.ipc.AbstractRpcClient - Call: Multi, callTime: 76ms INFO :

Re: How to resolve the SparkException: Size exceeds Integer.MAX_VALUE

2016-08-15 Thread Ewan Leith
I think this is more suited to the user mailing list than the dev one, but this error almost always means you need to repartition your data into smaller partitions, as one of the partitions is over 2 GB. When you create your dataset, put something like .repartition(1000) at the end of the command
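The advice above can be sketched as follows. This is an illustrative snippet, not code from the thread; the input path, format, and partition count are assumptions:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-sketch")
      .getOrCreate()

    // A single oversized partition (> 2 GB) triggers
    // "Size exceeds Integer.MAX_VALUE", because Spark's shuffle
    // blocks are indexed with a Java Int.
    val df = spark.read
      .parquet("/data/train.parquet") // hypothetical input path
      .repartition(1000)              // spread rows so each partition stays well under 2 GB

    println(s"partitions = ${df.rdd.getNumPartitions}")
  }
}
```

The count 1000 is a guess for illustration; pick a value so that (total dataset size / number of partitions) stays comfortably below 2 GB.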

How to resolve the SparkException: Size exceeds Integer.MAX_VALUE

2016-08-15 Thread Minudika Malshan
Hi all, I am trying to create and train a model for a Kaggle competition dataset using Apache Spark. The dataset has more than 10 million rows. But when training the model, I get the exception "Size exceeds Integer.MAX_VALUE". I found the same question has been raised on Stack Overflow

Spark hangs after OOM in Serializer

2016-08-15 Thread mikhainin
Hi guys, I'm using Spark 1.6.2 and have run into a problem, so I kindly ask for your help. Sometimes, when DAGScheduler tries to serialise a pair, an OOM exception is thrown inside the closureSerializer.serialize() call (you may see a stack trace below). But the OOM isn't a problem in itself; the problem is

Re: Welcoming Felix Cheung as a committer

2016-08-15 Thread mayur bhole
Congrats Felix! On Mon, Aug 15, 2016 at 2:57 PM, Paul Roy wrote: > Congrats Felix > Paul Roy.

Re: Spark 2.0.1 / 2.1.0 on Maven

2016-08-15 Thread Steve Loughran
As well as the legal issue ("nightly builds haven't been through the strict review and license-check process for ASF releases") and the engineering issue ("release off a nightly and your users will hate you"), there's an ASF community one: ASF projects want to build a dev community as well as a

Re: Welcoming Felix Cheung as a committer

2016-08-15 Thread Paul Roy
Congrats Felix. Paul Roy. On Mon, Aug 8, 2016 at 9:15 PM, Matei Zaharia wrote: > Hi all, The PMC recently voted to add Felix Cheung as a committer. Felix has been a major contributor to SparkR and we're excited to have him join officially. Congrats and welcome,

Re: Spark 2.0.1 / 2.1.0 on Maven

2016-08-15 Thread Sean Owen
I believe Chris was being a bit facetious. The ASF guidance is right: it's important that people don't consume non-blessed snapshot builds as if they were regular releases. The intended audience is developers, so the easiest default policy is to only advertise the snapshots where only developers are