Re: can spark take advantage of ordered data?

2017-03-10 Thread sourabh chaki
, but that project does not utilise the pre-existing partitions in the feed. Any pointers would be helpful. Thanks Sourabh On Thu, Mar 12, 2015 at 6:35 AM, Imran Rashid <iras...@cloudera.com> wrote: > Hi Jonathan, > > you might be interested in https://issues.apache.org/ > jira
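
As background for this thread: Spark only exploits a partitioning it can see via an RDD's partitioner field, so absent the JIRA referenced above, the usual (if imperfect) baseline is to assert a partitioner once and reuse it; later co-partitioned operations then skip the shuffle. A minimal sketch with placeholder paths and keys:

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("copartitioned-join"))

    // Placeholder parsing: key each feed line by its first comma-separated field.
    val rawLeft  = sc.textFile("hdfs:///feed/left").map(l => (l.split(",")(0), l))
    val rawRight = sc.textFile("hdfs:///feed/right").map(l => (l.split(",")(0), l))

    // The first partitionBy does shuffle, but every later co-partitioned
    // operation (join, reduceByKey, ...) on these RDDs reuses the layout.
    val part  = new HashPartitioner(128)
    val left  = rawLeft.partitionBy(part).persist()
    val right = rawRight.partitionBy(part).persist()

    val joined = left.join(right) // no shuffle: both sides share `part`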

Re: spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-02 Thread Sourabh Chandak
Thanks Cody, will try to do some estimation. Thanks Nicolae, will try out this config. Thanks, Sourabh On Thu, Oct 1, 2015 at 11:01 PM, Nicolae Marasoiu < nicolae.maras...@adswizz.com> wrote: > Hi, > > > Set 10ms and spark.streaming.backpressure.enabled=true > > >
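
For reference, a minimal sketch of wiring up the two settings discussed above; the property names are real Spark configuration keys, while the values and batch interval are illustrative placeholders, not recommendations:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("direct-stream-rate-limit")
      // Hard cap on records consumed per Kafka partition per second
      // (applies to the direct stream only).
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")
      // Let Spark adapt the ingestion rate to observed scheduling delay
      // (available from Spark 1.5 onward).
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))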

Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
for checkpointing. Spark Streaming is done using backported code. Running nodetool shows that the read latency of the cfs keyspace is ~8.5 ms. Can someone please help me resolve this? Thanks, Sourabh

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
I can see the entries processed in the table very quickly, but after that it takes a long time for the checkpoint update. Haven't tried other methods of checkpointing yet; we are using DSE on Azure. Thanks, Sourabh On Fri, Oct 2, 2015 at 6:52 AM, Cody Koeninger <c...@koeninger.org> wrote:

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
Tried using local checkpointing as well, and even that becomes slow after some time. Any idea what could be wrong? Thanks, Sourabh On Fri, Oct 2, 2015 at 9:35 AM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > I can see the entries processed in the table very fast but after that i

Re: Checkpointing is super slow

2015-10-02 Thread Sourabh Chandak
ata), or RDD checkpointing > (which saves the actual intermediate RDD data) > > TD > > On Fri, Oct 2, 2015 at 2:56 PM, Sourabh Chandak <sourabh3...@gmail.com > <javascript:_e(%7B%7D,'cvml','sourabh3...@gmail.com');>> wrote: > >> Tried using local checkpointing as well
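
To make the distinction TD draws above concrete, here is a minimal sketch; the checkpoint directory, socket source, and intervals are placeholders. ssc.checkpoint switches on metadata checkpointing, while DStream.checkpoint controls how often the actual RDD data is written out:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("checkpoint-demo")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Metadata checkpointing: persists the DStream graph and batch/offset
    // bookkeeping so a restarted driver can pick up where it left off.
    ssc.checkpoint("hdfs:///checkpoints/my-app")

    val counts = ssc.socketTextStream("localhost", 9999)
      .map(word => (word, 1L))
      .updateStateByKey[Long]((vals, state) => Some(state.getOrElse(0L) + vals.sum))

    // Data (RDD) checkpointing: writes the state RDD itself at this interval,
    // truncating the ever-growing lineage that updateStateByKey builds up.
    counts.checkpoint(Seconds(50))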

spark.streaming.kafka.maxRatePerPartition for direct stream

2015-10-01 Thread Sourabh Chandak
. Thanks, Sourabh

Re: Adding / Removing worker nodes for Spark Streaming

2015-09-28 Thread Sourabh Chandak
of node failure, how will a new node know the checkpoint of the failed node? The amount of data we have is huge, and we can't run from the smallest offset. Thanks, Sourabh On Mon, Sep 28, 2015 at 11:43 AM, Augustus Hong <augus...@branchmetrics.io> wrote: > Got it, thank you! > > > On
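
For what it's worth, driver recovery with checkpointing normally goes through StreamingContext.getOrCreate, which rebuilds the context (stored Kafka offsets included) from the checkpoint directory rather than restarting from the smallest offset; a minimal sketch, with the path and setup function as placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///checkpoints/my-app" // placeholder path

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("recoverable-stream")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... build the Kafka direct stream and transformations here ...
      ssc
    }

    // Clean start: runs createContext(). After a driver failure: rebuilds the
    // context, offsets included, from the checkpoint data instead.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()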

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
ing("Throwing this errir\n")), ok => ok ) } On Thu, Sep 24, 2015 at 3:00 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > I was able to get pass this issue. I was pointing the SSL port whereas > SimpleConsumer should point to the PLAINTEXT port. But after fixing that

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
a) Thanks, Sourabh On Thu, Sep 24, 2015 at 2:04 PM, Cody Koeninger <c...@koeninger.org> wrote: > That looks like the OOM is in the driver, when getting partition metadata > to create the direct stream. In that case, executor memory allocation > doesn't matter. > > Allocate more d

ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) I have tried allocating 100G of memory with 1 executor, but it is still failing. Spark version: 1.2.2 Kafka version ported: 0.8.2 Kafka server version: trunk version with SSL enabled Can someone please help me debug this? Thanks, Sourabh

Re: ERROR BoundedByteBufferReceive: OOME with size 352518400

2015-09-24 Thread Sourabh Chandak
Adding Cody and Sriharsha On Thu, Sep 24, 2015 at 1:25 PM, Sourabh Chandak <sourabh3...@gmail.com> wrote: > Hi, > > I have ported receiver-less Spark Streaming for Kafka to Spark 1.2 and am > trying to run a Spark Streaming job to consume data from my broker, but I > am

Re: SSL between Kafka and Spark Streaming API

2015-08-28 Thread Sourabh Chandak
Can we use the existing Kafka Spark Streaming jar to connect to a Kafka server running in SSL mode? We are fine with a non-SSL consumer, as our Kafka cluster and Spark cluster are on the same network. Thanks, Sourabh On Fri, Aug 28, 2015 at 12:03 PM, Gwen Shapira g...@confluent.io wrote: I can't
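
As a baseline, the stock receiver-based integration of that era has no SSL hooks at all, so it can only talk to a plaintext listener; a minimal sketch, with the ZooKeeper quorum, group id, and topic map as placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("plaintext-kafka-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // The 1.x receiver consumes through ZooKeeper over the plaintext
    // protocol; there is nowhere to supply SSL keystore/truststore settings.
    val stream = KafkaUtils.createStream(
      ssc,
      "zk-1.example.com:2181", // placeholder ZooKeeper quorum
      "my-consumer-group",     // placeholder group id
      Map("my-topic" -> 2))    // topic -> number of consumer threads

    stream.map(_._2).print() // print message values only
    ssc.start()
    ssc.awaitTermination()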

Re: Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
Thanks Tathagata. I tried that, but BlockGenerator internally uses SystemClock, which is again private. We are using DSE, so we are stuck with Spark 1.2 and hence can't use the receiver-less version. Is it possible to use the same code as a separate API with 1.2? Thanks, Sourabh On Wed, Aug 5, 2015 at 6:13
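
For completeness, the public alternative to the private BlockGenerator is the Receiver API itself: the blocking store(ArrayBuffer) overload does not return until the block is stored (and replicated or write-ahead-logged, depending on storage level), which is the hook for reliable acknowledgement. A skeleton sketch, with the source wiring left as placeholders:

    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class MyReliableReceiver
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        new Thread("my-receiver") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = { /* close connections, stop threads */ }

      private def receive(): Unit = {
        while (!isStopped) {
          val batch = new ArrayBuffer[String]()
          // ... pull up to N records from the source into batch ...
          store(batch) // blocks until the whole block is stored reliably
          // ... only now acknowledge / commit offsets upstream ...
        }
      }
    }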

Reliable Streaming Receiver

2015-08-05 Thread Sourabh Chandak
to tackle this issue? Thanks, Sourabh

Re: JAVA_HOME problem

2015-04-28 Thread sourabh chaki
. Any pointer on why this could happen? Thanks Sourabh On Fri, Apr 24, 2015 at 3:52 PM, sourabh chaki chaki.sour...@gmail.com wrote: Yes Akhil. This is the same issue. I have updated my comment in that ticket. Thanks Sourabh On Fri, Apr 24, 2015 at 12:02 PM, Akhil Das ak

Re: JAVA_HOME problem

2015-04-24 Thread sourabh chaki
Yes Akhil. This is the same issue. I have updated my comment in that ticket. Thanks Sourabh On Fri, Apr 24, 2015 at 12:02 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Isn't this related to this https://issues.apache.org/jira/browse/SPARK-6681 Thanks Best Regards On Fri, Apr 24, 2015

Re: JAVA_HOME problem

2015-04-24 Thread sourabh chaki
-with-upgrade-to-spark-1-3-0 Any pointers would be helpful. Thanks Sourabh On Thu, Apr 2, 2015 at 1:23 PM, 董帅阳 917361...@qq.com wrote: spark 1.3.0 spark@pc-zjqdyyn1:~ tail /etc/profile export JAVA_HOME=/usr/jdk64/jdk1.7.0_45 export PATH=$PATH:$JAVA_HOME/bin # # End of /etc/profile

Re: train many decision tress with a single spark job

2015-01-13 Thread sourabh chaki
{ data => DecisionTree.trainClassifier(toLabelPoint(data)) } def toLabelPoint(data: RDD[Double]): RDD[LabeledPoint] = { // convert data RDD to LabeledPoint RDD } For your case, I think you need custom logic to split the dataset. Thanks Sourabh On Tue, Jan 13, 2015 at 3:55 PM, Sean Owen so
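
Spelled out a little more fully, the approach above might look like the sketch below: key each record by whatever defines a sub-model, collect the distinct keys, and train one tree per key. The keying scheme, feature conversion, and tree parameters are all placeholders:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.tree.model.DecisionTreeModel

    // data is keyed by whatever defines each sub-model (segment, region, ...).
    def trainPerKey(data: RDD[(String, LabeledPoint)]): Map[String, DecisionTreeModel] = {
      val keys = data.keys.distinct().collect()
      keys.map { k =>
        val subset = data.filter(_._1 == k).values
        k -> DecisionTree.trainClassifier(
          subset,
          numClasses = 2,                        // placeholder
          categoricalFeaturesInfo = Map[Int, Int](),
          impurity = "gini",
          maxDepth = 5,
          maxBins = 32)
      }.toMap
    }

Caching data before the loop avoids re-reading the input once per key; each trainClassifier call still launches its own jobs, so the trees are trained sequentially.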

Re: MLLIB model export: PMML vs MLLIB serialization

2014-12-15 Thread sourabh
Thanks Vincenzo. Are you trying out all the models implemented in MLlib? Actually, I don't see decision tree there. Sorry if I missed it. When are you planning to merge this into the Spark branch? Thanks Sourabh On Sun, Dec 14, 2014 at 5:54 PM, selvinsource [via Apache Spark User List] ml-node

Re: Serialize mllib's MatrixFactorizationModel

2014-12-15 Thread sourabh chaki
the mllib trained model to a different system. Thanks Sourabh On Mon, Dec 15, 2014 at 10:39 PM, Albert Manyà alber...@eml.cc wrote: In that case, what is the strategy to train a model in some background batch process and make recommendations for some other service in real time? Run both
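
Before MatrixFactorizationModel gained built-in save/load (Spark 1.3), one common workaround was to persist the two factor RDDs yourself and score on the serving side with a plain dot product, sidestepping the model class entirely; a minimal sketch, with paths as placeholders:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark
    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    // Batch side: persist the learned factors with Java serialization.
    def saveFactors(model: MatrixFactorizationModel, dir: String): Unit = {
      model.userFeatures.saveAsObjectFile(dir + "/userFeatures")
      model.productFeatures.saveAsObjectFile(dir + "/productFeatures")
    }

    // Serving side: reload the factors and score a (user, product) pair
    // directly; no need to reconstruct the model object, whose constructor
    // was private in old releases.
    def predict(sc: SparkContext, dir: String, user: Int, product: Int): Double = {
      val userVec = sc.objectFile[(Int, Array[Double])](dir + "/userFeatures")
        .lookup(user).head
      val productVec = sc.objectFile[(Int, Array[Double])](dir + "/productFeatures")
        .lookup(product).head
      userVec.zip(productVec).map { case (u, p) => u * p }.sum
    }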

MLLIB model export: PMML vs MLLIB serialization

2014-12-03 Thread sourabh
not be deserializable using a different version of the MLlib entity(?). I think this is quite a common problem. I am really interested to hear how you are solving this, and what the approaches, pros, and cons are. Thanks Sourabh