Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
Hi, no, we are not creating any threads for the Kafka DStream; however, we have a single thread for refreshing a resource cache on the driver, but that is totally separate from this connection. On Mon, Sep 18, 2017 at 12:29 AM kant kodali wrote: > Are you creating threads in your
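This exception typically means a KafkaConsumer is touched from more than one thread. In Spark 2.x the direct stream caches one consumer per topic-partition on each executor, so speculative tasks or concurrent jobs reading the same partition can collide even when the application itself spawns no threads. A commonly suggested workaround (a sketch, not a confirmed fix for this thread) is to disable the consumer cache:

```shell
# Hedged sketch: disable the per-partition KafkaConsumer cache used by
# spark-streaming-kafka-0-10, so each task creates its own consumer.
# This trades some throughput for safety when tasks can overlap on a partition.
spark-submit \
  --conf spark.streaming.kafka.consumer.cache.enabled=false \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```

The class and jar names above are placeholders; only the `--conf` flag is the point of the sketch.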

spark 2.1.1 ml.LogisticRegression with large feature set causes Kryo serialization failed: Buffer overflow

2017-09-17 Thread haibo wu
I am trying to train a big model. I have 40 million instances and a 50-million-dimension feature set, and it is sparse. I am using 40 executors with 20 GB each, plus a driver with 40 GB. The number of data partitions is 5000, the treeAggregate depth is 4, the spark.kryoserializer.buffer.max is 2016m, the
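For scale, a single dense coefficient or gradient vector at this dimensionality is already large, which is why treeAggregate rounds can push one serialized object past the Kryo cap (spark.kryoserializer.buffer.max must stay below 2048m). A back-of-the-envelope check, assuming doubles and a fully dense vector:

```python
# Rough size estimate for one dense vector of 50 million doubles,
# as LogisticRegression materializes during treeAggregate.
n_features = 50_000_000
bytes_per_double = 8

size_mib = n_features * bytes_per_double / 2**20
print(f"one dense vector is about {size_mib:.0f} MiB")  # about 381 MiB

# Kryo's buffer.max is capped below 2 GiB, so only ~5 such vectors
# can be merged into one serialized object before overflowing.
kryo_cap_mib = 2047
print(f"roughly {int(kryo_cap_mib // size_mib)} vectors per buffer")
```

So even a 2016m buffer has little headroom once partial aggregates from several partitions are combined at one tree level.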

Spark 2.1.1 Driver OOM when use interaction for large scale Sparse Vector

2017-09-17 Thread haibo wu
I'm working on large-scale logistic regression for CTR prediction, and when using Interaction for feature engineering, the driver OOMs. In detail, I interact userid (one-hot, 300,000 dimensions, sparse) with base features (60 dimensions, dense); driver memory is set to 40g. So, I try to debug from remote,
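One plausible driver-side cost (an assumption about this setup, not confirmed in the thread): ml.feature.Interaction crossing a 300,000-dimension one-hot column with 60 dense columns yields an 18-million-dimension output space, and the attribute metadata for that output schema is built on the driver. A quick sketch of the sizes involved:

```python
# Interaction output dimensionality: each pair of input dimensions
# produces one output dimension, so the sizes multiply.
onehot_dims = 300_000   # userid, sparse one-hot
dense_dims = 60         # base features, dense

out_dims = onehot_dims * dense_dims
print(out_dims)  # 18000000 output dimensions

# Per row the result stays sparse: one active one-hot slot times
# 60 dense values gives only ~60 nonzeros out of 18 million.
nnz_per_row = 1 * dense_dims
print(nnz_per_row)  # 60
```

The per-row vectors are cheap; it is the 18-million-entry output schema metadata, materialized once on the driver, that can dominate heap usage at this scale.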

Re: ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread kant kodali
Are you creating threads in your application? On Sun, Sep 17, 2017 at 7:48 AM, HARSH TAKKAR wrote: > > Hi > > I am using Spark 2.1.0 with Scala 2.11.8, and while iterating over the > partitions of each RDD in a DStream formed using KafkaUtils, I am getting > the below

ConcurrentModificationException using Kafka Direct Stream

2017-09-17 Thread HARSH TAKKAR
Hi, I am using Spark 2.1.0 with Scala 2.11.8, and while iterating over the partitions of each RDD in a DStream formed using KafkaUtils, I am getting the exception below; please suggest a fix. I have the following Kafka config: enable.auto.commit:"true", auto.commit.interval.ms:"1000",
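Worth noting for this config: the Kafka 0-10 integration guide recommends enable.auto.commit=false for the direct stream, since Spark itself tracks offsets and can commit them after each batch via CanCommitOffsets.commitAsync. Whether auto-commit is related to this particular exception is not confirmed in the thread; the parameter change is sketched below in properties form:

```properties
# Hedged sketch: let Spark manage offset commits for the direct stream
# instead of the consumer's periodic auto-commit.
enable.auto.commit=false
# auto.commit.interval.ms then has no effect and can be dropped
```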