Re: Unsubscribe

2019-01-30 Thread Soumitra Johri
unsubscribe On Wed, Jan 30, 2019 at 10:05 PM wang wei wrote: > Unsubscribe >

Re: Kafka failover with multiple data centers

2017-03-27 Thread Soumitra Johri
Hi, did you guys figure it out? Thanks Soumitra On Sun, Mar 5, 2017 at 9:51 PM nguyen duc Tuan wrote: > Hi everyone, > We are deploying a kafka cluster for ingesting streaming data. But > sometimes, some of the nodes in the cluster have trouble (a node dies, the kafka > daemon is

Re: Is executor computing time affected by network latency?

2016-09-22 Thread Soumitra Johri
If your job involves a shuffle, then the compute time for the entire batch will increase with network latency. What would be interesting is to see how much time each task/job/stage takes. On Thu, Sep 22, 2016 at 5:11 PM Peter Figliozzi wrote: > It seems to me they must
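
A minimal sketch, assuming an existing SparkContext named sc, of one way to surface per-stage and per-task timings with a SparkListener; the listener class name and log format are illustrative, not from the thread:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

    // Log how long each stage and each task took, so shuffle-heavy stages
    // (the ones most sensitive to network latency) stand out.
    class TimingListener extends SparkListener {
      override def onStageCompleted(stage: SparkListenerStageCompleted): Unit = {
        val info = stage.stageInfo
        for (start <- info.submissionTime; end <- info.completionTime) {
          println(s"Stage ${info.stageId} (${info.name}) took ${end - start} ms")
        }
      }
      override def onTaskEnd(task: SparkListenerTaskEnd): Unit = {
        println(s"Task ${task.taskInfo.taskId} on host ${task.taskInfo.host} ran for ${task.taskInfo.duration} ms")
      }
    }

    // Register it on the running context:
    // sc.addSparkListener(new TimingListener)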

How does MapWithStateRDD distribute the data

2016-08-03 Thread Soumitra Johri
Hi, I am running a streaming job with 4 executors and 16 cores so that each executor has two cores to work with. The input Kafka topic has 4 partitions. With this configuration I was expecting MapWithStateRDD to be evenly distributed across all executors; however, I see that it uses only two
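
A minimal sketch of how the state partitioning can be decoupled from the 4-partition Kafka input by giving mapWithState an explicit partitioner through StateSpec; the word-count mapping function, the types, and the choice of 8 partitions are illustrative, not taken from the thread:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.streaming.{State, StateSpec}

    // Illustrative running word-count state function; the real job's key,
    // value and state types will differ.
    val mappingFunc = (word: String, count: Option[Int], state: State[Int]) => {
      val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    // Ask for more state partitions than the Kafka topic provides, so the
    // MapWithStateRDD can spread across all executors instead of inheriting
    // the input's 4 partitions.
    val spec = StateSpec.function(mappingFunc).partitioner(new HashPartitioner(8))

    // val stateStream = kafkaPairStream.mapWithState(spec)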

Re: Spark Streaming : is spark.streaming.receiver.maxRate valid for DirectKafkaApproach

2016-05-10 Thread Soumitra Johri
I think a better partitioning scheme can help you too. On Tue, May 10, 2016 at 10:31 AM Cody Koeninger wrote: > maxRate is not used by the direct stream. > > Significant skew in rate across different partitions for the same > topic is going to cause you all kinds of problems,
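
A hedged sketch of the direct-stream settings this thread points to instead of spark.streaming.receiver.maxRate; the property names are standard Spark Streaming configuration keys, but the values are placeholders:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("direct-kafka-rate-limit")
      // Cap per Kafka partition (records per second) for the direct stream.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")
      // Let the backpressure mechanism adapt the rate to the processing speed.
      .set("spark.streaming.backpressure.enabled", "true")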

inter spark application communication

2016-04-18 Thread Soumitra Johri
Hi, I have two applications: App1 and App2. On a single cluster I have to spawn 5 instances of App1 and 1 instance of App2. What would be the best way to send data from the 5 App1 instances to the single App2 instance? Right now I am using Kafka to send data from one spark application to the
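
A rough sketch of the Kafka hand-off between the two applications; the topic name, broker address, and record types are placeholders, not from the thread:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Each App1 instance writes its results to a shared topic; App2 reads
    // that topic as its own input stream.
    def sendPartitionToKafka(records: Iterator[(String, String)]): Unit = {
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      records.foreach { case (k, v) =>
        producer.send(new ProducerRecord[String, String]("app1-to-app2", k, v))
      }
      producer.close()
    }

    // In each App1 instance: resultStream.foreachRDD(_.foreachPartition(sendPartitionToKafka))
    // In App2: consume the "app1-to-app2" topic, e.g. with KafkaUtils.createDirectStream.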

UpdateStateByKey : Partitioning and Shuffle

2016-01-05 Thread Soumitra Johri
Hi, I am relatively new to Spark and am using the updateStateByKey() operation to maintain state in my Spark Streaming application. The input data is coming through a Kafka topic. 1. I want to understand how DStreams are partitioned. 2. How does the partitioning work with mapWithState() or
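
A minimal sketch, assuming a word-count-style state, of passing an explicit partitioner to updateStateByKey() so the state DStream's partitioning is not left to the default parallelism; the update function, stream name, and partition count are illustrative:

    import org.apache.spark.HashPartitioner

    // Illustrative running-count update function; real state types will differ.
    val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
      Some(newValues.sum + runningCount.getOrElse(0))

    // The state DStream is hash-partitioned into 8 partitions, and the same
    // partitioner is reused on every batch when merging new data into the state.
    // val stateDStream = wordCounts.updateStateByKey[Int](updateFunc, new HashPartitioner(8))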

Spark Streaming : Output frequency different from read frequency in StatefulNetworkWordCount

2015-12-30 Thread Soumitra Johri
Hi, in the example: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala the streaming frequency is 1 second; however, I do not want to print the contents of the word-counts every minute and reset the word counts
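
One possible workaround, sketched under the assumption that the state stream built in the linked example is available: keep the 1-second batches for reading and updating state, but only print a snapshot when the batch time lands on a minute boundary. The stream name and key/value types are illustrative:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time

    // Print the word counts only once per minute even though batches arrive every second.
    def printEveryMinute(rdd: RDD[(String, Int)], time: Time): Unit = {
      if (time.milliseconds % 60000 == 0) {
        println(s"Word counts at $time:")
        rdd.take(20).foreach(println)
      }
    }

    // Wire it to the state stream from the example:
    // stateDstream.foreachRDD(printEveryMinute _)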

RDD Indexes and how to fetch all edges with a given label

2014-10-14 Thread Soumitra Johri
Hi All, I have a Graph with millions of edges. Edges are represented by org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[String]] = MappedRDD[4]. I have two questions: 1) How can I fetch all the edges with a given label (all edges with a given property)? 2) Is it possible to create

Re: RDD Indexes and how to fetch all edges with a given label

2014-10-14 Thread Soumitra Johri
Hi, With respect to the first issue, one possible way is to filter the graph via 'graph.subgraph(epred = e => e.attr == edgeLabel)', but I am still curious whether we can index RDDs. Warm Regards Soumitra On Tue, Oct 14, 2014 at 2:46 PM, Soumitra Johri soumitra.siddha...@gmail.com wrote: Hi
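
A corrected, self-contained version of that subgraph filter; the graph's vertex attribute type and the helper name are illustrative (the thread only fixes the edge attribute type as String):

    import org.apache.spark.graphx.{EdgeTriplet, Graph}

    // Keep only the edges whose attribute (label) equals the requested value.
    def edgesWithLabel(graph: Graph[String, String], edgeLabel: String) = {
      val sub = graph.subgraph(epred = (t: EdgeTriplet[String, String]) => t.attr == edgeLabel)
      sub.edges // an RDD of Edge[String] containing just the matching edges
    }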