Re: Streaming scheduling delay

2015-03-01 Thread Josh J
On Fri, Feb 13, 2015 at 2:21 AM, Gerard Maas gerard.m...@gmail.com wrote: KafkaOutputServicePool Could you please give example code of what KafkaOutputServicePool would look like? When I tried object pooling I ended up with various not-serializable exceptions. Thanks! Josh
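
A minimal sketch of the pooled-producer idea, assuming the Kafka 0.8.2+ KafkaProducer client and a DStream[String] named dstream (both illustrative): keep the non-serializable producer in a JVM-level singleton so each executor constructs it locally instead of receiving it through a closure.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ProducerPool {
      // lazy val: built once per executor JVM, never shipped through a closure
      lazy val producer: KafkaProducer[String, String] = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        new KafkaProducer[String, String](props)
      }
    }

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        records.foreach(r => ProducerPool.producer.send(new ProducerRecord("out-topic", r)))
      }
    }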

Re: throughput in the web console?

2015-02-25 Thread Josh J
at 10:29 PM, Josh J joshjd...@gmail.com wrote: Hi, I plan to run a parameter search varying the number of cores, epoch, and parallelism. The web console provides a way to archive previous runs, though is there a way to view the throughput in the console? Rather than logging

Re: throughput in the web console?

2015-02-25 Thread Josh J
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com wrote: For Spark Streaming applications, there is already a tab called Streaming which displays the basic statistics. Would I just need to extend this tab to add the throughput?

throughput in the web console?

2015-02-24 Thread Josh J
Hi, I plan to run a parameter search varying the number of cores, epoch, and parallelism. The web console provides a way to archive previous runs, though is there a way to view the throughput in the console? Rather than logging the throughput separately to the log files and correlating the
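
A rough way to surface throughput without extending the UI, sketched under the assumption of a DStream named lines and a known batch interval: count each batch in foreachRDD and log records per second.

    val batchIntervalSec = 2 // assumed batch interval, in seconds
    lines.foreachRDD { rdd =>
      val n = rdd.count()
      println(s"processed $n records (~${n.toDouble / batchIntervalSec} records/sec)")
    }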

measuring time taken in map, reduceByKey, filter, flatMap

2015-01-30 Thread Josh J
Hi, I have a stream pipeline which invokes map, reduceByKey, filter, and flatMap. How can I measure the time taken in each stage? Thanks, Josh
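
The API has no built-in per-stage timer, but one hedged workaround is to wrap each stage in mapPartitions and time the partition locally; events and the lambdas below are illustrative, and forcing the iterator changes laziness, so the numbers are approximate.

    def timed[A, B](name: String)(f: Iterator[A] => Iterator[B])(it: Iterator[A]): Iterator[B] = {
      val start = System.nanoTime()
      val out = f(it).toVector // force evaluation so the elapsed time is meaningful
      println(s"$name took ${(System.nanoTime() - start) / 1e6} ms on this partition")
      out.iterator
    }

    val mapped   = events.mapPartitions(timed[String, String]("map")(_.map(_.toUpperCase)))
    val filtered = mapped.mapPartitions(timed[String, String]("filter")(_.filter(_.nonEmpty)))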

Re: dockerized spark executor on mesos?

2015-01-14 Thread Josh J
We have dockerized Spark Master and worker(s) separately and are using it in our dev environment. Is this setup available on GitHub or Docker Hub? On Tue, Dec 9, 2014 at 3:50 PM, Venkat Subramanian vsubr...@gmail.com wrote: We have dockerized Spark Master and worker(s) separately and are

spark standalone master with workers on two nodes

2015-01-13 Thread Josh J
Hi, I'm trying to run Spark Streaming standalone on two nodes. I'm able to run on a single node fine. I start both workers and they register in the Spark UI. However, the application says "SparkDeploySchedulerBackend: Asked to remove non-existent executor 2". Any ideas? Thanks, Josh

sample is not a member of org.apache.spark.streaming.dstream.DStream

2014-12-28 Thread Josh J
Hi, I'm trying to use sampling with Spark Streaming. I imported the following: import org.apache.spark.{SparkConf, SparkContext} and import org.apache.spark.SparkContext._ I then call sample: val streamtoread = KafkaUtils.createStream(ssc, zkQuorum, group,
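
sample() is defined on RDD rather than on DStream, so the usual workaround is to apply it through transform(). A minimal sketch, assuming stream is the DStream returned by KafkaUtils.createStream:

    val sampled = stream.transform(rdd => rdd.sample(withReplacement = false, fraction = 0.1))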

kafka pipeline exactly once semantics

2014-11-30 Thread Josh J
Hi, In the Spark docs http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node it mentions: However, output operations (like foreachRDD) have *at-least once* semantics, that is, the transformed data may get written to an external entity more than once in the
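
The usual remedy for at-least-once output is to make the write idempotent: derive a deterministic ID per record so a replayed batch overwrites earlier writes instead of duplicating them. A hypothetical sketch (the sink is assumed, and zipWithIndex is only stable if the batch is recomputed with the same partitioning):

    dstream.foreachRDD { (rdd, time) =>
      rdd.zipWithIndex().foreachPartition { records =>
        records.foreach { case (value, idx) =>
          val id = s"${time.milliseconds}-$idx" // deterministic key for this batch
          // store.put(id, value) // assumed idempotent key-value sink: replays overwrite
        }
      }
    }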

Re: Publishing a transformed DStream to Kafka

2014-11-30 Thread Josh J
Is there a way to do this that preserves exactly-once semantics for the write to Kafka? On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote: I'd be interested in finding the answer too. Right now, I do: val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))

Adaptive stream processing and dynamic batch sizing

2014-11-14 Thread Josh J
Hi, I was wondering if adaptive stream processing and dynamic batch sizing are available to use in Spark Streaming? Could someone help point me in the right direction? Thanks, Josh

Re: Adaptive stream processing and dynamic batch sizing

2014-11-14 Thread Josh J
Referring to this paper: http://dl.acm.org/citation.cfm?id=2670995. On Fri, Nov 14, 2014 at 10:42 AM, Josh J joshjd...@gmail.com wrote: Hi, I was wondering if adaptive stream processing and dynamic batch sizing are available to use in Spark Streaming? Could someone help point me

concat two DStreams

2014-11-11 Thread Josh J
Hi, Is it possible to concatenate or append two DStreams? I have an incoming stream that I wish to combine with data that's generated by a utility. I then need to process the combined DStream. Thanks, Josh

Re: concat two DStreams

2014-11-11 Thread Josh J
I think it's just called union. On Tue, Nov 11, 2014 at 2:41 PM, Josh J joshjd...@gmail.com wrote: Hi, Is it possible to concatenate or append two DStreams? I have an incoming stream that I wish to combine with data that's generated by a utility. I then need to process the combined
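
A minimal sketch, assuming stream1 and stream2 are DStreams of the same element type:

    val combined = stream1.union(stream2)
    // or, for more than two streams:
    val all = ssc.union(Seq(stream1, stream2, stream3))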

convert List[String] to DStream

2014-11-10 Thread Josh J
Hi, I have some data generated by some utilities that returns the results as a List[String]. I would like to join this with a DStream of strings. How can I do this? I tried the following, though I get Scala compiler errors: val list_scalaconverted = ssc.sparkContext.parallelize(listvalues.toArray())
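
One hedged way to do this: parallelize the list into an RDD once and union it into each batch via transform(); stream and the sample values are illustrative.

    val listRdd = ssc.sparkContext.parallelize(List("a", "b", "c")) // assumed utility output
    val combined = stream.transform(rdd => rdd.union(listRdd))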

Re: Scala RDD sortBy compilation error

2014-11-04 Thread Josh J
: Ordering[K], implicit ctag: scala.reflect.ClassTag[K])org.apache.spark.rdd.RDD[String]. Unspecified value parameter f. On Tue, Nov 4, 2014 at 11:28 AM, Josh J joshjd...@gmail.com wrote: Hi, Does anyone have any good examples of using sortBy for RDDs in Scala? I'm receiving not enough
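
The "Unspecified value parameter f" error means sortBy was called without its key function. A minimal example, assuming sc is the SparkContext:

    val rdd = sc.parallelize(Seq("banana", "apple", "cherry"))
    val alphabetical = rdd.sortBy(s => s)                   // sort by the string itself
    val byLength = rdd.sortBy(_.length, ascending = false)  // longest first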

random shuffle streaming RDDs?

2014-11-03 Thread Josh J
Hi, Is there a nice or optimal method to randomly shuffle Spark Streaming RDDs? Thanks, Josh

Re: random shuffle streaming RDDs?

2014-11-03 Thread Josh J
In general RDDs don't have an ordering at all -- except when you sort, for example -- so a permutation doesn't make sense. Do you just want a well-defined but random ordering of the data? Do you just want to (re-)assign elements randomly to partitions? On Mon, Nov 3, 2014 at 4:33 PM, Josh J joshjd

Re: random shuffle streaming RDDs?

2014-11-03 Thread Josh J
is guaranteed about that. If you want to permute an RDD, how about a sortBy() on a good hash function of each value plus some salt? (Haven't thought this through much but sounds about right.) On Mon, Nov 3, 2014 at 4:59 PM, Josh J joshjd...@gmail.com wrote: When I'm outputting the RDDs
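
A sketch of that suggestion, assuming rdd is any RDD: sorting on a salted hash of each element yields a permutation that changes from run to run.

    import scala.util.Random
    val salt = Random.nextInt()
    val shuffled = rdd.sortBy(x => (x, salt).hashCode)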

run multiple spark applications in parallel

2014-10-28 Thread Josh J
Hi, How do I run multiple Spark applications in parallel? I tried to run on a YARN cluster, though the second application submitted does not run. Thanks, Josh

Re: run multiple spark applications in parallel

2014-10-28 Thread Josh J
, Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz 32 GB RAM Thanks, Josh On Tue, Oct 28, 2014 at 4:15 PM, Soumya Simanta soumya.sima...@gmail.com wrote: Try reducing the resources (cores and memory) of each application. On Oct 28, 2014, at 7:05 PM, Josh J joshjd...@gmail.com wrote: Hi, How
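
One hedged way to cap an application's footprint so two apps fit side by side on YARN (the values are illustrative):

    import org.apache.spark.SparkConf
    val conf = new SparkConf()
      .setAppName("app-one")
      .set("spark.executor.memory", "2g")
      .set("spark.executor.cores", "2")
      .set("spark.executor.instances", "2") // YARN: limit the number of executors per app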

exact count using rdd.count()?

2014-10-27 Thread Josh J
Hi, Is the following guaranteed to always provide an exact count? foreachRDD(foreachFunc = rdd => { rdd.count() The documentation mentions: However, output operations (like foreachRDD) have *at-least once* semantics, that is, the transformed data may get written to an external entity more

combine RDDs?

2014-10-27 Thread Josh J
Hi, How can I combine RDDs? I would like to combine two RDDs if the count in an RDD is not above some threshold. Thanks, Josh
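
A minimal sketch of that idea, assuming rdd1 and rdd2 hold the same element type; note that count() triggers a full job, so the check itself has a cost.

    val threshold = 1000L // assumed application-specific cutoff
    val combined = if (rdd1.count() < threshold) rdd1.union(rdd2) else rdd1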

docker spark 1.1.0 cluster

2014-10-24 Thread Josh J
Hi, Are there Dockerfiles available that allow setting up a Docker Spark 1.1.0 cluster? Thanks, Josh

streaming join sliding windows

2014-10-22 Thread Josh J
Hi, How can I join neighboring sliding windows in Spark Streaming? Thanks, Josh
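
Spark Streaming has no direct "previous window" operator, but two windowed views of the same keyed stream can be joined batch by batch. A hedged sketch, assuming pairs is a DStream[(String, Int)]: the longer window spans both the current and the previous slide, so joining it against the current window relates neighbors.

    import org.apache.spark.streaming.Seconds
    val current = pairs.window(Seconds(10), Seconds(10)) // this window only
    val lastTwo = pairs.window(Seconds(20), Seconds(10)) // this window plus the previous one
    val joined = current.join(lastTwo)                   // per-key join across the two views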

Re: countByWindow save the count?

2014-08-26 Thread Josh J
of countByWindow with a function that performs the save operation. On Fri, Aug 22, 2014 at 1:58 AM, Josh J joshjd...@gmail.com wrote: Hi, Hopefully a simple question: is there an example of how to save the output of countByWindow? I would like to save the results to external storage (Kafka

countByWindow save the count?

2014-08-22 Thread Josh J
Hi, Hopefully a simple question: is there an example of how to save the output of countByWindow? I would like to save the results to external storage (Kafka or Redis). The examples show only stream.print(). Thanks, Josh
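
A hedged sketch of saving the count, assuming stream is any DStream; note that countByWindow is built on an inverse reduce, so ssc.checkpoint(...) must be set.

    import org.apache.spark.streaming.Seconds
    val counts = stream.countByWindow(Seconds(30), Seconds(10)) // DStream[Long]
    counts.foreachRDD { (rdd, time) =>
      rdd.collect().foreach { c =>
        println(s"window ending at $time: $c records")
        // replace println with a push to Kafka or Redis
      }
    }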

multiple windows from the same DStream?

2014-08-21 Thread Josh J
Hi, Can I build two sliding windows in parallel from the same DStream? Will these two window streams run in parallel and process the same data? I wish to apply two different functions (aggregation on one window and storage for the other window) across the same original DStream data, though the same
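
In principle yes: each window is its own DStream over the same parent data, and both are materialized every batch. A minimal sketch, assuming stream is a DStream[String] and the durations are illustrative:

    import org.apache.spark.streaming.Seconds
    val aggWindow  = stream.window(Seconds(60), Seconds(10)).count() // aggregation view
    val saveWindow = stream.window(Seconds(30), Seconds(10))         // storage view
    saveWindow.saveAsTextFiles("/tmp/windowed")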

DStream start a separate DStream

2014-08-21 Thread Josh J
Hi, I would like to have a sliding-window DStream perform a streaming computation and store the results. Once these results are stored, I then would like to process them. Though I must wait until the final computation is done for all tuples in the sliding window before I begin the new

Difference between amplab docker and spark docker?

2014-08-20 Thread Josh J
Hi, What's the difference between amplab docker https://github.com/amplab/docker-scripts and spark docker https://github.com/apache/spark/tree/master/docker? Thanks, Josh