On Fri, Feb 13, 2015 at 2:21 AM, Gerard Maas gerard.m...@gmail.com wrote:
KafkaOutputServicePool
Could you please give example code of what KafkaOutputServicePool would
look like? When I tried object pooling, I ended up with various
NotSerializableExceptions.
Thanks!
Josh
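A minimal sketch of the pooling idea (KafkaOutputServicePool itself isn't shown in this thread, so its shape here is an assumption, as are the broker address and a DStream[String] named stream): keep the producer inside a lazily-initialized singleton object so it is constructed on each executor JVM instead of being serialized from the driver, which is what triggers the NotSerializableException. Assumes the Kafka 0.8.2+ Java producer on the classpath.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerHolder {
  // Initialized lazily on each executor; never shipped from the driver.
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumption
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach { msg =>
      ProducerHolder.producer.send(new ProducerRecord[String, String]("out-topic", msg))
    }
  }
}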
…at 10:29 PM, Josh J joshjd...@gmail.com wrote:
Hi,
I plan to run a parameter search varying the number of cores, epoch, and
parallelism. The web console provides a way to archive previous runs,
but is there a way to view the throughput in the console, rather than
logging the throughput separately to the log files and correlating the…
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
For Spark Streaming applications, there is already a tab called Streaming
which displays the basic statistics.
Would I just need to extend this tab to add the throughput?
Hi,
I have a stream pipeline which invokes map, reduceByKey, filter, and
flatMap. How can I measure the time taken in each stage?
Thanks,
Josh
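One hedged way to measure this by hand (the Stages tab of the web UI also reports per-stage times): transformations are lazy, so force each intermediate RDD with an action inside foreachRDD and time the forced step. All stream and function names below are illustrative.

stream.foreachRDD { rdd =>
  def timed[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body
    println(f"$label took ${(System.nanoTime() - start) / 1e6}%.1f ms")
    result
  }
  val mapped = rdd.map(line => (line, 1)).cache()
  timed("map") { mapped.count() } // count() forces the (cached) map step
  val reduced = mapped.reduceByKey(_ + _).cache()
  timed("reduceByKey") { reduced.count() }
  val filtered = reduced.filter(_._2 > 1)
  timed("filter") { filtered.count() }
}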
We have dockerized Spark Master and worker(s) separately and are using it
in our dev environment.
Is this setup available on GitHub or Docker Hub?
On Tue, Dec 9, 2014 at 3:50 PM, Venkat Subramanian vsubr...@gmail.com
wrote:
We have dockerized Spark Master and worker(s) separately and are…
Hi,
I'm trying to run Spark Streaming standalone on two nodes. I'm able to run
on a single node fine. I start both workers and they register in the Spark
UI. However, the application says:
SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
Any ideas?
Thanks,
Josh
Hi,
I'm trying to use sampling with Spark Streaming. I imported the following:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
I then call sample:
val streamtoread = KafkaUtils.createStream(ssc, zkQuorum, group, …
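A minimal sketch of how sample could be applied to the stream above (the fraction is an assumption): sample() is an RDD method, so apply it through transform(), which runs a function over each batch's RDD.

val sampled = streamtoread.transform { rdd =>
  rdd.sample(withReplacement = false, fraction = 0.1) // keep ~10% of each batch
}
sampled.print()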
Hi,
In the Spark docs
http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node
it mentions: "However, output operations (like foreachRDD) have *at-least
once* semantics, that is, the transformed data may get written to an
external entity more than once in the…"
Is there a way to do this that preserves exactly-once semantics for the
write to Kafka?
On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote:
I'd be interested in finding the answer too. Right now, I do:
val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
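There is no built-in exactly-once output operation here; a common hedge is to make the write idempotent so that at-least-once delivery collapses to effectively-once. A sketch reusing Tim's names, under the assumption that myFunc returns a String: attach a deterministic ID to each record and de-duplicate on the consumer side.

val kafkaOutMsgs = kafkInMessages.map { x =>
  val payload = myFunc(x._2, someParam)
  // Replays of the same payload yield the same key, so the consumer
  // (or a keyed store) can drop duplicates.
  val dedupKey = java.util.UUID.nameUUIDFromBytes(payload.getBytes("UTF-8")).toString
  (dedupKey, payload)
}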
Hi,
I was wondering if the adaptive stream processing and dynamic batch sizing
are available to use in Spark Streaming? Could someone help point me in the
right direction?
Thanks,
Josh
Referring to this paper http://dl.acm.org/citation.cfm?id=2670995.
On Fri, Nov 14, 2014 at 10:42 AM, Josh J joshjd...@gmail.com wrote:
Hi,
I was wondering if the adaptive stream processing and dynamic batch sizing
are available to use in Spark Streaming? Could someone help point me…
Hi,
Is it possible to concatenate or append two DStreams? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined DStream.
Thanks,
Josh
I think it's just called union
On Tue, Nov 11, 2014 at 2:41 PM, Josh J joshjd...@gmail.com wrote:
Hi,
Is it possible to concatenate or append two DStreams? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined…
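A minimal sketch of union (stream names are illustrative; both DStreams must carry the same element type):

val combined = incoming.union(generated) // one DStream over both sources
combined.foreachRDD(rdd => println(s"combined batch size: ${rdd.count()}"))
// For more than two streams: ssc.union(Seq(streamA, streamB, streamC))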
Hi,
I have some data generated by some utilities that returns the results as
a List[String]. I would like to join this with a DStream of Strings. How
can I do this? I tried the following, though I get Scala compiler errors:
val list_scalaconverted = ssc.sparkContext.parallelize(listvalues.toArray())
…: Ordering[K], implicit ctag: scala.reflect.ClassTag[K])org.apache.spark.rdd.RDD[String].
Unspecified value parameter f.
On Tue, Nov 4, 2014 at 11:28 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Does anyone have any good examples of using sortBy for RDDs in Scala?
I'm receiving "not enough…"
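A hedged sketch covering both questions, with illustrative data: sortBy requires an explicit key function f (calling it without one produces exactly the "Unspecified value parameter f" error quoted above), and a static List[String] can be folded into each batch of a DStream via transform plus union.

val listvalues = List("banana", "apple", "cherry") // hypothetical data
val listRdd = ssc.sparkContext.parallelize(listvalues)
val sorted = listRdd.sortBy(s => s) // f: String => String; ascending by default

// Combining the static RDD with a DStream[String] (lines is illustrative):
val joined = lines.transform(batch => batch.union(listRdd))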
Hi,
Is there a nice or optimal method to randomly shuffle Spark Streaming RDDs?
Thanks,
Josh
…In general, RDDs don't have an ordering at all (except when you sort,
for example), so a permutation doesn't make sense. Do you just want a
well-defined but random ordering of the data? Or do you just want to
(re-)assign elements randomly to partitions?
On Mon, Nov 3, 2014 at 4:33 PM, Josh J joshjd...@gmail.com wrote:
…is guaranteed about that.
If you want to permute an RDD, how about a sortBy() on a good hash
function of each value plus some salt? (Haven't thought this through
much but sounds about right.)
On Mon, Nov 3, 2014 at 4:59 PM, Josh J joshjd...@gmail.com wrote:
When I'm outputting the RDDs…
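A sketch of Sean's suggestion, assuming a DStream[String] named stream: sort each batch by a hash of the value plus a fresh salt, giving a well-defined but random permutation per batch.

import scala.util.hashing.MurmurHash3

val shuffled = stream.transform { rdd =>
  val salt = scala.util.Random.nextInt().toString // new salt per batch
  rdd.sortBy(v => MurmurHash3.stringHash(v + salt))
}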
Hi,
How do I run multiple Spark applications in parallel? I tried to run on a
YARN cluster, though the second application submitted does not run.
Thanks,
Josh
…Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz, 32 GB RAM
Thanks,
Josh
On Tue, Oct 28, 2014 at 4:15 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Try reducing the resources (cores and memory) of each application.
On Oct 28, 2014, at 7:05 PM, Josh J joshjd...@gmail.com wrote:
Hi,
How…
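A hedged sketch of Soumya's advice (all values are assumptions): cap each application's resources so the cluster can schedule a second app at the same time. On YARN the executor count is set at submit time via --num-executors; the spark.cores.max cap below applies to standalone/Mesos mode.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("app-one")
  .set("spark.executor.memory", "4g") // leave memory free for the second app
  .set("spark.executor.cores", "2")
  .set("spark.cores.max", "4") // total-core cap (standalone/Mesos)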
Hi,
Is the following guaranteed to always provide an exact count?
foreachRDD(foreachFunc = rdd => {
  rdd.count()
})
In the literature it mentions: "However, output operations (like foreachRDD)
have *at-least once* semantics, that is, the transformed data may get
written to an external entity more…"
Hi,
How can I combine RDDs? I would like to combine two RDDs if the count of
one RDD is not above some threshold.
Thanks,
Josh
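A minimal sketch of the conditional combine (threshold and RDD names are assumptions); count() launches a job, so cache the RDD first if it will be reused:

val threshold = 1000L // hypothetical
val a = rddA.cache()
val combined = if (a.count() <= threshold) a.union(rddB) else a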
Hi,
Are there Dockerfiles available which allow setting up a Dockerized Spark
1.1.0 cluster?
Thanks,
Josh
Hi,
How can I join neighboring sliding windows in Spark Streaming?
Thanks,
Josh
…of countByWindow with a function that performs the save operation.
On Fri, Aug 22, 2014 at 1:58 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Hopefully a simple question: is there an example of where to save the
output of countByWindow? I would like to save the results to external
storage (Kafka…
Hi,
Hopefully a simple question: is there an example of where to save the
output of countByWindow? I would like to save the results to external
storage (Kafka or Redis). The examples show only stream.print().
Thanks,
Josh
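One hedged sketch of replacing print() with a save, assuming a local Redis instance and the Jedis client on the classpath (window durations are assumptions, and countByWindow needs checkpointing enabled on the context):

import org.apache.spark.streaming.Seconds
import redis.clients.jedis.Jedis

ssc.checkpoint("/tmp/checkpoints") // required by countByWindow; path is an assumption
val counts = stream.countByWindow(Seconds(30), Seconds(10))
counts.foreachRDD { rdd =>
  rdd.foreachPartition { iter =>
    val jedis = new Jedis("localhost", 6379) // assumption
    iter.foreach(c => jedis.rpush("window-counts", c.toString))
    jedis.close()
  }
}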
Hi,
Can I build two sliding windows in parallel from the same DStream? Will
these two windowed streams run in parallel and process the same data? I wish
to apply two different functions (aggregation on one window and storage for
the other window) across the same original DStream data, though the same…
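A minimal sketch of two windows over one DStream (durations and the output path are assumptions; window durations must be multiples of the batch interval): the two windowed streams are independent, see the same underlying data, and each gets its own downstream operations.

import org.apache.spark.streaming.Seconds

val aggWindow = stream.window(Seconds(30), Seconds(10))
val storeWindow = stream.window(Seconds(30), Seconds(10))
aggWindow.foreachRDD(rdd => println(s"window count: ${rdd.count()}")) // aggregation path
storeWindow.foreachRDD(rdd =>
  rdd.saveAsTextFile(s"/tmp/windows/batch-${System.currentTimeMillis}")) // storage path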
Hi,
I would like to have a sliding window DStream perform a streaming
computation and store these results. Once these results are stored, I then
would like to process the results. Though I must wait until the final
computation is done for all tuples in the sliding window before I begin
the new…
Hi,
What's the difference between the AMPLab docker-scripts
https://github.com/amplab/docker-scripts and the Spark docker directory
https://github.com/apache/spark/tree/master/docker?
Thanks,
Josh