Hi everyone,
I have a JavaPairDStream object and I'd like the Driver to
create a txt file (on HDFS) containing all of its elements.
At the moment, I use the /coalesce(1, true)/ method:
JavaPairDStream unified = [partitioned stuff]
unified.foreachRDD(new
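The snippet is cut off above, but a minimal completion of that /foreachRDD/
call could look like the sketch below. The <String, Integer> key/value types
and the HDFS output path are assumptions for illustration, and
/foreachRDD(Function<..., Void>)/ is the Spark 1.x Java signature:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;

unified.foreachRDD(new Function<JavaPairRDD<String, Integer>, Void>() {
    @Override
    public Void call(JavaPairRDD<String, Integer> rdd) {
        if (!rdd.isEmpty()) {  // isEmpty() is available from Spark 1.3 on
            // coalesce(1, true) shuffles everything into a single
            // partition, so the output directory holds one part-00000 file
            rdd.coalesce(1, true)
               .saveAsTextFile("hdfs:///tmp/unified-" + System.currentTimeMillis());
        }
        return null;
    }
});

Note that /saveAsTextFile/ always writes a directory; /coalesce(1, true)/ only
guarantees a single part file inside it, at the price of funnelling the whole
RDD through one task.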
Hello everyone,
In the new Kafka Direct API, what are the benefits of setting a value for
*spark.streaming.maxRatePerPartition*?
In my case, I have 2-second batches consuming ~15k tuples from a topic
split into 48 partitions (4 workers, 16 total cores).
Is there any particular value I should set?
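For what it's worth: the property caps how many records per second the direct
stream reads from each Kafka partition, which matters mostly for the first
batches after a restart, when the stream would otherwise try to ingest the
whole backlog in one batch. At ~15k tuples per 2-second batch over 48
partitions, the steady-state rate is about 15000 / (2 * 48) ≈ 156 records per
partition per second, so a cap somewhat above that bounds catch-up batches
without throttling normal traffic. A minimal sketch (500 is an illustrative
value, not a recommendation):

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .setAppName("KafkaDirectApp")  // hypothetical app name
    // 500 records/s per partition caps a 2 s batch at
    // 500 * 48 * 2 = 48,000 records, vs ~15,000 in steady state
    .set("spark.streaming.maxRatePerPartition", "500");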
Hi everyone,
I recently started using the new Kafka direct approach.
Now, as far as I understood, each Kafka partition /is/ an RDD partition that
will be processed by a single core.
What I don't understand is the relation between those partitions and the
blocks generated every blockInterval.
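For what it's worth: blocks and /spark.streaming.blockInterval/ only apply to
receiver-based streams, where a receiver buffers incoming records and cuts
them into blocks that later become partitions. The direct approach has no
receivers, so no blocks are generated at all; each batch's RDD simply gets one
partition per Kafka partition. A quick way to verify (a sketch, assuming a
direct stream of String pairs named /directKafkaStream/):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;

directKafkaStream.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
    @Override
    public Void call(JavaPairRDD<String, String> rdd) {
        // with 48 Kafka partitions this prints 48, whatever
        // blockInterval is set to
        System.out.println("RDD partitions: " + rdd.partitions().size());
        return null;
    }
});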
Hi everyone,
I'm working with Spark Streaming, and I need to perform some offline
performance measures.
What I'd like to have is a CSV file that reports something like this:
*Batch number/timestamp | Input Size | Total Delay*
which is in fact similar to what the UI outputs.
I tried
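The message is cut off here, but one route that matches the UI's numbers is a
/StreamingListener/: its /onBatchCompleted/ callback receives a /BatchInfo/
carrying the batch time, record count, and total delay. A rough Java sketch
(the CSV path is an assumption, and the remaining callbacks, whose exact set
varies across Spark versions, have to be overridden as empty no-ops):

import org.apache.spark.streaming.scheduler.BatchInfo;
import org.apache.spark.streaming.scheduler.StreamingListener;
import org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted;

public class CsvBatchListener implements StreamingListener {
    private final java.io.PrintWriter out;

    public CsvBatchListener(String path) throws java.io.IOException {
        out = new java.io.PrintWriter(new java.io.FileWriter(path, true), true);
        out.println("batchTimeMs,numRecords,totalDelayMs");
    }

    @Override
    public void onBatchCompleted(StreamingListenerBatchCompleted completed) {
        BatchInfo info = completed.batchInfo();
        // totalDelay is an Option; -1 marks batches where it is undefined
        long totalDelay = info.totalDelay().isDefined()
                ? ((Long) info.totalDelay().get()).longValue() : -1L;
        out.println(info.batchTime().milliseconds() + ","
                + info.numRecords() + "," + totalDelay);
    }

    // ...the other StreamingListener callbacks (onBatchSubmitted,
    // onReceiverStarted, etc.) overridden as empty no-ops
}

Register it through the underlying context, e.g.
/jssc.ssc().addStreamingListener(new CsvBatchListener("/tmp/batches.csv"))/,
and each completed batch appends one CSV row.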
Hi,
Spark's configuration file for metrics, /conf/metrics.properties/, states the
following:
Within an instance, a source specifies a particular set of grouped
metrics. There are two kinds of sources:
1. Spark internal sources, like /MasterSource/, /WorkerSource/, etc,
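The quoted comment is cut off here, but for anyone collecting metrics for
offline analysis: the same file also configures sinks, and the bundled
/CsvSink/ periodically dumps every registered metric to CSV files. A sketch of
the relevant lines (the directory and period are illustrative values; the
property names come from the template itself):

# dump metrics from all instances to CSV files every 10 seconds
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics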
Hi everybody,
Is there anything in Spark that shares the philosophy of Storm's field
grouping? I'd like to control data partitioning across the workers by sending
tuples that share the same key to the same worker in the cluster, but I could
not find a method to do that.
Suggestions?
:)
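For what it's worth, the closest analogue I know of is hash partitioning on
pair RDDs: /partitionBy/ with a /HashPartitioner/ guarantees that all tuples
with the same key land in the same partition, and hence in the same task. A
minimal sketch, assuming a /JavaPairRDD<String, Integer>/ named /pairs/:

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;

// all tuples sharing a key end up in the same partition; 16 partitions
// (one per core) is only an illustrative choice
JavaPairRDD<String, Integer> grouped = pairs.partitionBy(new HashPartitioner(16));

In Spark Streaming, the key-based operations (/reduceByKey/, /groupByKey/,
/combineByKey/) accept a /Partitioner/ argument for the same effect. Note this
pins keys to partitions rather than to specific physical machines, but
co-locating all values for a key in one task is usually what field grouping is
wanted for anyway.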
Has there been any follow-up on this topic?
In this thread, http://search-hadoop.com/m/q3RTtm4EtI1hoD8d2, there were suggestions
that someone was going to publish some code, but no news since (TD himself
looked pretty interested).
Did anybody come up with something in the last months?
Hi everybody,
I think I could use some help with the Java /updateStateByKey()/ method in
Spark Streaming.
*Context:*
I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where the
/DataUpdate/ object mainly has 2 fields of interest (in my case), namely
du.personId (an Integer) and
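The message breaks off here, but since the question is about wiring up
/updateStateByKey()/ in Java, a generic sketch may still help. It keys the
stream by /du.personId/ and keeps a running count of updates per person (the
count is purely illustrative; /jssc/ stands for the JavaStreamingContext,
Spark 1.x's Java API uses Guava's /Optional/, and stateful operations require
a checkpoint directory):

import com.google.common.base.Optional;
import java.util.List;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

// required for any stateful operation such as updateStateByKey()
jssc.checkpoint("hdfs:///tmp/checkpoints");  // path is an assumption

JavaPairDStream<Integer, Integer> updatesPerPerson = du
    .mapToPair(new PairFunction<DataUpdate, Integer, Integer>() {
        @Override
        public Tuple2<Integer, Integer> call(DataUpdate d) {
            return new Tuple2<Integer, Integer>(d.personId, 1);
        }
    })
    .updateStateByKey(
        new Function2<List<Integer>, Optional<Integer>, Optional<Integer>>() {
            @Override
            public Optional<Integer> call(List<Integer> newValues,
                                          Optional<Integer> state) {
                int sum = state.or(0);      // previous count, 0 if absent
                for (Integer v : newValues) {
                    sum += v;               // one increment per update seen
                }
                return Optional.of(sum);
            }
        });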