Sorry I made a mistake. Please ignore my question.
On Tue, Mar 3, 2015 at 2:47 AM, Saiph Kappa saiph.ka...@gmail.com wrote:
I performed repartitioning and everything went fine with respect to the
number of CPU cores being used (and respective times). However, I noticed
something very strange: inside a map operation I was doing a very simple
calculation and always using the same dataset (small enough to be entirely
One more question: while processing the exact same batch, I noticed that giving more CPUs to the worker does not decrease the duration of the batch. I tried this with 4 and 8 CPUs. With only 1 CPU the duration did increase, but apart from that the values were pretty similar.
By setting spark.eventLog.enabled to true it is possible to see the
application UI after the application has finished its execution; however,
the Streaming tab is no longer visible.
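For reference, event logging is typically enabled in spark-defaults.conf; the log directory below is an illustrative placeholder, not a path from this thread:

```
# Enable event logging so finished applications appear in the history server.
spark.eventLog.enabled   true
# Illustrative placeholder; point this at a directory all nodes can reach.
spark.eventLog.dir       hdfs:///spark-events
```

The history server rebuilds the UI from these logs, but (as observed above) streaming-specific tabs may not be reconstructed for completed applications.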
For measuring the duration of batches in the code I am doing something like
this:
wordCharValues.foreachRDD(rdd => {
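The snippet above is cut off, but the timing pattern inside foreachRDD can be sketched without Spark. This is a minimal, self-contained sketch; timeBatch and the placeholder workload are illustrative names, not code from the thread:

```scala
// Minimal driver-side timing sketch (no Spark needed here); in the real
// code, `work` would be the body executed inside foreachRDD for each batch.
def timeBatch[T](work: => T): (T, Long) = {
  val start = System.nanoTime()
  val result = work                                   // run the batch's work
  val elapsedMs = (System.nanoTime() - start) / 1000000L
  (result, elapsedMs)
}

val (total, ms) = timeBatch { (1L to 1000000L).sum }  // placeholder workload
println(s"batch produced $total in $ms ms")
```

Note that this measures wall-clock time on the driver, which for foreachRDD includes scheduling overhead as well as the actual computation.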
If you have one receiver, and you are doing only map-like operations, then
the processing will primarily happen on one machine. To use all the machines,
either receive in parallel with multiple receivers, or spread out the
computation by explicitly repartitioning the received streams.
Let me ask like this, what would be the easiest way to display the
throughput in the web console? Would I need to create a new tab and add the
metrics? Any good or simple examples showing how this can be done?
On Wed, Feb 25, 2015 at 12:07 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Did you
For SparkStreaming applications, there is already a tab called Streaming
which displays the basic statistics.
Thanks
Best Regards
On Wed, Feb 25, 2015 at 8:55 PM, Josh J joshjd...@gmail.com wrote:
Let me ask like this, what would be the easiest way to display the
throughput in the web console
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
For SparkStreaming applications, there is already a tab called Streaming
which displays the basic statistics.
Would I just need to extend this tab to add the throughput?
By throughput, do you mean the number of events processed, etc.?
[image: Inline image 1]
The Streaming tab already has these statistics.
Thanks
Best Regards
On Wed, Feb 25, 2015 at 9:59 PM, Josh J joshjd...@gmail.com wrote:
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Hi Josh,
SPM will show you this info. I see you use Kafka, too, whose numerous metrics
you can also see in SPM side by side with your Spark metrics. It sounds like
trends are what you are after, so I hope this helps. See http://sematext.com/spm
Otis
On Feb 24, 2015, at 11:59, Josh J
Yes. # tuples processed in a batch = sum of all the tuples received by all
the receivers.
In the screenshot, there was a batch with 69.9K records, and there was a batch
which took 1 s 473 ms. These two batches may or may not be the same batch.
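Putting the two figures quoted above together (with the caveat that they may come from different batches), throughput is just records divided by processing time:

```scala
// 69.9K records and 1 s 473 ms are the figures quoted from the screenshot;
// if they belonged to the same batch, the throughput would be:
val records = 69900.0
val processingSeconds = 1.473
val throughput = records / processingSeconds  // roughly 47,000 records/s
println(f"$throughput%.0f records/s")
```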
TD
On Wed, Feb 25, 2015 at 10:11 AM, Josh J
Did you have a look at
https://spark.apache.org/docs/1.0.2/api/scala/index.html#org.apache.spark.scheduler.SparkListener
And for Streaming:
https://spark.apache.org/docs/1.0.2/api/scala/index.html#org.apache.spark.streaming.scheduler.StreamingListener
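The StreamingListener route is the usual way to get per-batch numbers programmatically. The real interface lives in org.apache.spark.streaming.scheduler; the trait and case class below are simplified stand-ins written for this sketch, not the actual Spark API:

```scala
// Simplified stand-ins for Spark's listener types (illustrative only);
// the real StreamingListener receives richer batch metadata.
case class BatchInfoSketch(numRecords: Long, processingDelayMs: Long)

trait StreamingListenerSketch {
  def onBatchCompleted(info: BatchInfoSketch): Unit
}

// Computes and records per-batch throughput as each batch completes.
class ThroughputListener extends StreamingListenerSketch {
  var lastThroughput: Double = 0.0
  def onBatchCompleted(info: BatchInfoSketch): Unit = {
    lastThroughput = info.numRecords * 1000.0 / info.processingDelayMs
    println(f"batch throughput: $lastThroughput%.1f records/s")
  }
}

val listener = new ThroughputListener
listener.onBatchCompleted(BatchInfoSketch(69900L, 1473L))
```

With the real API you would register the listener via ssc.addStreamingListener(...) and could then feed the numbers into your own metrics sink or a custom web UI tab.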
Thanks
Best Regards
On Tue, Feb 24,
Hi,
I plan to run a parameter search varying the number of cores, epoch, and
parallelism. The web console provides a way to archive the previous runs,
though is there a way to view the throughput in the console, rather than
logging the throughput separately to the log files and correlating the