Yep, I used DFSIO and also TeraGen, but I would like to experiment with an ad-hoc Spark program. -jan
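A minimal sketch of what such an ad-hoc Spark program could look like, assuming a Scala job submitted with spark-submit; the partition count, per-partition size and HDFS output path are placeholders to adjust for the cluster. It also sidesteps the OOM from the original question further down by emitting many small lines per partition instead of building one 1 GB string:

  import org.apache.spark.{SparkConf, SparkContext}
  import scala.util.Random

  object WriteThroughputTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("WriteThroughputTest"))

      val numPartitions = 16            // placeholder, e.g. one per executor core
      val bytesPerPartition = 1L << 30  // roughly 1 GB written per partition
      val lineLength = 1024

      val start = System.nanoTime()

      sc.parallelize(0 until numPartitions, numPartitions)
        .mapPartitions { _ =>
          // Build one line up front and repeat it, so random generation does not
          // dominate what is meant to be an IO measurement, and so no single
          // 1 GB string is ever held in memory on a worker.
          val line = Random.alphanumeric.take(lineLength).mkString
          Iterator.fill((bytesPerPartition / (lineLength + 1)).toInt)(line)
        }
        .saveAsTextFile("hdfs:///tmp/throughput-test")  // placeholder output path

      val elapsedSec = (System.nanoTime() - start) / 1e9
      val totalGb = numPartitions * bytesPerPartition / math.pow(1024, 3)
      println(f"Wrote $totalGb%.1f GB in $elapsedSec%.1f s (${totalGb / elapsedSec}%.2f GB/s)")

      sc.stop()
    }
  }

saveAsTextFile writes one output file per partition, so this produces roughly one 1 GB file per partition as asked below; the printed figure is only a rough end-to-end GB/sec estimate, not an isolated disk measurement.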
On 05 Apr 2016, at 23:13, Sebastian Piu <sebastian....@gmail.com> wrote:

You could try using TestDFSIO for raw HDFS performance, but we found it not very relevant. Another way could be to generate a file and then read it and write it back. For some of our use cases we populated a Kafka queue on the cluster (on different disks) and used Spark Streaming to do a simple transformation and write the results back. You can use Graphite + Grafana for the IO monitoring.

On Tue, 5 Apr 2016, 20:56 Jan Holmberg, <jan.holmb...@perigeum.fi> wrote:

I'm trying to get a rough estimate of how much data I can write within a certain time period (GB/sec). -jan

On 05 Apr 2016, at 22:49, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Jan,

What is the definition of a stress test here? What are the metrics: throughput of data, latency, velocity, volume?

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 5 April 2016 at 20:42, Jan Holmberg <jan.holmb...@perigeum.fi> wrote:

Hi,

I'm trying to figure out how to write lots of data from each worker. I tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string on a worker. Increasing worker memory would mean that I would have to drop the number of workers. So, any idea how to write e.g. a 1 GB file from each worker?

cheers,
-jan
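For the "read it and write it back" idea Sebastian mentions above, a rough sketch could look like the following, again with placeholder paths and assuming the data produced by the generator job earlier in this thread:

  import org.apache.spark.{SparkConf, SparkContext}

  object ReadWriteBackTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("ReadWriteBackTest"))

      val start = System.nanoTime()
      // Read the previously generated data and write it straight back out,
      // exercising both the read and the write path of the cluster disks.
      sc.textFile("hdfs:///tmp/throughput-test")            // placeholder input path
        .saveAsTextFile("hdfs:///tmp/throughput-test-copy") // placeholder output path
      println(s"Read + write back took ${(System.nanoTime() - start) / 1e9} s")

      sc.stop()
    }
  }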