Yep, I used DFSIO and also TeraGen, but I would like to experiment with an ad-hoc Spark program. -jan
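A minimal sketch of what such an ad-hoc Spark program could look like, assuming a Scala job submitted with spark-submit; the partition count, per-partition size and HDFS output path are placeholders to adjust for the cluster. It also sidesteps the OOM from the original question further down by emitting many small lines per partition instead of building one 1 GB string:

  import org.apache.spark.{SparkConf, SparkContext}
  import scala.util.Random

  object WriteThroughputTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("WriteThroughputTest"))

      val numPartitions = 16            // placeholder, e.g. one per executor core
      val bytesPerPartition = 1L << 30  // roughly 1 GB written per partition
      val lineLength = 1024

      val start = System.nanoTime()

      sc.parallelize(0 until numPartitions, numPartitions)
        .mapPartitions { _ =>
          // Build one line up front and repeat it, so random generation does not
          // dominate what is meant to be an IO measurement, and so no single
          // 1 GB string is ever held in memory on a worker.
          val line = Random.alphanumeric.take(lineLength).mkString
          Iterator.fill((bytesPerPartition / (lineLength + 1)).toInt)(line)
        }
        .saveAsTextFile("hdfs:///tmp/throughput-test")  // placeholder output path

      val elapsedSec = (System.nanoTime() - start) / 1e9
      val totalGb = numPartitions * bytesPerPartition / math.pow(1024, 3)
      println(f"Wrote $totalGb%.1f GB in $elapsedSec%.1f s (${totalGb / elapsedSec}%.2f GB/s)")

      sc.stop()
    }
  }

saveAsTextFile writes one output file per partition, so this produces roughly one 1 GB file per partition as asked below; the printed figure is only a rough end-to-end GB/sec estimate, not an isolated disk measurement.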
On 05 Apr 2016, at 23:13, Sebastian Piu <sebastian....@gmail.com> wrote:

You could try using TestDFSIO for raw HDFS performance, but we found it not very relevant. Another way could be to generate a file and then read it and write it back. For some of our use cases we populated a Kafka queue on the cluster (on different disks) and used Spark Streaming to do a simple transformation and write the results back. You can use Graphite + Grafana for the IO monitoring.

On Tue, 5 Apr 2016, 20:56 Jan Holmberg, <jan.holmb...@perigeum.fi> wrote:

I'm trying to get a rough estimate of how much data I can write within a certain time period (GB/sec). -jan

On 05 Apr 2016, at 22:49, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Hi Jan,

What is the definition of a stress test here? What are the metrics: throughput of data, latency, velocity, volume?

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 5 April 2016 at 20:42, Jan Holmberg <jan.holmb...@perigeum.fi> wrote:

Hi,

I'm trying to figure out how to write lots of data from each worker. I tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string on a worker. Increasing worker memory would mean that I would have to drop the number of workers. So, any idea how to write e.g. a 1 GB file from each worker?

cheers,
-jan
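For the "read it and write it back" idea Sebastian mentions above, a rough sketch could look like the following, again with placeholder paths and assuming the data produced by the generator job earlier in this thread:

  import org.apache.spark.{SparkConf, SparkContext}

  object ReadWriteBackTest {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("ReadWriteBackTest"))

      val start = System.nanoTime()
      // Read the previously generated data and write it straight back out,
      // exercising both the read and the write path of the cluster disks.
      sc.textFile("hdfs:///tmp/throughput-test")            // placeholder input path
        .saveAsTextFile("hdfs:///tmp/throughput-test-copy") // placeholder output path
      println(s"Read + write back took ${(System.nanoTime() - start) / 1e9} s")

      sc.stop()
    }
  }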