You could try using TestDFSIO for raw HDFS performance, but we did not find
it very relevant.

Another approach is to generate a file, then read it back and write it out
again.
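
A rough Scala sketch of the generation step, producing records lazily inside
each partition rather than building one big string per worker (the record
size, partition count and output path below are made-up values, adjust for
your cluster):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("write-stress"))

val numWorkers = 10                    // one partition per worker
val recordBytes = 1024                 // ~1 KB per record
val recordsPerPartition = 1024 * 1024  // ~1 GB per partition

sc.parallelize(1 to numWorkers, numWorkers)
  .mapPartitionsWithIndex { (part, _) =>
    val payload = "x" * recordBytes
    // Lazy iterator: saveAsTextFile streams the records out one at a time,
    // so the ~1 GB per partition is never held in executor memory.
    Iterator.tabulate(recordsPerPartition)(i => s"$part-$i-$payload")
  }
  .saveAsTextFile("hdfs:///tmp/write_stress")

Reading the output back with sc.textFile and writing it out again covers the
read-and-write-back part; timing each job gives you a GB/sec figure.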

For some of our use cases we populated a Kafka queue on the cluster (on
separate disks) and used Spark Streaming to apply a simple transformation and
write the results back.
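
A minimal sketch of that Kafka + Spark Streaming route, assuming Spark 1.x
with the spark-streaming-kafka direct API; the broker list, topic, batch
interval and output path are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-write-stress")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
val topics = Set("stress-topic")                                 // placeholder topic

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

// Simple transformation on the values, then write each batch back to HDFS.
stream.map { case (_, value) => value.toUpperCase }
  .saveAsTextFiles("hdfs:///tmp/kafka_stress/out")

ssc.start()
ssc.awaitTermination()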

You can use Graphite + Grafana for the I/O monitoring.

On Tue, 5 Apr 2016, 20:56 Jan Holmberg, <jan.holmb...@perigeum.fi> wrote:

> I'm trying to get a rough estimate of how much data I can write within a
> certain time period (GB/sec).
> -jan
>
> On 05 Apr 2016, at 22:49, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Hi Jan,
>
> What is the definition of a stress test here? What are the metrics?
> Throughput of data, latency, velocity, volume?
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 5 April 2016 at 20:42, Jan Holmberg <jan.holmb...@perigeum.fi> wrote:
>
>> Hi,
>> I'm trying to figure out how to write lots of data from each worker. I
>> tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string
>> per worker. Increasing worker memory would mean dropping the number of
>> workers.
>> So, any idea how to write e.g. a 1 GB file from each worker?
>>
>> cheers,
>> -jan
>>
>
