Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
Yes, I realize that there's a standard way, and then there's the way where the 
client asks 'how fast can it write the data'. That is what I'm trying to figure 
out. At the moment I'm far from the disks' theoretical write speed when combining 
all the disks together.
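For reference, a minimal ad-hoc sketch of that kind of measurement (the partition count, line size and HDFS path below are illustrative assumptions, not from the thread): each task generates its share of the data lazily and the driver times the write, so the achieved GB/s can be compared against the combined theoretical disk speed.

import org.apache.spark.{SparkConf, SparkContext}

object HdfsWriteThroughput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hdfs-write-throughput"))

    val numPartitions = 64                 // roughly one or two per executor core
    val mbPerPartition = 1024              // ~1 GB generated by each task
    val line = "x" * 1023                  // 1 KB per line including the newline

    // Lazy per-task generation: no task ever materialises the full 1 GB in memory.
    val data = sc.parallelize(1 to numPartitions, numPartitions)
      .flatMap(_ => Iterator.fill(mbPerPartition * 1024)(line))

    val start = System.nanoTime()
    data.saveAsTextFile("hdfs:///tmp/stress/write-test")   // hypothetical path
    val secs = (System.nanoTime() - start) / 1e9

    val totalGB = numPartitions * mbPerPartition / 1024.0
    println(f"wrote $totalGB%.1f GB in $secs%.1f s = ${totalGB / secs}%.2f GB/s")
    sc.stop()
  }
}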

On 05 Apr 2016, at 23:21, Mich Talebzadeh wrote:

So that's throughput per second, then. You can try Spark Streaming saving it to 
HDFS and increase the throttle.

The generally accepted approach is to measure service time, i.e. the average 
service time for IO requests in ms.


Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:56, Jan Holmberg wrote:
I'm trying to get a rough estimate of how much data I can write within a certain 
time period (GB/sec).
-jan

On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:

Hi Jan,

What is the definition of a stress test here? What are the metrics? 
Throughput of data, latency, velocity, volume?

HTH


Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:42, Jan Holmberg wrote:
Hi,
I'm trying to figure out how to write lots of data from each worker. I tried 
rdd.saveAsTextFile but got an OOM when generating a 1024 MB string per worker. 
Increasing worker memory would mean that I'd have to drop the number of workers.
So, any idea how to write e.g. a 1 GB file from each worker?

cheers,
-jan
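One way around that OOM, sketched below with assumed sizes and a hypothetical output directory: instead of building a 1024 MB string and handing it to saveAsTextFile, let each task open its own HDFS file and stream the gigabyte out in small chunks, so per-worker memory use stays at the buffer size.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object OneGbPerWorker {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("one-gb-per-worker"))
    val numFiles = 16                                         // e.g. one per executor

    sc.parallelize(0 until numFiles, numFiles).foreachPartition { it =>
      val idx = it.next()                                     // single element per partition
      val fs  = FileSystem.get(new Configuration())           // picks up the cluster HDFS config
      val out = fs.create(new Path(s"/tmp/stress/part-$idx")) // hypothetical directory
      val chunk = new Array[Byte](1024 * 1024)                // 1 MB write buffer
      try {
        var written = 0L
        while (written < 1024L * 1024 * 1024) {               // stream out 1 GB total
          out.write(chunk)
          written += chunk.length
        }
      } finally out.close()
    }
    sc.stop()
  }
}

The zero-filled buffer is just ballast; if the content matters (say, for a later read test), fill the buffer with generated records instead.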





Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
Yep,
I used TestDFSIO and also TeraGen, but I would like to experiment with an ad-hoc 
Spark program.
-jan

On 05 Apr 2016, at 23:13, Sebastian Piu wrote:


You could try using TestDFSIO for raw HDFS performance, but we found it not 
very relevant.

Another way could be to generate a file and then read it and write it back. 
For some of our use cases we populated a Kafka queue on the cluster 
(on different disks) and used Spark Streaming to do a simple transformation and 
write back.

You can use Graphite + Grafana for the IO monitoring.

On Tue, 5 Apr 2016, 20:56 Jan Holmberg wrote:
I'm trying to get a rough estimate of how much data I can write within a certain 
time period (GB/sec).
-jan

On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:

Hi Jan,

What is the definition of a stress test here? What are the metrics? 
Throughput of data, latency, velocity, volume?

HTH


Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:42, Jan Holmberg wrote:
Hi,
I'm trying to figure out how to write lots of data from each worker. I tried 
rdd.saveAsTextFile but got an OOM when generating a 1024 MB string per worker. 
Increasing worker memory would mean that I'd have to drop the number of workers.
So, any idea how to write e.g. a 1 GB file from each worker?

cheers,
-jan




Re: Stress testing hdfs with Spark

2016-04-05 Thread Mich Talebzadeh
So that's throughput per second, then. You can try Spark Streaming saving it to
HDFS and increase the throttle.

The generally accepted approach is to measure service time, i.e. the average
service time for IO requests in ms.
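As a rough illustration of that route (the source, batch interval and rate cap below are assumptions for illustration, not anything prescribed in the thread): a receiver-based stream saved to HDFS each batch, with spark.streaming.receiver.maxRate as the throttle to turn up.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingHdfsWrite {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-hdfs-write")
      // "Increase the throttle": raise (or remove) the per-receiver rate cap.
      .set("spark.streaming.receiver.maxRate", "100000")   // records/sec, illustrative

    val ssc = new StreamingContext(conf, Seconds(5))        // 5-second batches

    // Any receiver will do; a socket source is simply the easiest to show.
    val lines = ssc.socketTextStream("datagen-host", 9999)  // hypothetical generator host

    // Every batch is written out as a new directory of part files under this prefix.
    lines.saveAsTextFiles("hdfs:///tmp/stress/stream")

    ssc.start()
    ssc.awaitTermination()
  }
}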

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:56, Jan Holmberg  wrote:

> I'm trying to get a rough estimate of how much data I can write within a
> certain time period (GB/sec).
> -jan
>
> On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:
>
> Hi Jan,
>
> What is the definition of a stress test here? What are the metrics?
> Throughput of data, latency, velocity, volume?
>
> HTH
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
>
>
> On 5 April 2016 at 20:42, Jan Holmberg  wrote:
>
>> Hi,
>> I'm trying to figure out how to write lots of data from each worker. I
>> tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string
>> per worker. Increasing worker memory would mean that I'd have to drop the
>> number of workers.
>> So, any idea how to write e.g. a 1 GB file from each worker?
>>
>> cheers,
>> -jan


Re: Stress testing hdfs with Spark

2016-04-05 Thread Sebastian Piu
You could try using TestDFSIO for raw HDFS performance, but we found it
not very relevant.

Another way could be to generate a file and then read it and write
it back. For some of our use cases we populated a Kafka queue on the
cluster (on different disks) and used Spark Streaming to do a simple
transformation and write back.

You can use Graphite + Grafana for the IO monitoring.
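A bare-bones sketch of the "generate a file, then read it and write it back" variant (sizes and paths below are made up for illustration): the first pass produces the dataset once, the second times a pure read-plus-write-back over it.

import org.apache.spark.{SparkConf, SparkContext}

object ReadWriteBack {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-write-back"))
    val src = "hdfs:///tmp/stress/in"                    // hypothetical paths
    val dst = "hdfs:///tmp/stress/out"

    // Pass 1: generate the input once, ~1 GB per partition, built lazily.
    sc.parallelize(1 to 64, 64)
      .flatMap(_ => Iterator.fill(1024 * 1024)("x" * 1023))
      .saveAsTextFile(src)

    // Pass 2: time a read of the generated data followed by a write-back.
    val start = System.nanoTime()
    sc.textFile(src).saveAsTextFile(dst)
    val secs = (System.nanoTime() - start) / 1e9
    println(f"read + write-back pass took $secs%.1f s")
    sc.stop()
  }
}

The second pass exercises both the read and the write side of HDFS, which is closer to a realistic job than a write-only test.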

On Tue, 5 Apr 2016, 20:56 Jan Holmberg,  wrote:

> I'm trying to get a rough estimate of how much data I can write within a
> certain time period (GB/sec).
> -jan
>
> On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:
>
> Hi Jan,
>
> What is the definition of a stress test here? What are the metrics?
> Throughput of data, latency, velocity, volume?
>
> HTH
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
>
>
> On 5 April 2016 at 20:42, Jan Holmberg  wrote:
>
>> Hi,
>> I'm trying to figure out how to write lots of data from each worker. I
>> tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string
>> per worker. Increasing worker memory would mean that I'd have to drop the
>> number of workers.
>> So, any idea how to write e.g. a 1 GB file from each worker?
>>
>> cheers,
>> -jan


Re: Stress testing hdfs with Spark

2016-04-05 Thread Jan Holmberg
I'm trying to get a rough estimate of how much data I can write within a certain 
time period (GB/sec).
-jan

On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:

Hi Jan,

What is the definition of a stress test here? What are the metrics? 
Throughput of data, latency, velocity, volume?

HTH


Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:42, Jan Holmberg wrote:
Hi,
I'm trying to figure out how to write lots of data from each worker. I tried 
rdd.saveAsTextFile but got an OOM when generating a 1024 MB string per worker. 
Increasing worker memory would mean that I'd have to drop the number of workers.
So, any idea how to write e.g. a 1 GB file from each worker?

cheers,
-jan




Re: Stress testing hdfs with Spark

2016-04-05 Thread Mich Talebzadeh
Hi Jan,

What is the definition of a stress test here? What are the metrics?
Throughput of data, latency, velocity, volume?

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com



On 5 April 2016 at 20:42, Jan Holmberg  wrote:

> Hi,
> I'm trying to figure out how to write lots of data from each worker. I
> tried rdd.saveAsTextFile but got an OOM when generating a 1024 MB string
> per worker. Increasing worker memory would mean that I'd have to drop the
> number of workers.
> So, any idea how to write e.g. a 1 GB file from each worker?
>
> cheers,
> -jan