Yes, I realize that there's the standard way and then there's the way where the
client asks 'how fast can it write the data'. That is what I'm trying to figure
out. At the moment I'm far from the disks' theoretical write speed when
combining all the disks together.
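For context, the theoretical ceiling referred to here is just number of disks times per-disk sequential write speed; a quick back-of-the-envelope sketch (the numbers are hypothetical, not from this thread):

```python
# Hypothetical cluster: 12 data disks at ~150 MB/s sequential write each
disks = 12
per_disk_mb_s = 150
aggregate_mb_s = disks * per_disk_mb_s      # 1800 MB/s theoretical ceiling
aggregate_gb_s = aggregate_mb_s / 1024      # ~1.76 GB/s
print(round(aggregate_gb_s, 2))
```

Replication, network, and NameNode overhead all mean real HDFS write throughput lands well below this number.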
On 05 Apr 2016, at 23:21, Mich Talebzadeh wrote:
Yep,
I used dfsio and also Teragen, but I would like to experiment with an ad-hoc
Spark program.
-jan
On 05 Apr 2016, at 23:13, Sebastian Piu wrote:
You could try using TestDFSIO for raw HDFS performance, but we found it not
very relevant
so that we could measure throughput per second. You can try Spark streaming
saving to HDFS and increase the throttle.
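For reference, TestDFSIO ships with the Hadoop MapReduce jobclient test jar; a typical write/read run looks roughly like this (the jar path and the -fileSize syntax vary between Hadoop versions and distributions, so treat this as a sketch to adapt):

```shell
# Jar path varies by distribution -- adjust before running.
JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar

# Write 16 files of 1 GB each; the job reports aggregate throughput at the end
hadoop jar $JAR TestDFSIO -write -nrFiles 16 -fileSize 1GB

# Read the same files back
hadoop jar $JAR TestDFSIO -read -nrFiles 16 -fileSize 1GB

# Remove the generated files
hadoop jar $JAR TestDFSIO -clean
```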
The generally accepted approach is to measure service time, i.e. the average
time per IO request in ms.
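As a trivial illustration of that metric (the latency samples below are made up; in practice they would come from a tool such as iostat):

```python
# Hypothetical sampled per-request IO latencies in milliseconds
latencies_ms = [2.1, 3.4, 1.8, 2.7]

# Service time = average time to complete one IO request
avg_service_ms = sum(latencies_ms) / len(latencies_ms)
print(round(avg_service_ms, 2))
```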
Dr Mich Talebzadeh
Another way could be to generate a file and then read it and write it back.
For some of our use cases we populated a Kafka queue on the cluster (on
different disks) and used Spark streaming to do
I'm trying to get a rough estimate of how much data I can write within a
certain time period (GB/sec).
-jan
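A crude lower bound for that GB/sec figure can be taken by timing a raw write on one node; a minimal sketch (plain Python, local disk only, so it says nothing about HDFS replication or network cost):

```python
import os
import tempfile
import time

def write_throughput(total_mb=64, chunk_mb=4):
    """Write total_mb of zeros to a temp file and return the rough MB/s achieved."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        start = time.perf_counter()
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include flush-to-disk in the timing
        elapsed = time.perf_counter() - start
    os.unlink(f.name)
    return total_mb / elapsed
```

Running this on each worker's data disks gives a per-disk baseline to compare the Spark job's aggregate throughput against.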
On 05 Apr 2016, at 22:49, Mich Talebzadeh wrote:
Hi Jan,
What is the definition of a stress test here? What are the metrics?
Throughput of data, latency, velocity, volume?
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Hi,
I'm trying to figure out how to write lots of data from each worker. I tried
rdd.saveAsTextFile but got an OOM when generating a 1024 MB string on a worker.
Increasing worker memory would mean that I'd have to drop the number of workers.
So, any idea how to write e.g. a 1 GB file from each worker?
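The OOM comes from materializing one 1024 MB string per worker before writing it; generating the lines lazily avoids ever holding the whole payload in memory. A minimal sketch of the idea in plain Python (the commented Spark usage with `mapPartitions` and the HDFS path are illustrative, not a tested job):

```python
def line_chunks(total_bytes, line_len=1024):
    """Yield fixed-size lines lazily until ~total_bytes have been produced."""
    line = "x" * (line_len - 1)  # the writer's trailing newline makes it line_len
    produced = 0
    while produced < total_bytes:
        yield line
        produced += line_len

# In Spark, such a generator could back each partition, roughly:
#   sc.parallelize(range(num_workers), num_workers) \
#     .mapPartitions(lambda _: line_chunks(1024 * 1024 * 1024)) \
#     .saveAsTextFile("hdfs:///tmp/stress")
# Each task then streams ~1 GB to HDFS without building a 1 GB string.
```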