Timer jobs

2011-09-01 Thread Per Steffensen

Hi

I use Hadoop for a MapReduce job in my system. I would like to have the 
job run every 5 minutes. Is there any "distributed" timer-job mechanism in 
Hadoop? Of course I could set up a timer in an external timer framework 
(cron or something like that) that invokes the MapReduce job, but cron 
only runs on one particular machine, so if that machine goes down 
my job will not be triggered. Then I could set up the timer on all or 
many machines, but I would not like the job to be run in more than one 
instance every 5 minutes, so the timer jobs would need to 
coordinate who actually starts the job "this time" while all the rest 
do nothing. I guess I could come up with a solution to 
that - e.g. writing some "lock" stuff using HDFS files or by using 
ZooKeeper. But I would really like it if someone had already solved the 
problem and provided some kind of "distributed timer framework" 
running in a "cluster", so that I could just register a timer job with 
the cluster and then be sure that it is invoked every 5 minutes, no 
matter whether one or two particular machines in the cluster are down.
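
To illustrate the HDFS-file "lock" variant I have in mind - a rough 
sketch only (untested; the path and names are made up, and it assumes 
that create with overwrite=false is atomic on the NameNode):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsIntervalLock {
        /** All machines call this on every 5-minute tick; only the one
            that manages to create the marker file first runs the job for
            that interval (intervalStart = epoch millis rounded down to
            the nearest 5 minutes, so all machines agree on the name). */
        public static boolean tryAcquire(Configuration conf, long intervalStart)
                throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path marker = new Path("/timer-locks/" + intervalStart);
            try {
                // overwrite=false: the create fails if the file already exists
                fs.create(marker, false).close();
                return true;  // we won - trigger the job
            } catch (IOException alreadyExists) {
                // (a real implementation would distinguish "file exists"
                // from other IO errors)
                return false; // another machine got there first
            }
            // old marker files would need to be cleaned up periodically
        }
    }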


Any suggestions are very welcome.

Regards, Per Steffensen


Re: Timer jobs

2011-09-01 Thread Per Steffensen

Hi

Thanks a lot for pointing me to Oozie. I have looked a little bit into 
Oozie, and it seems like the "component" triggering jobs is called a 
"Coordinator Application". But I see nowhere that this 
Coordinator Application doesn't just run on a single machine, and that it 
will therefore not trigger anything if this machine is down. Can you 
confirm that the "Coordinator Application" role is distributed in a 
distributed Oozie setup, so that jobs get triggered even if one or two 
machines are down?


Regards, Per Steffensen

Ronen Itkin wrote:

Hi

Try to use Oozie for job coordination and work flows.




Re: Timer jobs

2011-09-01 Thread Per Steffensen

Thanks for your response. See comments below.

Regards, Per Steffensen

Alejandro Abdelnur wrote:

[moving common-user@ to BCC]

Oozie is not HA yet, but it would be relatively easy to make it so. It was
designed with that in mind; we even did a prototype.
OK, so if it isn't HA out of the box, I believe Oozie is too big a 
framework for my needs - I don't need all the workflow stuff, just a 
plain simple job trigger that fires every 5 minutes. I guess I will 
try out something smaller like Quartz Scheduler. It also only has 
HA/cluster support through its JDBC JobStore, but I guess I could fairly 
easily make an HDFS-file-based JobStore that still has the properties 
needed for Quartz clustering to work.
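
For reference, Quartz clustering seems to be driven by a handful of 
JobStore properties - a minimal quartz.properties sketch along these 
lines (assuming a shared database reachable from all nodes; the data 
source name and credentials are made up):

    # every node runs a scheduler with the same instanceName;
    # AUTO generates a unique instanceId per node
    org.quartz.scheduler.instanceName = TimerCluster
    org.quartz.scheduler.instanceId = AUTO

    # the shared JDBC JobStore is what coordinates the cluster
    org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
    org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
    org.quartz.jobStore.dataSource = myDS
    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 20000

    org.quartz.dataSource.myDS.driver = com.mysql.jdbc.Driver
    org.quartz.dataSource.myDS.URL = jdbc:mysql://dbhost:3306/quartz
    org.quartz.dataSource.myDS.user = quartz
    org.quartz.dataSource.myDS.password = secret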


But what I would really like to have is a scheduling framework that is 
HA out of the box. I guess Oozie is not the solution for me. Does anyone 
know of other frameworks?

Oozie consists of 2 services: a SQL database to store the Oozie jobs' state,
and a servlet container where the Oozie app proper runs.

The solution for HA for the database, well, is left to the database. This
means you'll have to get an HA DB.
I would really like to avoid having to run a relational database. 
Couldn't I just persist the Oozie jobs' state in files on HDFS?

The solution for HA for the Oozie app is deploying the servlet container
with the Oozie app on more than one box (2 or 3), and fronting them with an
HTTP load-balancer.

The missing part is that the current Oozie lock-service is an
in-memory implementation. This should be replaced with a ZooKeeper
implementation. ZooKeeper could run externally or internally in all Oozie
servers. This is what was prototyped long ago.
Yes, but if I have to do ZooKeeper stuff I could just write the scheduler 
myself and make it run on all/many boxes. The only hard part about it is 
the "locking" thing that makes sure only one job-triggering happens in 
the entire cluster when only one job-triggering is supposed to happen, 
and that the job-triggering happens no matter how many machines might be 
down.
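
To make the "locking" part concrete - a rough sketch of what I have in 
mind, using the plain ZooKeeper API (untested; paths and names are made 
up, and it assumes the /timer-locks parent znode was created up front):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkIntervalLock {
        // Every machine runs a local timer firing every 5 minutes. On each
        // tick all machines race to create the same znode, named after the
        // interval; exactly one create succeeds, and that machine runs the job.
        private final ZooKeeper zk;

        public ZkIntervalLock(ZooKeeper zk) { this.zk = zk; }

        /** intervalStart = epoch millis rounded down to the nearest
            5 minutes, so all machines agree on the znode name despite
            small clock skew. */
        public boolean tryAcquire(long intervalStart)
                throws KeeperException, InterruptedException {
            try {
                zk.create("/timer-locks/" + intervalStart, new byte[0],
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                return true;  // we won - trigger the MapReduce job
            } catch (KeeperException.NodeExistsException e) {
                return false; // another machine already triggered this interval
            }
            // note: old interval znodes should be cleaned up periodically
        }
    }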

Thanks.

Alejandro



Re: Timer jobs

2011-09-01 Thread Per Steffensen
Well, I am not sure I get you right, but anyway: basically I want a timer 
framework that triggers my jobs, and the triggering of the jobs needs to 
work even though one or two particular machines go down. So the "timer 
triggering mechanism" has to live in the cluster, so to speak. What I 
don't want is the timer framework being driven from one particular 
machine, so that the triggering of jobs will not happen if this 
particular machine goes down. Basically, if I have e.g. 10 machines in a 
Hadoop cluster I will be able to run e.g. MapReduce jobs even if 3 of 
the 10 machines are down. I want my timer framework to also be 
clustered, distributed and coordinated, so that I will also have my 
timer jobs triggered even though 3 out of 10 machines are down.


Regards, Per Steffensen

Ronen Itkin wrote:

If I get you right, you are asking about installing Oozie as a distributed
and/or HA cluster?!
In that case I am not familiar with an out-of-the-box solution from Oozie.
But I think you could make up a solution of your own, for example:
install Oozie on two servers on the same partition, synchronized by DRBD.
You can trigger a failover using Linux Heartbeat and that way maintain a
virtual IP.






Re: Timer jobs

2011-09-01 Thread Per Steffensen

Vitalii Tymchyshyn wrote:


Hello.

AFAIK you still have the HDFS NameNode, and as soon as the NameNode is down, 
your cluster is down. So putting scheduling on the same machine as the 
NameNode won't make your cluster any worse in terms of SPOF (at least for 
HW failures).

Best regards, Vitalii Tymchyshyn


I believe this is why there is also a secondary namenode. But with two 
namenodes it is still too centralized in my opinion - though I guess the 
Hadoop people know that, and the namenode role will become even more 
distributed in the future. That does not change the fact that I would 
like to have a real distributed, clustered scheduler.


Re: Timer jobs

2011-09-02 Thread Per Steffensen

Vitalii Tymchyshyn wrote:

On 01.09.11 21:55, Per Steffensen wrote:

I believe this is why there is also a secondary namenode.

Hello.

Not at all. The secondary name node is not even a hot standby. Your HDFS 
cluster address is namenode:port, and nothing that connects to it knows 
about the secondary name node, so it is not an HA solution.
AFAIR the secondary name node is not even a backup, but simply a tool that 
helps the main name node process its transaction logs in a scheduled 
fashion. 0.21 has a backup name node, but 0.21 is unstable and its 
backup node does not work (I tried it). For 0.20 the backup solution 
mentioned in the docs is to have an NFS mount on the name node and specify 
it as an additional name node data directory.


Best regards, Vitalii Tymchyshyn.


Hmm, then I believe Hadoop has a serious HA problem built in. That is 
not so smart, when so much of it is about doing HA. But I guess work is 
going on to solve that - in 0.21 and onwards. Thanks for your 
explanation.
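
For reference, the 0.20 recommendation mentioned above is presumably the 
usual one of listing a redundant NFS directory in dfs.name.dir in 
hdfs-site.xml - something like this (paths made up):

    <!-- hdfs-site.xml: the name node writes its metadata to every listed
         directory, so the NFS copy survives loss of the name node box -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
    </property>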


Regards, Per Steffensen


Re: hadoop+lucene

2011-09-12 Thread Per Steffensen
I am not sure exactly what you want to do, but maybe you want to have a 
look at products like Elasticsearch, Solandra, Solr, Sphinx, etc.


27g wrote:

I want to use hadoop/contrib/index to create a distributed Lucene index on
Hadoop. Who can help me by giving me the source code of
hadoop/contrib/index (hadoop 0.20.2)? Thank you very much!
(PS: My English is very poor, sorry)






Routing and region deletes

2011-12-08 Thread Per Steffensen

Hi

The system we are going to work on will receive 50 million+ new data 
records every day. We need to keep a history of 2 years of data (that's 
35+ billion data records in the storage all in all), which basically means 
that we also need to delete 50 million+ data records every day, or e.g. 
1.5 billion every month. We plan to store the data records in HBase.


Is it somehow possible to tell HBase to put (route) all data records 
belonging to a specific date or month to a designated set of regions 
(and route nothing else there), so that deleting all data belonging to 
that day/month is basically deleting those regions entirely? And is 
explicit deletion of entire regions possible at all?


The reason I want to do this is that I expect it to be much faster than 
explicitly deleting 50 million+ records one by one every day.


Regards, Per Steffensen




Re: Routing and region deletes

2011-12-08 Thread Per Steffensen

Thanks for your reply!

Michel Segel wrote:

Per Steffensen,

I would urge you to step away from the keyboard and rethink your design.
  
Will do :-) But I would actually still like to receive answers to my 
questions - just pretend that my ideas are not so stupid and let me know 
if it can be done.

It sounds like you want to replicate a date-partition model similar to what 
you would do in a relational database.

HBase is not a relational database and you have a different way of doing things.

I know

You could put the date/time stamp in the key such that your data is sorted by 
date.

But I guess that would not guarantee that records with timestamps from a 
specific day or month all exist in the same set of regions, and that 
records with timestamps from other days or months all exist outside 
those regions, so that I can delete records from that day or month just 
by deleting those regions.

However, this would cause hot spots. Think about how you access the data. It 
sounds like you access the more recent data more frequently than historical 
data.

Not necessarily wrt reading, but certainly I (almost) only write new 
records with timestamps from the current day/month.

This is a bad idea in HBase. (Note: it may still make sense to do this ... 
you have to think more about the data and consider alternatives.)

I personally would hash the key for even distribution, again depending on the 
data access pattern. (Hashed data means you can't do range queries but again, 
it depends on what you are doing...)

You also have to think about how you purge the data. You don't just drop a 
region.

I know that this is not the "default" way of deleting data, but is it 
possible? I believe a region is basically just a folder with a set of 
files, and deleting those would be a matter of a few ms. So if I can 
route all records with timestamps from a certain day or month to a 
designated set of regions, deleting all those records will be a matter 
of deleting #regions-in-that-set folders on disk - very quick. The 
alternative is to do 50 million+ single delete operations every day (or 
1.5 billion operations every month), and that will not even free up space 
immediately, since the records will just be marked deleted (in a new 
file) - space will not be freed before the next compaction of the 
involved regions (see e.g. http://outerthought.org/blog/465-ot.html).
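
Just to make the alternative concrete - record-by-record deletion would 
look roughly like this with the HBase client API (a sketch only; the 
table name and the key source are made up):

    import java.util.ArrayList;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;

    public class PurgeOneDay {
        public static void purge(Iterable<byte[]> keysForDay) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "datarecords");
            ArrayList<Delete> batch = new ArrayList<Delete>();
            for (byte[] rowKey : keysForDay) {   // 50 million+ keys per day
                batch.add(new Delete(rowKey));
                if (batch.size() == 10000) {     // send deletes in chunks
                    table.delete(batch);         // this only writes tombstones...
                    batch.clear();
                }
            }
            table.delete(batch);
            table.close();
            // ...disk space is not reclaimed until the next major compaction
        }
    }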

Doing a full table scan once a month to delete may not be a bad thing.

But I don't believe one full table scan will be enough. For that to be 
possible, I would at least have to be able to provide HBase with all 1.5 
billion records to delete in one "delete" call - that's probably not 
possible :-)

 Again it depends on what you are doing...

Just my opinion. Others will have their own... Now I'm stepping away from the 
keyboard to get my morning coffee...
  

Enjoy. Then I will consider leaving work (it's late afternoon in Europe)

:-)


Sent from a remote device. Please excuse any typos...

Mike Segel



Re: Routing and region deletes

2011-12-08 Thread Per Steffensen
Ahhh, stupid me. I probably just want to use different tables for 
different days/months. I believe tables can be deleted fairly quickly in 
HBase?
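
Something like this is what I have in mind for the monthly cleanup (a 
rough sketch; the per-month table naming is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DropOldMonth {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // the month that just fell out of the 2-year window
            String table = "datarecords_2009_12";
            admin.disableTable(table); // a table must be disabled before deletion
            admin.deleteTable(table);  // drops all the table's regions/files at once
        }
    }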


Regards, Per Steffensen





HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen

Hi

We have a system with, among other things, an HBase cluster and an HDFS 
cluster (the latter primarily for HBase persistence). Depending on the 
environment we have between 3 and 8 machines running an HBase RegionServer 
and an HDFS DataNode. The OS is Ubuntu 10.04. On those machines we see 
very high iowait and very little "real usage" of the CPU, plus 
unexpectedly low throughput (HBase creates, updates, reads and short 
scans). We do not get more throughput by putting more parallel load from 
the HBase clients on the HBase servers, so it is a "real" iowait problem. 
Any idea what might be wrong, and what we can do to improve throughput 
and lower iowait?


Regards, Per Steffensen
A few dumps:

--- jps ---
19498 DataNode
19690 HRegionServer
19327 SecondaryNameNode

--- typical top ---
top - 11:13:21 up 14 days, 18:20, 1 user, load average: 4.83, 4.50, 4.25
Tasks: 99 total, 1 running, 98 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.1%us, 4.3%sy, 0.0%ni, 5.4%id, 74.8%wa, 0.0%hi, 1.3%si, 0.0%st
Mem: 7133800k total, 7099632k used, 34168k free, 55540k buffers
Swap: 487416k total, 248k used, 487168k free, 2076804k cached

  PID USER  PR NI VIRT  RES  SHR  S %CPU %MEM     TIME+ COMMAND
19690 hbase 20  0 4629m 4.2g 9244 S   51 61.7 194:08.84 java
19498 hdfs  20  0 1030m 116m 9076 S   16  1.7  75:29.26 java

--- iostat -kd 1 ---
root@edrxen1-2:~# iostat -kd 1
Linux 2.6.32-29-server (edrxen1-2) 02/22/2012 _x86_64_ (2 CPU)

Device:    tps     kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       3.53       3.36      15.66       4279502    19973226
dm-0     319.44    6959.14     422.37    8876213913   538720280
dm-1       0.00       0.00       0.00           912         624
xvdb     229.03    6955.81     406.71    8871957888   518747772

Device:    tps     kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       0.00       0.00       0.00             0           0
dm-0     122.00    3852.00       0.00          3852           0
dm-1       0.00       0.00       0.00             0           0
xvdb     105.00    3252.00       0.00          3252           0

Device:    tps     kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       0.00       0.00       0.00             0           0
dm-0      57.00    1712.00       0.00          1712           0
dm-1       0.00       0.00       0.00             0           0
xvdb      78.00    2428.00       0.00          2428           0

--- free -o ---
           total     used     free   shared  buffers   cached
Mem:     7133800  7099452    34348        0    55612  2082364
Swap:     487416      248   487168


Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen
We observe about 50% iowait before even starting the clients - that is, 
when there is actually no load from clients on the system. So only 
"internal" stuff in HBase/HDFS can be causing this - HBase compaction? 
HDFS?


Regards, Per Steffensen





Re: HBase/HDFS very high iowait

2012-02-22 Thread Per Steffensen

Per Steffensen wrote:

We observe about 50% iowait before even starting the clients - that is, 
when there is actually no load from clients on the system. So only 
"internal" stuff in HBase/HDFS can be causing this - HBase compaction? 
HDFS?

Ahh, OK - that was only for half a minute after restart. So basically it 
is down to ~100% idle when there is no load from clients.


Regards, Per Steffensen
