Hello, Stack

> > Of course I will insert fewer rows per second in the case of 25 KB
> > records, but throughput should stay the same. Now I'm trying to run
> > several client instances, each of which inserts 100K records (each
> > record is 25 KB). Execution time grows for each client.
> >
> > > In general, our client ain't too good at multiplexing because of the
> > > above-noted limitation (our client does not yet do NIO). If you want to
> > > test cluster performance, run multiple concurrent clients, each in its
> > > own process. MapReduce is good for doing this. See the
> > > PerformanceEvaluation code for a sample MR job that floats many clients
> > > doing different loading types.
> >
> > MapReduce is a good idea, but we don't actually have data stored in
> > Hadoop; we process data in real time and insert it into HBase. So I
> > think it would be inefficient to write our data to Hadoop and then run
> > a MapReduce job to insert that data into the tables.
>
> Agreed. I was just suggesting it as a way of parallelizing clients. I
> presume the data feed has multiple sources, so that you can run multiple
> instances of your upload process?
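Running several upload processes in parallel, as suggested above, starts with deciding what each process uploads. The following is a minimal Python sketch (the thread itself shows no code) that splits a record range into one contiguous slice per process; `partition` is a hypothetical helper, not part of any HBase client API. Each slice would then be handed to a separate OS process with its own HBase connection, since a single client does not multiplex well.

```python
from typing import List


def partition(total_records: int, num_clients: int) -> List[range]:
    """Hypothetical helper: return num_clients contiguous, non-overlapping
    ranges covering 0..total_records-1. The first (total % num_clients)
    slices get one extra record, so slice sizes differ by at most one."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    base, extra = divmod(total_records, num_clients)
    slices = []
    start = 0
    for i in range(num_clients):
        size = base + (1 if i < extra else 0)
        slices.append(range(start, start + size))
        start += size
    return slices


if __name__ == "__main__":
    # Four uploader processes sharing 400K records get 100K each.
    for i, r in enumerate(partition(400_000, 4), start=1):
        print(f"client {i}: records {r.start}..{r.stop - 1}")
```

Note that if row keys are derived directly from these sequential record numbers, all processes may still start out writing into the same region of a freshly created table.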
Yes, I think I can run multiple instances of the uploader.

> > Time with several clients is growing. For example, when I run four
> > processes, each with one inserter thread, I get the following results:
> > 1) Thread-1 finished its work in 189 sec
> > 2) Thread-1 finished its work in 198 sec
> > 3) Thread-1 finished its work in 206 sec
> > 4) Thread-1 finished its work in 208 sec
> > I.e. each subsequent process takes longer than the previous one. These
> > timings were for a test where each process inserts 100K rows of 25 KB
> > with the WAL on. By the way, the WAL has a great impact on performance
> > as the row size increases: this test takes about 80 sec with the WAL
> > off. Also, when running several clients, the nodes still seem almost
> > idle.
>
> Oh, how many regions are in your cluster? At the start, all clients will
> be hitting a single region (and thus a single server). Check your master
> console at port 60010.
>
> You could rerun a second upload just after a first upload.

As I said, I have 6 nodes besides the master node, and each node has
roughly 235 regions, 1406 regions total. Throughput is about 50 MB/sec
without the WAL and about 15 MB/sec with the WAL on. When I run clients
serially (i.e. only one script is working at any moment), the time is
almost stable and does not grow.

> See what the numbers are like uploading into a table that is pre-split?

Sorry, what do you mean by pre-split? Do you mean splitting regions
before running the script?

--
Regards,
Lyfar Dmitriy
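Stack's point about region count is that a newly created HBase table starts as a single region, so every concurrent uploader writes to the same server until that region splits. Pre-splitting means creating the table with several region boundaries chosen up front. The Python sketch below only computes such boundaries; it assumes, hypothetically, that row keys begin with a fixed-width lowercase hex hash prefix, and `hex_split_keys`, `row_key`, and `region_index` are illustrative helpers, not HBase APIs. How the boundaries are handed to HBase depends on the release; newer versions accept a list of split keys when a table is created.

```python
import bisect
import hashlib
from typing import List


def hex_split_keys(num_regions: int, width: int = 8) -> List[bytes]:
    """Return num_regions - 1 boundary row keys that divide the space of
    fixed-width lowercase hex prefixes into roughly equal ranges."""
    if num_regions < 2:
        return []  # a single region needs no boundaries
    step = 16 ** width // num_regions
    return [format(i * step, f"0{width}x").encode("ascii")
            for i in range(1, num_regions)]


def row_key(natural_key: str, width: int = 8) -> bytes:
    """Hypothetical key scheme: prefix the natural key with the first
    `width` hex digits of its MD5 digest so rows spread across ranges."""
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:width]
    return (prefix + ":" + natural_key).encode("utf-8")


def region_index(key: bytes, split_keys: List[bytes]) -> int:
    """Index of the region a row would land in, comparing keys as raw
    bytes; region i covers [split_keys[i-1], split_keys[i])."""
    return bisect.bisect_right(split_keys, key)


if __name__ == "__main__":
    splits = hex_split_keys(4)
    print(splits)  # [b'40000000', b'80000000', b'c0000000']
    key = row_key("abc")
    print(key, "-> region", region_index(key, splits))
```

Hashing the key spreads concurrent writers across regions from the first insert, at the cost of losing ordered scans over the natural key; whether that trade-off suits this workload is an open question.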
