When the client does group commits, does it group by row key or by region server?
On Sun, Jun 28, 2009 at 12:08 AM, Ryan Rawson wrote:
> I imported 9b rows in 5 days or so, a few minor crashes, average speed
> between 50-200 k ops/sec. The client needs some love to make it more
> efficient on grouping commits during bulk upload.
>
> On Jun 27, 2009 4:02 PM, "Andrew Purtell" wrote:
>
> Test:
>
> - Latest trunk.
>
> - Config modified only with a store file split threshold of 1GB
>
> - 4 node testbed:
> 1) namenode, datanode, hmaster, heritrix, jobtracker
> 2) datanode, regionserver, heritrix, tasktracker, mapper (2)
> 3) datanode, regionserver, heritrix, tasktracker, mapper (2)
> 4) datanode, regionserver, heritrix, tasktracker, mapper (2)
>
> - 100 heritrix threads - 4 hosts, 25 threads each - feeding in ~5MB/sec
> average new edits
>
> - 2 mappers x 3 hosts processing new edits and writing back
> serialized/compressed Documents
>
> - 3K average transactions/sec reported by master
>
> - 'hadoop balancer -threshold 0.1'
>
> - 1 hour test run
>
> Result:
>
> Passed with no incidents!
>
> - Andy
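To make the row-key vs. region-server question concrete, here is a minimal sketch of what grouping buffered writes "by region server" would mean. It is a toy model, not the HBase client's code: the region table, the server names (`rs1:60020` etc.), and the helpers `server_for_row` and `group_by_server` are all hypothetical. Each row is mapped to the region whose start key is the largest one not greater than the row, and writes are then bucketed by the server hosting that region.

```python
import bisect
from collections import defaultdict

# Hypothetical region table: (region start key, hosting server), sorted by
# start key. The first region starts at the empty key. Not real HBase data.
REGIONS = [
    (b"", "rs1:60020"),
    (b"g", "rs2:60020"),
    (b"p", "rs3:60020"),
]
START_KEYS = [start for start, _ in REGIONS]


def server_for_row(row: bytes) -> str:
    """Return the (hypothetical) server hosting the region containing `row`."""
    # The owning region is the last one whose start key is <= row.
    idx = bisect.bisect_right(START_KEYS, row) - 1
    return REGIONS[idx][1]


def group_by_server(puts):
    """Bucket buffered (row, value) writes into one batch per region server."""
    batches = defaultdict(list)
    for row, value in puts:
        batches[server_for_row(row)].append((row, value))
    return dict(batches)


if __name__ == "__main__":
    buffered = [(b"apple", 1), (b"kiwi", 2), (b"zebra", 3), (b"banana", 4)]
    for server, batch in sorted(group_by_server(buffered).items()):
        print(server, [row.decode() for row, _ in batch])
```

In this model, many row keys collapse into a single batch per server, so a bulk upload would need one round trip per server rather than one per row.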
