On Apr 30, 2010, at 4:44 PM, Jean-Daniel Cryans wrote:

> On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas <c...@email.com> wrote:
>>
>>
>> I'm also using thrift to connect and am wondering if that itself puts an
>> overall limit on scaling? It does seem that no matter how many more mappers
>> and servers I add, even without indexing, I am capped at about 5k rows/sec
>> total. I'm waiting a bit as the table grows so that it is split across more
>> regionservers, hopefully that will help, but as far as I can tell I am not
>> hitting any CPU or IO constraint during my tests.
>
> I don't understand the "I'm also using thrift" and "how many more
> mappers" part, you are using Thrift inside a map? Anyways, more
> clients won't help since there's a single mega serialization of all
> the inserts to the index table per region server. It's normal not to
> see any CPU/mem/IO contention since, in this case, it's all about the
> speed at which you can process a single row insertion. The rest of the
> threads just wait...
>
Sorry, I should have been clearer. I'm now testing with normal tables and regionservers, and inserts seem to cap out at about 5-7k rows a second. My method for doing inserts is to use MapReduce on Hadoop to launch many insert processes; each process connects to HBase through the local Thrift server on its own node. In this case I hope that other threads can insert at the same time.

-chris
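The serialization argument in the quoted reply can be put as a back-of-envelope model: aggregate throughput is the smaller of what the clients offer and what the servers can accept when each one handles a single row insertion at a time. This is a sketch, not HBase code; the function name, its parameters, and the numbers below are hypothetical and are not measurements from this thread.

```python
def aggregate_insert_rate(num_clients: int, rows_per_client: float,
                          num_servers: int, serial_rows_per_server: float) -> float:
    """Hypothetical upper bound on cluster-wide inserts per second."""
    # Rows/sec the clients (e.g. map tasks) could push if nothing blocked them.
    offered = num_clients * rows_per_client
    # Rows/sec the cluster can accept, assuming each region server processes
    # one insertion at a time and load is spread evenly across servers.
    capacity = num_servers * serial_rows_per_server
    # The smaller value wins; clients beyond the crossover point just wait.
    return min(offered, capacity)


if __name__ == "__main__":
    # Hypothetical inputs: 500 rows/sec per client, 5 servers at 1000 rows/sec each.
    for clients in (5, 10, 50, 100):
        rate = aggregate_insert_rate(clients, 500, num_servers=5,
                                     serial_rows_per_server=1000)
        print(f"{clients:>3} clients -> {rate:,.0f} rows/sec")
```

Under this model, adding mappers past the crossover point changes nothing; only spreading the table across more region servers or making each insertion cheaper raises the ceiling, which matches seeing no CPU or IO saturation while throughput stays flat.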