Re: Spark-HBase connector

2014-12-19 Thread Mukesh Jha
Thanks, Stack, looks promising; will give it a try. On Fri, Dec 19, 2014 at 3:28 AM, Stack st...@duboce.net wrote: On Tue, Dec 16, 2014 at 10:52 AM, Stack st...@duboce.net wrote: On Sun, Dec 14, 2014 at 10:49 PM, Mukesh Jha me.mukesh@gmail.com wrote: Hello Experts, I've come

Re: Region Server Thread with a Single High Idle CPU

2014-12-19 Thread uamadman
Yes, I tested the following by restarting the cluster and waiting approximately 5-10 minutes for its initial ramp-up. There are no clients asking for data. In the following example KVM15 was randomly assigned to serve the META table. root@KVM15:~# lsof -n | grep :60020- | sed 's/.*-//;s/:.*//' |

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Pradeep Gollakota
Hi Aaron, Just out of curiosity, have you considered using asynchbase? https://github.com/OpenTSDB/asynchbase On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk ndimi...@apache.org wrote: Hi Aaron, Your analysis is spot on and I do not believe this is by design. I see the write buffer is owned

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Andrew Purtell
I believe HTableMultiplexer[1] is meant to stand in for HTablePool for buffered writing. FWIW, I've not used it. 1: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableMultiplexer.html On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk ndimi...@apache.org wrote: Hi Aaron, Your

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Aaron Beppu
Nick: Thanks, I've created an issue [1]. Pradeep: Yes, I have considered using that. However, for the moment we've set it out of scope, since our migration from 0.94 to 0.98 is already a bit complicated, and we hoped to isolate these changes by not moving to the async client until after

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Andrew Purtell
Aaron: Please post a copy of that feedback on the JIRA, pretty sure we will be having an improvement discussion there. On Fri, Dec 19, 2014 at 10:58 AM, Aaron Beppu abe...@siftscience.com wrote: Nick : Thanks, I've created an issue [1]. Pradeep : Yes, I have considered using that. However for

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Nick Dimiduk
Thanks for the reminder about the Multiplexer, Andrew. It sort of solves this problem, but I think its semantics of dropping writes are not desirable in the general case. Further, my understanding was that the new connection implementation is designed to handle this kind of use-case (hence cc'ing
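The dropped-writes concern can be made concrete with a toy model. As we read its Javadoc, HTableMultiplexer buffers each put in a bounded queue for the target region server, and its put(...) returns false when that queue is already full. The Java below is not HBase code. The class DropOnFullBuffer, the server names, and the string "writes" are hypothetical stand-ins. They show only how a non-blocking offer on a bounded queue turns back-pressure into rejected writes, which a caller can silently ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Toy model, NOT HBase code: one bounded queue per (hypothetical) server name.
// put() never blocks: when the target queue is full the write is rejected,
// and unless the caller checks the return value it is silently lost.
class DropOnFullBuffer {
    private final int perServerCapacity;
    private final ConcurrentHashMap<String, BlockingQueue<String>> queues =
        new ConcurrentHashMap<>();

    DropOnFullBuffer(int perServerCapacity) {
        this.perServerCapacity = perServerCapacity;
    }

    // Returns false if the write was dropped because that server's queue is full.
    boolean put(String server, String write) {
        BlockingQueue<String> q = queues.computeIfAbsent(
            server, s -> new ArrayBlockingQueue<>(perServerCapacity));
        return q.offer(write); // offer() fails fast instead of waiting for space
    }

    // Stand-in for a background flusher that drains one server's queue.
    List<String> drain(String server) {
        List<String> out = new ArrayList<>();
        BlockingQueue<String> q = queues.get(server);
        if (q != null) {
            q.drainTo(out);
        }
        return out;
    }
}
```

With a capacity of 2, the third put to the same server returns false. That write is lost unless the caller retries. A blocking variant using BlockingQueue.put would stall the writer instead of dropping the write.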

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Andrew Purtell
I don't like the dropped writes either. Just pointing out what we have now. There is a gap, no doubt. On Fri, Dec 19, 2014 at 11:16 AM, Nick Dimiduk ndimi...@apache.org wrote: Thanks for the reminder about the Multiplexer, Andrew. It sort of solves this problem, but I think its semantics of

Re: Region Server Thread with a Single High Idle CPU

2014-12-19 Thread Esteban Gutierrez
Hi Jon, Do you see anything interesting in the RS logs from KVM15 or the HBase Master? One possibility: if there are no requests to META coming from the Thrift server or external clients, then one or more region servers might, for some reason, be updating META too

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Solomon Duskis
Is this critical to sort out before 1.0, or is fixing this a post-1.0 enhancement? -Solomon On Fri, Dec 19, 2014 at 2:19 PM, Andrew Purtell apurt...@apache.org wrote: I don't like the dropped writes either. Just pointing out what we have now. There is a gap no doubt. On Fri, Dec 19, 2014 at

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Andrew Purtell
I think it would be critical if we're contemplating something that requires a breaking API change? Do we have that here? I'm not sure. On Fri, Dec 19, 2014 at 12:02 PM, Solomon Duskis sdus...@gmail.com wrote: Is this critical to sort out before 1.0, or is fixing this a post-1.0 enhancement?

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Solomon Duskis
My first thought based on this discussion was that it would require moving some methods (setAutoFlush() and setWriteBufferSize()) from Table to Connection. That would be a breaking API change. -Solomon On Fri, Dec 19, 2014 at 3:04 PM, Andrew Purtell apurt...@apache.org wrote: I think it would

HBase - bulk loading files

2014-12-19 Thread Rama Ramani
Hello, I am bulk loading a set of files (about 400 MB each) with '|' as the delimiter using ImportTsv. The 'map' job takes a long time to complete on both a 4-node and a 16-node cluster. I tried the option to generate the output (providing -Dimporttsv.bulk.output) which took time

Re: HBase - bulk loading files

2014-12-19 Thread Ted Yu
Can you let us know the HBase and Hadoop versions you're using? Were the clusters taking load from other sources when ImportTsv was running? Cheers On Fri, Dec 19, 2014 at 1:43 PM, Rama Ramani rama.ram...@live.com wrote: Hello, I am bulk loading a set of files (about 400MB each)

RE: HBase - bulk loading files

2014-12-19 Thread Rama Ramani
HBase 0.98.0.2.1.9.0-2196-hadoop2, Hadoop 2.4.0.2.1.9.0-2196 (Subversion g...@github.com:hortonworks/hadoop-monarch.git -r cb50542bc92fb77dee52). No, the clusters were not taking additional load. Thanks, Rama Date: Fri, 19 Dec 2014 13:50:30 -0800 Subject: Re: HBase - bulk loading files From:

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Stack
On Fri, Dec 19, 2014 at 12:20 PM, Solomon Duskis sdus...@gmail.com wrote: My first thought based on this discussion was that it would require moving some methods (setAutoFlush() and setWriteBufferSize()) from Table to Connection. That would be a breaking API change. This will mean a bunch

Re: Efficient use of buffered writes in a post-HTablePool world?

2014-12-19 Thread Nick Dimiduk
It could be done in an API-compatible way, though the semantics would change, which is probably worse. Table keeps these methods. When setAutoFlush is used, a write buffer managed by the connection is created. If multiple Table instances for the same table call setWriteBufferSize(), perhaps the largest value wins.
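Here is a minimal sketch of the design described above, under stated assumptions. SketchConnection, SketchTable and SharedBuffer are hypothetical classes, not the HBase Connection/Table API. Rows and sizes are plain strings and byte counts, and the 2 MB starting limit is an assumption for illustration. The connection holds one buffer per table name, and every handle for that name shares it. setAutoFlush(false) creates (or joins) that buffer, and setWriteBufferSize only ever raises the limit, so the largest requested value wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT the HBase client API: the connection owns one
// write buffer per table name, shared by every table handle it hands out.
class SketchConnection {
    private final ConcurrentHashMap<String, SharedBuffer> buffers = new ConcurrentHashMap<>();

    SketchTable getTable(String tableName) {
        return new SketchTable(this, tableName);
    }

    SharedBuffer bufferFor(String tableName) {
        return buffers.computeIfAbsent(tableName, t -> new SharedBuffer());
    }
}

class SharedBuffer {
    // Assumed starting limit, for illustration only.
    static final long DEFAULT_LIMIT_BYTES = 2L * 1024 * 1024;

    private long limitBytes = DEFAULT_LIMIT_BYTES;
    private long bufferedBytes = 0;
    private final List<String> pendingRows = new ArrayList<>();
    private int flushCount = 0;

    // "Largest value wins": a smaller request never shrinks the shared buffer.
    synchronized void requestLimit(long bytes) {
        limitBytes = Math.max(limitBytes, bytes);
    }

    synchronized void add(String row, long bytes) {
        pendingRows.add(row);
        bufferedBytes += bytes;
        if (bufferedBytes >= limitBytes) {
            flush();
        }
    }

    synchronized void flush() {
        if (pendingRows.isEmpty()) {
            return;
        }
        // A real client would send the batched mutations to region servers here.
        pendingRows.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    synchronized long limitBytes() { return limitBytes; }
    synchronized int pendingCount() { return pendingRows.size(); }
    synchronized int flushCount() { return flushCount; }
}

// Thin, short-lived handle: the buffer lives on the connection, not here.
class SketchTable {
    private final SketchConnection conn;
    private final String name;
    private boolean autoFlush = true;

    SketchTable(SketchConnection conn, String name) {
        this.conn = conn;
        this.name = name;
    }

    void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
        if (!autoFlush) {
            conn.bufferFor(name); // create (or join) the connection-owned buffer
        }
    }

    void setWriteBufferSize(long bytes) {
        conn.bufferFor(name).requestLimit(bytes);
    }

    void put(String row, long bytes) {
        if (autoFlush) {
            return; // would be sent immediately; the RPC is elided in this sketch
        }
        conn.bufferFor(name).add(row, bytes);
    }
}
```

Because the handles are thin, a caller could create and discard a table handle per request without losing batching. One visible semantic change in this sketch is that one caller's setWriteBufferSize now affects buffering for every other caller of the same table.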

ACLs/Quotas for HBase structures

2014-12-19 Thread Manoj Murumkar
Folks, We are trying to control space usage and manage security at the HBase namespace level. Think of it in terms of an RDBMS (a database and a superuser for that database). Is there a simple way to do this? This is what I have in mind. Does it make sense? - Space quotas: Namespace is managed under

Re: ACLs/Quotas for HBase structures

2014-12-19 Thread Esteban Gutierrez
Hello Manoj, That's a very interesting requirement. Unfortunately, the existing HBase directory structure needs to be owned by the user that started HBase (usually the 'hbase' user), and HBase handles all the permissions and ACL rules without exposing HDFS details to the client API. Even if

Re: ACLs/Quotas for HBase structures

2014-12-19 Thread Ted Yu
Manoj: HBASE-8410 is under active development. If you have time, please go over the feature to see if it fits your needs. Cheers On Dec 19, 2014, at 11:42 PM, Esteban Gutierrez este...@cloudera.com wrote: Hello Manoj, That's a very interesting requirement. Unfortunately, the existing HBase