Re: EndPoint Coprocessor could be deadlocked?

2012-05-17 Thread fding hbase
Hi Michael,
On Fri, May 18, 2012 at 1:39 AM, Michael Segel wrote:

> > You should not let just any user run coprocessors on the server. That's
> madness.
> >
> > Best regards,
> >
> >- Andy
>
> Fei Ding,
>
> I'm a little confused.
> Are you trying to solve the problem of querying data efficiently from a
> table, or are you trying to find an example of where and when to use
> co-processors?
>
>
I'm trying to solve the problem of querying data efficiently. Coprocessors
are one of the possible solutions that I've tried.


> You actually have an interesting problem that isn't easily solved in
> relational databases, but I don't think it's an appropriate problem if you
> want to stress the use of coprocessors.
>
> Yes, with indexes you want to use coprocessors as a way to keep the index
> in sync with the underlying table.
>
> However, beyond that... the solution is really best run as an M/R job.
>
> Consider that HBase has two different access methods: one is as part of
> M/R jobs, the other is a client/server model.  If you wanted to, you could
> create a service/engine/app that would allow you to efficiently query and
> return result sets from your database, as well as manage indexes.
> In part, coprocessors make this a lot easier.
>

I'm not using coprocessors to maintain the index tables; I'm using an extended
client to do this.
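
Roughly, the extended client pairs each data write with an index write,
something like this (a simplified sketch; the class name and index-key layout
here are illustrative, not our actual code):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch of client-side index maintenance: every write to
// the data table is paired with a write to an index table whose row key
// is the indexed value plus the data row key. Note this is not atomic
// across the two tables.
public class IndexedWriter {
    private final HTable dataTable;
    private final HTable indexTable;

    public IndexedWriter(HTable dataTable, HTable indexTable) {
        this.dataTable = dataTable;
        this.indexTable = indexTable;
    }

    public void put(byte[] row, byte[] family, byte[] qualifier, byte[] value)
            throws IOException {
        Put dataPut = new Put(row);
        dataPut.add(family, qualifier, value);
        dataTable.put(dataPut);

        // Index row key: indexed value + separator + data row key.
        byte[] indexRow = Bytes.add(value, Bytes.toBytes("|"), row);
        Put indexPut = new Put(indexRow);
        indexPut.add(family, qualifier, row); // points back at the data row
        indexTable.put(indexPut);
    }
}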


>
> If you consider the general flow of my solution earlier in this thread,
> you now have a really great way to implement this.
>
> Note: we're really talking about allowing someone to query data from a
> table using multiple indexes and index types. Think alternate table
> (key/value pair), Lucene/SOLR, and GeoSpatial.
>
> You could even benchmark it against an Oracle implementation, and
> probably smoke it.
> You could also do efficient joins between tables.
>
> So yeah, I would encourage you to work on your initial problem... ;-)
>
>
An alternate table is also one of the possible solutions; however, it's not
that easy either.  I'm still working on it. ;-)

-- 

Best Regards!

Fei Ding
fding.chu...@gmail.com


Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
HBase Version:  hbase-0.90.4-cdh3u3

Hadoop Version:  hadoop-0.20.2-cdh3u2


12/05/17 16:37:47 ERROR mapreduce.LoadIncrementalHFiles: IOException during
splitting
java.util.concurrent.ExecutionException: java.io.IOException: Trailer
'header' is wrong; does the trailer size match content?
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:333)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:233)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Trailer 'header' is wrong; does the trailer
size match content?
at
org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
at
org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
at
org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Exception in thread "main" java.io.IOException: Trailer 'header' is wrong;
does the trailer size match content?
at
org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
at
org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
at
org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
at
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)




On Thu, May 17, 2012 at 4:55 PM, Ted Yu  wrote:

> Can you post the complete message ?
>
> What HBase version are you using ?
>
> On Thu, May 17, 2012 at 4:48 PM, Something Something <
> mailinglist...@gmail.com> wrote:
>
> > Hello,
> >
> > I keep getting this message while running the 'completebulkload' process.
> > I tried the following solutions that I came across while Googling for
> this
> > error:
> >
> > 1)  setReduceSpeculativeExecution(true)
> >
> > 2)  Made sure that none of the tasks are failing.
> >
> > 3)  The HFileOutput job runs successfully.
> >
> > 4)  The first 2 lines in the output from HFileOutput look like this:
> >
> > 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 37
> > 2a 34 39 39 39   row=+9L99/++MWT7f2+2*1*153347*4999,
> > families={(family=info,
> >
> >
> keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/9223372036854775807/Put/vlen=1)}

Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Ted Yu
Can you post the complete message ?

What HBase version are you using ?

On Thu, May 17, 2012 at 4:48 PM, Something Something <
mailinglist...@gmail.com> wrote:

> Hello,
>
> I keep getting this message while running the 'completebulkload' process.
> I tried the following solutions that I came across while Googling for this
> error:
>
> 1)  setReduceSpeculativeExecution(true)
>
> 2)  Made sure that none of the tasks are failing.
>
> 3)  The HFileOutput job runs successfully.
>
> 4)  The first 2 lines in the output from HFileOutput look like this:
>
> 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 37
> 2a 34 39 39 39   row=+9L99/++MWT7f2+2*1*153347*4999,
> families={(family=info,
>
> keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/9223372036854775807/Put/vlen=1)}
> 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 38
> 2a 34 39 39 39   row=+9L99/++MWT7f2+2*1*153348*4999,
> families={(family=info,
>
> keyvalues=(+9L99/++MWT7f2+2*1*153348*4999/info:frequency/9223372036854775807/Put/vlen=1)}
>
>
> 5)  My Mapper for HFileOutput looks like this:
>
>public static class MyMapper extends MapReduceBase implements
> Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>
>@Override
>public void map(LongWritable key, Text value,
> OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
>throws IOException {
>String[] values = value.toString().split("\t");
>String key1 = values[0];
>String value1 = values[1];
>
>ImmutableBytesWritable ibw = new
> ImmutableBytesWritable(key1.getBytes());
>Put put = new Put(Bytes.toBytes(key1));
>put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
> Bytes.toBytes(value1));
>output.collect(ibw, put);
>}
>
>}
>
>
> Any ideas what could be wrong?  Thanks for your help.
>


Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
Hello,

I keep getting this message while running the 'completebulkload' process.
I tried the following solutions that I came across while Googling for this
error:

1)  setReduceSpeculativeExecution(true)

2)  Made sure that none of the tasks are failing.

3)  The HFileOutput job runs successfully.

4)  The first 2 lines in the output from HFileOutput look like this:

2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 37
2a 34 39 39 39   row=+9L99/++MWT7f2+2*1*153347*4999,
families={(family=info,
keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/9223372036854775807/Put/vlen=1)}
2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 38
2a 34 39 39 39   row=+9L99/++MWT7f2+2*1*153348*4999,
families={(family=info,
keyvalues=(+9L99/++MWT7f2+2*1*153348*4999/info:frequency/9223372036854775807/Put/vlen=1)}


5)  My Mapper for HFileOutput looks like this:

public static class MyMapper extends MapReduceBase implements
Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

@Override
public void map(LongWritable key, Text value,
OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
throws IOException {
// Input lines are tab-separated: rowkey <TAB> frequency
String[] values = value.toString().split("\t");
String key1 = values[0];
String value1 = values[1];

// Bytes.toBytes(key1) (UTF-8) keeps the output key consistent with
// the Put row key; key1.getBytes() would use the platform charset.
ImmutableBytesWritable ibw = new
ImmutableBytesWritable(Bytes.toBytes(key1));
Put put = new Put(Bytes.toBytes(key1));
put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
Bytes.toBytes(value1));
output.collect(ibw, put);
}

}
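
6)  For reference, the same mapper against the new org.apache.hadoop.mapreduce
API (which HFileOutputFormat.configureIncrementalLoad() is built around) would
look roughly like this -- an untested sketch with the same tab-separated input
assumed:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyNewApiMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input lines are tab-separated: rowkey <TAB> frequency
        String[] fields = value.toString().split("\t");
        byte[] row = Bytes.toBytes(fields[0]);

        Put put = new Put(row);
        put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
                Bytes.toBytes(fields[1]));
        // With HFileOutputFormat.configureIncrementalLoad(), a
        // PutSortReducer turns these Puts into sorted KeyValues.
        context.write(new ImmutableBytesWritable(row), put);
    }
}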


Any ideas what could be wrong?  Thanks for your help.


hbase data

2012-05-17 Thread Rita
Hello,

Currently, we are using HBase to store sensor data -- basically large time
series data hitting close to 2 billion rows for one type of sensor. I was
wondering how HBase differs from the HDF (http://www.hdfgroup.org/HDF5/) file
format. Most of my operations are scanning a range and getting its values,
but it seems I can achieve this using HDF. Does anyone have experience with
this file container format who could shed some light?
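
For context, a typical range scan for us looks roughly like this (a sketch;
the table name and key layout are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch: scan all readings for one sensor over a time
// window, assuming row keys of the form <sensorId>#<epochSeconds>.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "sensor_data");
Scan scan = new Scan(Bytes.toBytes("sensor42#1337000000"),
                     Bytes.toBytes("sensor42#1337086400"));
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result r : scanner) {
        // process r.getValue(...) per row
    }
} finally {
    scanner.close();
}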




-- 
--- Get your facts first, then you can distort them as you please.--


Re: client timeouts after upgrading to 0.92

2012-05-17 Thread Jean-Daniel Cryans
This means that the servers aren't responding to the clients within 60
seconds. I believe this behavior is new since 0.90, so it could be that you
were used to having long-running requests.

If not, check what's going on with the servers at the address given
in the exception message.
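
If you need to buy headroom while you investigate, raising the client-side
RPC timeout works (a sketch; this masks the symptom rather than fixing slow
servers):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch: raise the client RPC timeout from the 60000 ms default to
// 120000 ms for this client's Configuration only.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.rpc.timeout", 120000);
HTable table = new HTable(conf, "platform");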

J-D

On Thu, May 17, 2012 at 2:35 PM, Viral Bajaria  wrote:
> Hello,
>
> I just upgraded our production cluster from hbase 0.89 (cdh3b2, yeah!!) to
> 0.92.1, ever since the upgrade I see a lot of issues with timeouts on my
> clients.
>
> Below are the log dumps from the client and the regionserver that it was
> requesting the data from. I can overcome this exception by increasing
> hbase.rpc.timeout but I doubt that's the right way of solving this issue.
>
> Has anyone faced this issue in hbase 0.92? If so, how did you go about
> solving it? If not, any pointers on how to start debugging this?
>
> Thanks,
> Viral
>
> *Client Logs*
> 2012-05-17T19:44:19.191Z Processed 277000, written 277000, writing row for
> 40036669 (this log line is output for every 1000 rows)
> 12/05/17 19:45:19 WARN client.HConnectionManager$HConnectionImplementation:
> Failed all from
> region=platform,40032999,1323868834966.e9f3d644fa843340129355bd9e005903.,
> hostname=elshadoop-c01, port=60020
> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
> Call to elshadoop-c01/10.16.80.69:60020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.16.80.30:57245 remote=elshadoop-c01/
> 10.16.80.69:60020]
>        at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>        at
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:816)
>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:803)
>        at
> com.x.aggregation.hbase.util.BulkTableWriter.commit(BulkTableWriter.java:191)
>        at
> com.x.metrics.ingestmetadata.DataProvider.writePlatformToShowId(DataProvider.java:1799)
>        at com.x.metrics.ingestmetadata.Program$16.run(Program.java:163)
>        at com.x.metrics.ingestmetadata.Program.runAction(Program.java:24)
>        at com.x.metrics.ingestmetadata.Program.main(Program.java:218)
> Caused by: java.net.SocketTimeoutException: Call to elshadoop-c01/
> 10.16.80.69:60020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.16.80.30:57245 remote=elshadoop-c01/
> 10.16.80.69:60020]
>        at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:949)
>        at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:922)
>        at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>        at $Proxy4.multi(Unknown Source)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
>        at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.16.80.30:57245 remote=elshadoop-c01/
> 10.16.80.69:60020]
>        at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>        at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>        at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at

Re: hbase security

2012-05-17 Thread Eugene Koontz
On 5/17/12 1:22 PM, Stack wrote:
> On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz  wrote:
>> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/
>>
>> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/
>>
> 
> Anyone interested in porting these over to
> http://blogs.apache.org/hbase/? They have great stuff in them.
> St.Ack

Hi St. Ack,
Thanks for saying so! I'm planning to port mine (the access controls
post) as soon as my Apache Roller account is granted by the Infra folks.
-Eugene


Re: hbase security

2012-05-17 Thread Gary Helmling
I could repost the "up and running with secure hadoop" one.  But it's
kind of out of date at this point.  I remember, back when the site was
still up, getting some comments on it about things that had already
changed in the 0.20.20X releases.

I can take a look and see how bad it is.


On Thu, May 17, 2012 at 1:22 PM, Stack  wrote:
> On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz  wrote:
>> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/
>>
>> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/
>>
>
> Anyone interested in porting these over to
> http://blogs.apache.org/hbase/? They have great stuff in them.
> St.Ack


Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
That's why you can't use it. :)  You can hack around it if you want.  Even
better, you could start contributing and work on HBASE-6033.
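
In the meantime, one thing that may work is reading the region server's
metrics over JMX (a sketch; it assumes remote JMX is enabled on the RS, and
the MBean/attribute names below are 0.92-era guesses, so verify them against
your build):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: read compactionQueueSize from a region server's metrics
// MBean over JMX. Assumes the RS JVM was started with remote JMX
// enabled on port 10102; host, port, and MBean name may differ.
JMXServiceURL url = new JMXServiceURL(
    "service:jmx:rmi:///jndi/rmi://rs-host:10102/jmxrmi");
JMXConnector connector = JMXConnectorFactory.connect(url);
try {
    MBeanServerConnection mbsc = connector.getMBeanServerConnection();
    ObjectName name = new ObjectName(
        "hadoop:service=RegionServer,name=RegionServerStatistics");
    System.out.println("compactionQueueSize = "
        + mbsc.getAttribute(name, "compactionQueueSize"));
} finally {
    connector.close();
}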

Cheers,
Jimmy

On Thu, May 17, 2012 at 1:23 PM, Chen Song  wrote:

> Sorry for another dumb question. As I am querying this information in
> client code, how do I get an HRegionServer from an HRegionInfo
> or HServerAddress?
>
> I found a way to get an HRegionInterface, shown below.
>
> HConnection.getHRegionConnection(HServerAddress)
>
> But the getMetrics method is exposed only on HRegionServer, not on
> HRegionInterface.
> Thanks
> Chen
>
> On Thu, May 17, 2012 at 3:03 PM, Jimmy Xiang  wrote:
>
> > HRegionServer.java:
> >
> >this.metrics.compactionQueueSize.set(compactSplitThread
> >.getCompactionQueueSize());
> >
> > On Thu, May 17, 2012 at 12:00 PM, Chen Song 
> > wrote:
> >
> > > Can you direct me to the API call to get the queue size metrics?
> > >
> > > On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang 
> > wrote:
> > >
> > > > It is an async call to the region server to request a compaction.
>  Once
> > > the
> > > > request is accepted,
> > > > the call returned.  There is no sync call here.  The request is
> queued
> > > and
> > > > processed by a pool
> > > > of threads.
> > > >
> > > > Currently, there is a metric to show the queue size.  But it doesn't
> > tell
> > > > how many are for major,
> > > > and how many are for minor.  The queue size is the number of store
> > files
> > > > pending compact.
> > > >
> > > > As I know, there is no work around for now.
> > > >
> > > > Jimmy
> > > >
> > > >
> > > > On Thu, May 17, 2012 at 11:42 AM, Chen Song 
> > > > wrote:
> > > >
> > > > > Thanks Jimmy. Meanwhile, is there a work around for this?
> > > > >
> > > > > How does compact/major_compact issued from hbase shell handles this
> > > under
> > > > > the hood? Is it eventually calling HBaseAdmin API or HRegion
> > > synchronous
> > > > > API call?
> > > > >
> > > > > Thanks
> > > > > Chen
> > > > >
> > > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang 
> > > > wrote:
> > > > >
> > > > > > I am thinking to add a function to check if a table or region in
> > > > > compaction
> > > > > > (major or minor).
> > > > > >
> > > > > > I filed HBASE-6033. It won't show status of a specific compaction
> > > > > request.
> > > > > > Will this help?
> > > > > >
> > > > > > Thanks,
> > > > > > Jimmy
> > > > > >
> > > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song <
> > chen.song...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I would like to schedule major compaction on a region
> > > > > programmatically. I
> > > > > > > found the API call below which can properly achieve my goal.
> > > > > > >
> > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName)
> > > > > > >
> > > > > > > It turns out to be an asynchronous call and there seems no call
> > > back
> > > > > > > parameter that can be specified. How can I validate the
> > compaction
> > > > > result
> > > > > > > (e.g., success or failure) ?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Chen
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Chen Song
> > > > > Mobile: 518-445-5096
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Chen Song
> > > Mobile: 518-445-5096
> > >
> >
>
>
>
> --
> Chen Song
> Mobile: 518-445-5096
>


Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Sorry for another dumb question. As I am querying this information in
client code, how do I get an HRegionServer from an HRegionInfo
or HServerAddress?

I found a way to get an HRegionInterface, shown below.

HConnection.getHRegionConnection(HServerAddress)

But the getMetrics method is exposed only on HRegionServer, not on
HRegionInterface.
Thanks
Chen

On Thu, May 17, 2012 at 3:03 PM, Jimmy Xiang  wrote:

> HRegionServer.java:
>
>this.metrics.compactionQueueSize.set(compactSplitThread
>.getCompactionQueueSize());
>
> On Thu, May 17, 2012 at 12:00 PM, Chen Song 
> wrote:
>
> > Can you direct me to the API call to get the queue size metrics?
> >
> > On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang 
> wrote:
> >
> > > It is an async call to the region server to request a compaction.  Once
> > the
> > > request is accepted,
> > > the call returned.  There is no sync call here.  The request is queued
> > and
> > > processed by a pool
> > > of threads.
> > >
> > > Currently, there is a metric to show the queue size.  But it doesn't
> tell
> > > how many are for major,
> > > and how many are for minor.  The queue size is the number of store
> files
> > > pending compact.
> > >
> > > As I know, there is no work around for now.
> > >
> > > Jimmy
> > >
> > >
> > > On Thu, May 17, 2012 at 11:42 AM, Chen Song 
> > > wrote:
> > >
> > > > Thanks Jimmy. Meanwhile, is there a work around for this?
> > > >
> > > > How does compact/major_compact issued from hbase shell handles this
> > under
> > > > the hood? Is it eventually calling HBaseAdmin API or HRegion
> > synchronous
> > > > API call?
> > > >
> > > > Thanks
> > > > Chen
> > > >
> > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang 
> > > wrote:
> > > >
> > > > > I am thinking to add a function to check if a table or region in
> > > > compaction
> > > > > (major or minor).
> > > > >
> > > > > I filed HBASE-6033. It won't show status of a specific compaction
> > > > request.
> > > > > Will this help?
> > > > >
> > > > > Thanks,
> > > > > Jimmy
> > > > >
> > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song <
> chen.song...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I would like to schedule major compaction on a region
> > > > programmatically. I
> > > > > > found the API call below which can properly achieve my goal.
> > > > > >
> > > > > > HBaseAdmin.majorCompact(String tableOrRegionName)
> > > > > >
> > > > > > It turns out to be an asynchronous call and there seems no call
> > back
> > > > > > parameter that can be specified. How can I validate the
> compaction
> > > > result
> > > > > > (e.g., success or failure) ?
> > > > > >
> > > > > > Thanks
> > > > > > Chen
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Chen Song
> > > > Mobile: 518-445-5096
> > > >
> > >
> >
> >
> >
> > --
> > Chen Song
> > Mobile: 518-445-5096
> >
>



-- 
Chen Song
Mobile: 518-445-5096


Re: hbase security

2012-05-17 Thread Stack
On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz  wrote:
> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/
>
> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/
>

Anyone interested in porting these over to
http://blogs.apache.org/hbase/? They have great stuff in them.
St.Ack


Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
HRegionServer.java:

this.metrics.compactionQueueSize.set(compactSplitThread
.getCompactionQueueSize());

On Thu, May 17, 2012 at 12:00 PM, Chen Song  wrote:

> Can you direct me to the API call to get the queue size metrics?
>
> On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang  wrote:
>
> > It is an async call to the region server to request a compaction.  Once
> the
> > request is accepted,
> > the call returned.  There is no sync call here.  The request is queued
> and
> > processed by a pool
> > of threads.
> >
> > Currently, there is a metric to show the queue size.  But it doesn't tell
> > how many are for major,
> > and how many are for minor.  The queue size is the number of store files
> > pending compact.
> >
> > As I know, there is no work around for now.
> >
> > Jimmy
> >
> >
> > On Thu, May 17, 2012 at 11:42 AM, Chen Song 
> > wrote:
> >
> > > Thanks Jimmy. Meanwhile, is there a work around for this?
> > >
> > > How does compact/major_compact issued from hbase shell handles this
> under
> > > the hood? Is it eventually calling HBaseAdmin API or HRegion
> synchronous
> > > API call?
> > >
> > > Thanks
> > > Chen
> > >
> > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang 
> > wrote:
> > >
> > > > I am thinking to add a function to check if a table or region in
> > > compaction
> > > > (major or minor).
> > > >
> > > > I filed HBASE-6033. It won't show status of a specific compaction
> > > request.
> > > > Will this help?
> > > >
> > > > Thanks,
> > > > Jimmy
> > > >
> > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song 
> > > > wrote:
> > > >
> > > > > I would like to schedule major compaction on a region
> > > programmatically. I
> > > > > found the API call below which can properly achieve my goal.
> > > > >
> > > > > HBaseAdmin.majorCompact(String tableOrRegionName)
> > > > >
> > > > > It turns out to be an asynchronous call and there seems no call
> back
> > > > > parameter that can be specified. How can I validate the compaction
> > > result
> > > > > (e.g., success or failure) ?
> > > > >
> > > > > Thanks
> > > > > Chen
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Chen Song
> > > Mobile: 518-445-5096
> > >
> >
>
>
>
> --
> Chen Song
> Mobile: 518-445-5096
>


Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Can you direct me to the API call to get the queue size metrics?

On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang  wrote:

> It is an async call to the region server to request a compaction.  Once the
> request is accepted, the call returns.  There is no sync call here.  The
> request is queued and processed by a pool of threads.
>
> Currently, there is a metric to show the queue size, but it doesn't tell
> how many requests are for major compactions and how many are for minor
> ones.  The queue size is the number of store files pending compaction.
>
> As far as I know, there is no workaround for now.
>
> Jimmy
>
>
> On Thu, May 17, 2012 at 11:42 AM, Chen Song 
> wrote:
>
> > Thanks Jimmy. Meanwhile, is there a work around for this?
> >
> > How does compact/major_compact issued from hbase shell handles this under
> > the hood? Is it eventually calling HBaseAdmin API or HRegion synchronous
> > API call?
> >
> > Thanks
> > Chen
> >
> > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang 
> wrote:
> >
> > > I am thinking to add a function to check if a table or region in
> > compaction
> > > (major or minor).
> > >
> > > I filed HBASE-6033. It won't show status of a specific compaction
> > request.
> > > Will this help?
> > >
> > > Thanks,
> > > Jimmy
> > >
> > > On Thu, May 17, 2012 at 11:11 AM, Chen Song 
> > > wrote:
> > >
> > > > I would like to schedule major compaction on a region
> > programmatically. I
> > > > found the API call below which can properly achieve my goal.
> > > >
> > > > HBaseAdmin.majorCompact(String tableOrRegionName)
> > > >
> > > > It turns out to be an asynchronous call and there seems no call back
> > > > parameter that can be specified. How can I validate the compaction
> > result
> > > > (e.g., success or failure) ?
> > > >
> > > > Thanks
> > > > Chen
> > > >
> > >
> >
> >
> >
> > --
> > Chen Song
> > Mobile: 518-445-5096
> >
>



-- 
Chen Song
Mobile: 518-445-5096


Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
It is an async call to the region server to request a compaction.  Once the
request is accepted, the call returns.  There is no sync call here.  The
request is queued and processed by a pool of threads.

Currently, there is a metric to show the queue size, but it doesn't tell how
many requests are for major compactions and how many are for minor ones.  The
queue size is the number of store files pending compaction.

As far as I know, there is no workaround for now.

Jimmy


On Thu, May 17, 2012 at 11:42 AM, Chen Song  wrote:

> Thanks Jimmy. Meanwhile, is there a workaround for this?
>
> How does compact/major_compact issued from the hbase shell handle this under
> the hood? Does it eventually call the HBaseAdmin API or a synchronous
> HRegion API call?
>
> Thanks
> Chen
>
> On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang  wrote:
>
> > I am thinking to add a function to check if a table or region in
> compaction
> > (major or minor).
> >
> > I filed HBASE-6033. It won't show status of a specific compaction
> request.
> > Will this help?
> >
> > Thanks,
> > Jimmy
> >
> > On Thu, May 17, 2012 at 11:11 AM, Chen Song 
> > wrote:
> >
> > > I would like to schedule major compaction on a region
> programmatically. I
> > > found the API call below which can properly achieve my goal.
> > >
> > > HBaseAdmin.majorCompact(String tableOrRegionName)
> > >
> > > It turns out to be an asynchronous call and there seems no call back
> > > parameter that can be specified. How can I validate the compaction
> result
> > > (e.g., success or failure) ?
> > >
> > > Thanks
> > > Chen
> > >
> >
>
>
>
> --
> Chen Song
> Mobile: 518-445-5096
>


Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Thanks Jimmy. Meanwhile, is there a workaround for this?

How does compact/major_compact issued from the hbase shell handle this under
the hood? Does it eventually call the HBaseAdmin API or a synchronous HRegion
API call?

Thanks
Chen

On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang  wrote:

> I am thinking of adding a function to check whether a table or region is in
> compaction (major or minor).
>
> I filed HBASE-6033. It won't show the status of a specific compaction
> request. Will this help?
>
> Thanks,
> Jimmy
>
> On Thu, May 17, 2012 at 11:11 AM, Chen Song 
> wrote:
>
> > I would like to schedule major compaction on a region programmatically. I
> > found the API call below which can properly achieve my goal.
> >
> > HBaseAdmin.majorCompact(String tableOrRegionName)
> >
> > It turns out to be an asynchronous call and there seems no call back
> > parameter that can be specified. How can I validate the compaction result
> > (e.g., success or failure) ?
> >
> > Thanks
> > Chen
> >
>



-- 
Chen Song
Mobile: 518-445-5096


Re: hbase security

2012-05-17 Thread Andrew Purtell
> On 5/15/12 2:24 AM, Harsh J wrote:
> P.s. If you're making it to HBaseCon, you may not wanna miss
> http://www.hbasecon.com/sessions/hbase-security-for-the-enterprise/
> which also includes a tutorial (from Andrew).

Given the time constraints on the material I have to present and Q&A,
what I'm doing is bringing a ~5 minute (accelerated) video instead,
which I may or may not have time to show. I've also posted the scripts and
configuration used to set up the security-enabled demo cluster in EC2
in a public GitHub repo:

https://github.com/apurtell/tm-ec2-demo

It's possible to use those GitHub scripts right away.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)


Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
I am thinking of adding a function to check whether a table or region is in
compaction (major or minor).

I filed HBASE-6033. It won't show the status of a specific compaction request.
Will this help?

Thanks,
Jimmy

On Thu, May 17, 2012 at 11:11 AM, Chen Song  wrote:

> I would like to schedule major compaction on a region programmatically. I
> found the API call below which can properly achieve my goal.
>
> HBaseAdmin.majorCompact(String tableOrRegionName)
>
> It turns out to be an asynchronous call, and there seems to be no callback
> parameter that can be specified. How can I validate the compaction result
> (e.g., success or failure) ?
>
> Thanks
> Chen
>


Re: EndPoint Coprocessor could be deadlocked?

2012-05-17 Thread Michael Segel
> You should not let just any user run coprocessors on the server. That's 
> madness.
> 
> Best regards,
> 
>- Andy

Fei Ding, 

I'm a little confused. 
Are you trying to solve the problem of querying data efficiently from a table,
or are you trying to find an example of where and when to use co-processors?

You actually have an interesting problem that isn't easily solved in relational
databases, but I don't think it's an appropriate problem if you want to stress
the use of coprocessors.

Yes, with indexes you want to use coprocessors as a way to keep the index in
sync with the underlying table.

However, beyond that... the solution is really best run as an M/R job.

Consider that HBase has two different access methods: one is as part of M/R
jobs, the other is a client/server model.  If you wanted to, you could create a
service/engine/app that would allow you to efficiently query and return result 
sets from your database, as well as manage indexes. 
In part, coprocessors make this a lot easier. 
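
For the M/R side, wiring a scan into a job takes only a few lines (a sketch;
the table name is illustrative, and you would swap your own TableMapper
subclass in place of IdentityTableMapper for real work):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

// Sketch of the M/R access method: feed a table scan into map tasks.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "table-query");
Scan scan = new Scan(); // full-table scan; narrow it as needed
TableMapReduceUtil.initTableMapperJob("mytable", scan,
    IdentityTableMapper.class, ImmutableBytesWritable.class,
    Result.class, job);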

If you consider the general flow of my solution earlier in this thread, you now 
have a really great way to implement this.

Note: we're really talking about allowing someone to query data from a table 
using multiple indexes and index types. Think alternate table (key/value pair),
Lucene/SOLR, and GeoSpatial.

You could even benchmark it against an Oracle implementation, and probably
smoke it.
You could also do efficient joins between tables. 

So yeah, I would encourage you to work on your initial problem... ;-)

Just Saying...  ;-)

-Mike

On May 16, 2012, at 8:49 PM, Andrew Purtell wrote:

> On Wed, May 16, 2012 at 6:43 PM, fding hbase  wrote:
>>> Not coprocessors in general. The client side support for Endpoints
>>> (Exec, etc.) gives the developer the fiction of addressing the cluster
>>> as a range of rows, and will parallelize per-region Endpoint
>>> invocations, and collect the responses, and can return them all to the
>>> caller as "a single call".
>> 
>> But on the deadlock problem the Endpoint behaves the same way as Observer.
>> Endpoints are also executed via RPC handlers of RegionServer.
> 
> Reread what I wrote. I'm not talking about the server side above.
> 
> Regarding the RPC issues, yes the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do, you want to do some distributed indexed join if I
> understood it correctly *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
> 
>> the coprocessors are written by users and any kind of
>> code may appear on the server side.
> 
> You should not let just any user run coprocessors on the server. That's 
> madness.
> 
> Best regards,
> 
>- Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
> 



Re: Unique ID per URL

2012-05-17 Thread sagar naik
We use md5(url).
That gives us a good distribution.
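
Something like this (a sketch; the helper name is made up):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: hash the URL into a fixed-length, well-distributed 16-byte
// row key. Keep the original URL in a column if you need it back,
// since the hash is one-way.
public static byte[] urlRowKey(String url) {
    try {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return md5.digest(Bytes.toBytes(url));
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException(e); // MD5 is always available
    }
}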

-Sagar

On Thu, May 17, 2012 at 1:02 AM, Amit Sela  wrote:

> Hi all,
>
> One of our HBase tables uses a URL as the row key.
>
> I read that it is recommended to build the URL key as: reversed domain +
> URL ID (using a unique ID per URL).
>
> I understand the reversed-domain part, but could anyone elaborate on the
> unique ID per URL, and maybe give an example?
>
> Thanks.
>


Consider individual RSs' performance when writing records with random keys?

2012-05-17 Thread Alex Baranau
Hi,

1.
Not sure if you've seen the HBaseWD (https://github.com/sematext/HBaseWD)
project. It implements the "salt keys with a prefix" approach for writing
monotonically increasing row key/timeseries data. Simplified, the idea
is to add a random prefix to the row key so that writes end up on different
region servers (avoiding a single-RS hotspot).
2.
When writing data to HBase with salted or random keys (so that load is well
distributed over the cluster), the write speed per RS is limited by the slowest
RS in the cluster (since one Region is served by exactly one RS).

Given 1 & 2 I got this crazy idea to:
* write in multiple threads
* each prefix (or interval of keys, in the case of completely random keys) is
assigned to a particular thread, so that records with this prefix are always
written by that thread
* measure how well each thread performs (e.g. write speed)
* based on each thread's performance, salt (or randomize) keys in a biased
way, so that threads which perform better get more records to write

Thus we will put less load on the RSs that are "slower", and the overall load
will be more or less balanced, which should give max write performance for the
cluster.
This might work only if each thread writes to a relatively small subset of
all the RSs, though, I think. Otherwise the threads will all perform more or
less the same.
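
To make the biased salting concrete, here is a rough sketch (the class below
is made up for illustration):

import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: pick a salt prefix with probability
// proportional to the measured write throughput of the thread that
// owns it, so faster threads/RSs receive more records.
class WeightedPrefixPicker {
    private final double[] throughput; // rows/sec per prefix, reported by writers
    private final Random random = new Random();

    WeightedPrefixPicker(int numPrefixes) {
        throughput = new double[numPrefixes];
        Arrays.fill(throughput, 1.0); // start unbiased
    }

    synchronized void report(int prefix, double rowsPerSec) {
        throughput[prefix] = rowsPerSec;
    }

    synchronized int pickPrefix() {
        double total = 0;
        for (double t : throughput) total += t;
        double r = random.nextDouble() * total;
        for (int i = 0; i < throughput.length; i++) {
            r -= throughput[i];
            if (r <= 0) return i;
        }
        return throughput.length - 1;
    }
}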

Am I completely crazy to be thinking about this? Does it make sense to you
at all?

Alex Baranau
--
Sematext :: http://blog.sematext.com/


Re: hbase security

2012-05-17 Thread Eugene Koontz
On 5/15/12 2:24 AM, Harsh J wrote:
> HBase 0.92 has table-level security (among other goodies). Check out
> this slide on what all it includes:
> http://www.slideshare.net/ghelmling/new-hbase-features-coprocessors-and-security
> 
> There was also a good blog post earlier on how to set it up, but am
> currently unable to locate it. I'll post back in case I find an
> archive (or someone else may).
> 
> P.s. If you're making it to HBaseCon, you may not wanna miss
> http://www.hbasecon.com/sessions/hbase-security-for-the-enterprise/
> which also includes a tutorial (from Andrew).
> 
Hi Harsh J and Rita,

You might be interested in a couple of blog posts from the old HBase
blog (hbaseblog.com). The site is gone but you can still see them on the
Internet Archive:

http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/

http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/

-Eugene



Timing problems with getScanner()

2012-05-17 Thread Belussi

Hi,

I have a really strange timing problem with the getScanner() method.

I have a cluster with 3 nodes. If I want to read 10 rows (using the Java API,
50 bytes per row), ~50% of the reads take ~10ms and the other ~50% take more
than 1000ms. I didn't find any correlation between those results and the HBase
configuration or hardware. I'm pretty sure that I've done everything necessary
in the HBase configuration to speed it up.
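
The reads look roughly like this (a simplified sketch; the table and keys
are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
Scan scan = new Scan(Bytes.toBytes("row-000"), Bytes.toBytes("row-010"));
// With the default hbase.client.scanner.caching of 1, every row is a
// separate RPC; setCaching() batches rows per round trip.
scan.setCaching(10);
ResultScanner scanner = table.getScanner(scan);
try {
    for (Result r : scanner) {
        // each row is ~50 bytes
    }
} finally {
    scanner.close();
}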

Could anyone tell me what I should do to fix it?

Kamil,

  



Unique ID per URL

2012-05-17 Thread Amit Sela
Hi all,

One of our HBase tables uses a URL as the row key.

I read that it is recommended to build the URL key as: reversed domain +
URL ID (using a unique ID per URL).

I understand the reversed-domain part, but could anyone elaborate on the
unique ID per URL, and maybe give an example?

Thanks.