Re: EndPoint Coprocessor could be deadlocked?
Hi Michael,

On Fri, May 18, 2012 at 1:39 AM, Michael Segel wrote:
> > You should not let just any user run coprocessors on the server. That's
> > madness.
> >
> > Best regards,
> >
> > - Andy
>
> Fei Ding,
>
> I'm a little confused.
> Are you trying to solve the problem of querying data efficiently from a
> table, or are you trying to find an example of where and when to use
> co-processors?

I'm trying to solve the problem of querying data efficiently. A coprocessor is one of the possible solutions that I've tried.

> You actually have an interesting problem that isn't easily solved in
> relational databases, but I don't think it's an appropriate problem if you
> want to stress the use of coprocessors.
>
> Yes, with indexes you want to use coprocessors as a way to keep the index
> in sync with the underlying table.
>
> However, beyond that... the solution is really best run as a M/R job.
>
> Consider that HBase has two different access methods. One is as part of
> M/R jobs; the other is a client/server model. If you wanted to, you could
> create a service/engine/app that would allow you to efficiently query and
> return result sets from your database, as well as manage indexes.
> In part, coprocessors make this a lot easier.

I'm not using coprocessors to maintain the index tables, but an extended client to do this.

> If you consider the general flow of my solution earlier in this thread,
> you now have a really great way to implement this.
>
> Note: we're really talking about allowing someone to query data from a
> table using multiple indexes and index types. Think alternate table
> (key/value pair), Lucene/SOLR, and GeoSpatial.
>
> You could even benchmark it against an Oracle implementation, and
> probably smoke it.
> You could also do efficient joins between tables.
>
> So yeah, I would encourage you to work on your initial problem... ;-)

An alternate table is also one of the possible solutions; however, it's not that easy either. I'm still working on it. ;-)

--
Best Regards!

Fei Ding
fding.chu...@gmail.com
Re: Trailer 'header' is wrong; does the trailer size match content
HBase Version: hbase-0.90.4-cdh3u3
Hadoop Version: hadoop-0.20.2-cdh3u2

12/05/17 16:37:47 ERROR mapreduce.LoadIncrementalHFiles: IOException during splitting
java.util.concurrent.ExecutionException: java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplitPhase(LoadIncrementalHFiles.java:333)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:233)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
        at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)
Exception in thread "main" java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?
        at org.apache.hadoop.hbase.io.hfile.HFile$FixedFileTrailer.deserialize(HFile.java:1527)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readTrailer(HFile.java:885)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:819)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:405)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:321)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:680)

On Thu, May 17, 2012 at 4:55 PM, Ted Yu wrote:
> Can you post the complete message ?
> > What HBase version are you using ? > > On Thu, May 17, 2012 at 4:48 PM, Something Something < > mailinglist...@gmail.com> wrote: > > > Hello, > > > > I keep getting this message while running the 'completebulkload' process. > > I tried the following solutions that I came across while Googling for > this > > error: > > > > 1) setReduceSpeculativeExecution(true) > > > > 2) Made sure that none of the tasks are failing. > > > > 3) The HFileOutput job runs successfully. > > > > 4) The first 2 lines in the output from HFileOutput look like this: > > > > 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 3437 > > 2a 34 39 39 39 row=+9L99/++MWT7f2+2*1*153347*4999, > > families={(family=info, > > > > > keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/922337203685
Re: Trailer 'header' is wrong; does the trailer size match content
Can you post the complete message ? What HBase version are you using ? On Thu, May 17, 2012 at 4:48 PM, Something Something < mailinglist...@gmail.com> wrote: > Hello, > > I keep getting this message while running the 'completebulkload' process. > I tried the following solutions that I came across while Googling for this > error: > > 1) setReduceSpeculativeExecution(true) > > 2) Made sure that none of the tasks are failing. > > 3) The HFileOutput job runs successfully. > > 4) The first 2 lines in the output from HFileOutput look like this: > > 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 37 > 2a 34 39 39 39 row=+9L99/++MWT7f2+2*1*153347*4999, > families={(family=info, > > keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/9223372036854775807/Put/vlen=1)} > 2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 38 > 2a 34 39 39 39 row=+9L99/++MWT7f2+2*1*153348*4999, > families={(family=info, > > keyvalues=(+9L99/++MWT7f2+2*1*153348*4999/info:frequency/9223372036854775807/Put/vlen=1)} > > > 5) My Mapper for HFileOutput looks like this: > >public static class MyMapper extends MapReduceBase implements > Mapper { > >@Override >public void map(LongWritable key, Text value, > OutputCollector output, Reporter reporter) >throws IOException { >String[] values = value.toString().split("\t"); >String key1 = values[0]; >String value1 = values[1]; > >ImmutableBytesWritable ibw = new > ImmutableBytesWritable(key1.getBytes()); >Put put = new Put(Bytes.toBytes(key1)); >put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"), > Bytes.toBytes(value1)); >output.collect(ibw, put); >} > >} > > > Any ideas what could be wrong? Thanks for your help. >
Trailer 'header' is wrong; does the trailer size match content
Hello,

I keep getting this message while running the 'completebulkload' process. I tried the following solutions that I came across while Googling for this error:

1) setReduceSpeculativeExecution(true)

2) Made sure that none of the tasks are failing.

3) The HFileOutput job runs successfully.

4) The first 2 lines in the output from HFileOutput look like this:

2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 37 2a 34 39 39 39 row=+9L99/++MWT7f2+2*1*153347*4999, families={(family=info, keyvalues=(+9L99/++MWT7f2+2*1*153347*4999/info:frequency/9223372036854775807/Put/vlen=1)}
2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 2a 31 2a 31 35 33 33 34 38 2a 34 39 39 39 row=+9L99/++MWT7f2+2*1*153348*4999, families={(family=info, keyvalues=(+9L99/++MWT7f2+2*1*153348*4999/info:frequency/9223372036854775807/Put/vlen=1)}

5) My Mapper for HFileOutput looks like this:

    public static class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
                throws IOException {
            String[] values = value.toString().split("\t");
            String key1 = values[0];
            String value1 = values[1];

            ImmutableBytesWritable ibw = new ImmutableBytesWritable(key1.getBytes());
            Put put = new Put(Bytes.toBytes(key1));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("frequency"),
                    Bytes.toBytes(value1));
            output.collect(ibw, put);
        }
    }

Any ideas what could be wrong? Thanks for your help.
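As a side note, the hex dump in (4) is just the raw bytes of the row key printed alongside it; decoding it confirms the two match. A small illustrative sketch (not part of the original thread):

```java
public class HexRowKey {
    // Decode a space-separated hex dump (as printed in the HFileOutput log
    // above) back into the row-key string it represents.
    static String decodeHex(String hexDump) {
        StringBuilder sb = new StringBuilder();
        for (String tok : hexDump.trim().split("\\s+")) {
            sb.append((char) Integer.parseInt(tok, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "2b 39 4c 39 39 2f 2b 2b 4d 57 54 37 66 32 2b 32 "
                   + "2a 31 2a 31 35 33 33 34 37 2a 34 39 39 39";
        // Prints +9L99/++MWT7f2+2*1*153347*4999, matching the row= field.
        System.out.println(decodeHex(hex));
    }
}
```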
hbase data
Hello,

Currently, we are using hbase to store sensor data -- basically large time-series data hitting close to 2 billion rows for one type of sensor. I was wondering how hbase differs from the HDF (http://www.hdfgroup.org/HDF5/) file format. Most of my operations are scanning a range and getting its values, but it seems I can achieve this using HDF. Does anyone have experience with this file container format and can shed some light?

--
--- Get your facts first, then you can distort them as you please.--
Re: client timeouts after upgrading to 0.92
This means that the servers aren't responding within 60 seconds to the clients. I believe this is new since 0.90, so it could be that you were used to having long-running requests. If not, check what's going on with the servers at the address given in the exception message.

J-D

On Thu, May 17, 2012 at 2:35 PM, Viral Bajaria wrote:
> Hello,
>
> I just upgraded our production cluster from hbase 0.89 (cdh3b2, yeah!!) to
> 0.92.1, and ever since the upgrade I see a lot of issues with timeouts on my
> clients.
>
> Below are the log dumps from the client and the regionserver that it was
> requesting the data from. I can overcome this exception by increasing
> hbase.rpc.timeout but I doubt that's the right way of solving this issue.
>
> Has anyone faced this issue in hbase 0.92? If yes, how did you go about
> solving it? If not, any pointers on how to start debugging this?
>
> Thanks,
> Viral
>
> *Client Logs*
> 2012-05-17T19:44:19.191Z Processed 277000, written 277000, writing row for
> 40036669 (this log line is output for every 1000 rows)
> 12/05/17 19:45:19 WARN client.HConnectionManager$HConnectionImplementation:
> Failed all from
> region=platform,40032999,1323868834966.e9f3d644fa843340129355bd9e005903.,
> hostname=elshadoop-c01, port=60020
> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
> Call to elshadoop-c01/10.16.80.69:60020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 6 millis timeout while waiting for
> channel to be ready for read.
ch : > java.nio.channels.SocketChannel[connected > local=/10.16.80.30:57245remote=elshadoop-c01/ > 10.16.80.69:60020] > at > java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409) > at > org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943) > at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:816) > at org.apache.hadoop.hbase.client.HTable.put(HTable.java:803) > at > com.x.aggregation.hbase.util.BulkTableWriter.commit(BulkTableWriter.java:191) > at > com.x.metrics.ingestmetadata.DataProvider.writePlatformToShowId(DataProvider.java:1799) > at com.x.metrics.ingestmetadata.Program$16.run(Program.java:163) > at com.x.metrics.ingestmetadata.Program.runAction(Program.java:24) > at com.x.metrics.ingestmetadata.Program.main(Program.java:218) > Caused by: java.net.SocketTimeoutException: Call to elshadoop-c01/ > 10.16.80.69:60020 failed on socket timeout exception: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : > java.nio.channels.SocketChannel[connected > local=/10.16.80.30:57245remote=elshadoop-c01/ > 10.16.80.69:60020] > at > org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:949) > at > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:922) > at > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) > at $Proxy4.multi(Unknown Source) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > Caused by: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected > local=/10.16.80.30:57245remote=elshadoop-c01/ > 10.16.80.69:60020] > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) > at java.io.FilterInputStream.read(FilterInputStream.java:116) > at
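For reference, the 60-second timeout the client is hitting is the default value of hbase.rpc.timeout, set in hbase-site.xml on the client side. Raising it (as mentioned above) masks slow servers rather than fixing them, but this is the knob involved:

```xml
<property>
  <name>hbase.rpc.timeout</name>
  <!-- default 60000 ms; the client gives up on an RPC after this long -->
  <value>60000</value>
</property>
```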
Re: hbase security
On 5/17/12 1:22 PM, Stack wrote: > On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz wrote: >> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/ >> >> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/ >> > > Anyone interested in porting these over to > http://blogs.apache.org/hbase/? They have great stuff in them. > St.Ack Hi St. Ack, Thanks for saying so! I'm planning to port mine (the access controls post) as soon as my Apache Roller account is granted by the Infra folks. -Eugene
Re: hbase security
I could repost the "up and running with secure hadoop" one. But it's kind of out of date at this point. I remember, back when the site was still up, getting some comments on it about things that had already changed in the 0.20.20X releases. I can take a look and see how bad it is. On Thu, May 17, 2012 at 1:22 PM, Stack wrote: > On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz wrote: >> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/ >> >> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/ >> > > Anyone interested in porting these over to > http://blogs.apache.org/hbase/? They have great stuff in them. > St.Ack
Re: Schedule major compaction programmatically
That's why you can't use it. :) You can hack it around if you want. Even better, you can start to contribute and work on HBASE-6033. Cheers, Jimmy On Thu, May 17, 2012 at 1:23 PM, Chen Song wrote: > Sorry for another dump question. As I am querying such information in > client code, how to get a HRegionServer from a HRegionInfo, > or HServerAddress? > > I found a way to get HRegionInterface shown below. > > HConnection.getHRegionConnection(HServerAddress) > > But getMetrics method is not exposed on HRegionInterface and only on > HRegionServer. > Thanks > Chen > > On Thu, May 17, 2012 at 3:03 PM, Jimmy Xiang wrote: > > > HRegionServer.java: > > > >this.metrics.compactionQueueSize.set(compactSplitThread > >.getCompactionQueueSize()); > > > > On Thu, May 17, 2012 at 12:00 PM, Chen Song > > wrote: > > > > > Can you direct me to the API call to get the queue size metrics? > > > > > > On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang > > wrote: > > > > > > > It is an async call to the region server to request a compaction. > Once > > > the > > > > request is accepted, > > > > the call returned. There is no sync call here. The request is > queued > > > and > > > > processed by a pool > > > > of threads. > > > > > > > > Currently, there is a metric to show the queue size. But it doesn't > > tell > > > > how many are for major, > > > > and how many are for minor. The queue size is the number of store > > files > > > > pending compact. > > > > > > > > As I know, there is no work around for now. > > > > > > > > Jimmy > > > > > > > > > > > > On Thu, May 17, 2012 at 11:42 AM, Chen Song > > > > wrote: > > > > > > > > > Thanks Jimmy. Meanwhile, is there a work around for this? > > > > > > > > > > How does compact/major_compact issued from hbase shell handles this > > > under > > > > > the hood? Is it eventually calling HBaseAdmin API or HRegion > > > synchronous > > > > > API call? 
> > > > > > > > > > Thanks > > > > > Chen > > > > > > > > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang > > > > wrote: > > > > > > > > > > > I am thinking to add a function to check if a table or region in > > > > > compaction > > > > > > (major or minor). > > > > > > > > > > > > I filed HBASE-6033. It won't show status of a specific compaction > > > > > request. > > > > > > Will this help? > > > > > > > > > > > > Thanks, > > > > > > Jimmy > > > > > > > > > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song < > > chen.song...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > I would like to schedule major compaction on a region > > > > > programmatically. I > > > > > > > found the API call below which can properly achieve my goal. > > > > > > > > > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > > > > > > > > > > > It turns out to be an asynchronous call and there seems no call > > > back > > > > > > > parameter that can be specified. How can I validate the > > compaction > > > > > result > > > > > > > (e.g., success or failure) ? > > > > > > > > > > > > > > Thanks > > > > > > > Chen > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Chen Song > > > > > Mobile: 518-445-5096 > > > > > > > > > > > > > > > > > > > > > -- > > > Chen Song > > > Mobile: 518-445-5096 > > > > > > > > > -- > Chen Song > Mobile: 518-445-5096 >
Re: Schedule major compaction programmatically
Sorry for another dump question. As I am querying such information in client code, how to get a HRegionServer from a HRegionInfo, or HServerAddress? I found a way to get HRegionInterface shown below. HConnection.getHRegionConnection(HServerAddress) But getMetrics method is not exposed on HRegionInterface and only on HRegionServer. Thanks Chen On Thu, May 17, 2012 at 3:03 PM, Jimmy Xiang wrote: > HRegionServer.java: > >this.metrics.compactionQueueSize.set(compactSplitThread >.getCompactionQueueSize()); > > On Thu, May 17, 2012 at 12:00 PM, Chen Song > wrote: > > > Can you direct me to the API call to get the queue size metrics? > > > > On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang > wrote: > > > > > It is an async call to the region server to request a compaction. Once > > the > > > request is accepted, > > > the call returned. There is no sync call here. The request is queued > > and > > > processed by a pool > > > of threads. > > > > > > Currently, there is a metric to show the queue size. But it doesn't > tell > > > how many are for major, > > > and how many are for minor. The queue size is the number of store > files > > > pending compact. > > > > > > As I know, there is no work around for now. > > > > > > Jimmy > > > > > > > > > On Thu, May 17, 2012 at 11:42 AM, Chen Song > > > wrote: > > > > > > > Thanks Jimmy. Meanwhile, is there a work around for this? > > > > > > > > How does compact/major_compact issued from hbase shell handles this > > under > > > > the hood? Is it eventually calling HBaseAdmin API or HRegion > > synchronous > > > > API call? > > > > > > > > Thanks > > > > Chen > > > > > > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang > > > wrote: > > > > > > > > > I am thinking to add a function to check if a table or region in > > > > compaction > > > > > (major or minor). > > > > > > > > > > I filed HBASE-6033. It won't show status of a specific compaction > > > > request. > > > > > Will this help? 
> > > > > > > > > > Thanks, > > > > > Jimmy > > > > > > > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song < > chen.song...@gmail.com> > > > > > wrote: > > > > > > > > > > > I would like to schedule major compaction on a region > > > > programmatically. I > > > > > > found the API call below which can properly achieve my goal. > > > > > > > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > > > > > > > > > It turns out to be an asynchronous call and there seems no call > > back > > > > > > parameter that can be specified. How can I validate the > compaction > > > > result > > > > > > (e.g., success or failure) ? > > > > > > > > > > > > Thanks > > > > > > Chen > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Chen Song > > > > Mobile: 518-445-5096 > > > > > > > > > > > > > > > -- > > Chen Song > > Mobile: 518-445-5096 > > > -- Chen Song Mobile: 518-445-5096
Re: hbase security
On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz wrote: > http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/ > > http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/ > Anyone interested in porting these over to http://blogs.apache.org/hbase/? They have great stuff in them. St.Ack
Re: Schedule major compaction programmatically
HRegionServer.java: this.metrics.compactionQueueSize.set(compactSplitThread .getCompactionQueueSize()); On Thu, May 17, 2012 at 12:00 PM, Chen Song wrote: > Can you direct me to the API call to get the queue size metrics? > > On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang wrote: > > > It is an async call to the region server to request a compaction. Once > the > > request is accepted, > > the call returned. There is no sync call here. The request is queued > and > > processed by a pool > > of threads. > > > > Currently, there is a metric to show the queue size. But it doesn't tell > > how many are for major, > > and how many are for minor. The queue size is the number of store files > > pending compact. > > > > As I know, there is no work around for now. > > > > Jimmy > > > > > > On Thu, May 17, 2012 at 11:42 AM, Chen Song > > wrote: > > > > > Thanks Jimmy. Meanwhile, is there a work around for this? > > > > > > How does compact/major_compact issued from hbase shell handles this > under > > > the hood? Is it eventually calling HBaseAdmin API or HRegion > synchronous > > > API call? > > > > > > Thanks > > > Chen > > > > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang > > wrote: > > > > > > > I am thinking to add a function to check if a table or region in > > > compaction > > > > (major or minor). > > > > > > > > I filed HBASE-6033. It won't show status of a specific compaction > > > request. > > > > Will this help? > > > > > > > > Thanks, > > > > Jimmy > > > > > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song > > > > wrote: > > > > > > > > > I would like to schedule major compaction on a region > > > programmatically. I > > > > > found the API call below which can properly achieve my goal. > > > > > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > > > > > > > It turns out to be an asynchronous call and there seems no call > back > > > > > parameter that can be specified. 
How can I validate the compaction > > > result > > > > > (e.g., success or failure) ? > > > > > > > > > > Thanks > > > > > Chen > > > > > > > > > > > > > > > > > > > > > -- > > > Chen Song > > > Mobile: 518-445-5096 > > > > > > > > > -- > Chen Song > Mobile: 518-445-5096 >
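Since the compaction request is queued and processed asynchronously, one pragmatic client-side workaround is to poll the compaction queue size until it drains. This is only a sketch: the IntSupplier stands in for however the metric is actually obtained (e.g. via JMX), since as noted in this thread there is no public client API for it in these versions.

```java
import java.util.function.IntSupplier;

public class CompactionWait {
    // Poll a compaction-queue-size metric until it reaches zero or we time
    // out. 'queueSize' is a stand-in for the real metrics source (hypothetical
    // here); the queue size is the number of store files pending compaction.
    static boolean waitForCompaction(IntSupplier queueSize,
                                     long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (queueSize.getAsInt() == 0) {
                return true;   // queue drained; pending compactions finished
            }
            Thread.sleep(pollMs);
        }
        return false;          // timed out with compactions still queued
    }
}
```

Note this still cannot distinguish major from minor compactions, nor report success/failure of a specific request, which is what HBASE-6033 is about.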
Re: Schedule major compaction programmatically
Can you direct me to the API call to get the queue size metrics? On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang wrote: > It is an async call to the region server to request a compaction. Once the > request is accepted, > the call returned. There is no sync call here. The request is queued and > processed by a pool > of threads. > > Currently, there is a metric to show the queue size. But it doesn't tell > how many are for major, > and how many are for minor. The queue size is the number of store files > pending compact. > > As I know, there is no work around for now. > > Jimmy > > > On Thu, May 17, 2012 at 11:42 AM, Chen Song > wrote: > > > Thanks Jimmy. Meanwhile, is there a work around for this? > > > > How does compact/major_compact issued from hbase shell handles this under > > the hood? Is it eventually calling HBaseAdmin API or HRegion synchronous > > API call? > > > > Thanks > > Chen > > > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang > wrote: > > > > > I am thinking to add a function to check if a table or region in > > compaction > > > (major or minor). > > > > > > I filed HBASE-6033. It won't show status of a specific compaction > > request. > > > Will this help? > > > > > > Thanks, > > > Jimmy > > > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song > > > wrote: > > > > > > > I would like to schedule major compaction on a region > > programmatically. I > > > > found the API call below which can properly achieve my goal. > > > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > > > > > It turns out to be an asynchronous call and there seems no call back > > > > parameter that can be specified. How can I validate the compaction > > result > > > > (e.g., success or failure) ? > > > > > > > > Thanks > > > > Chen > > > > > > > > > > > > > > > -- > > Chen Song > > Mobile: 518-445-5096 > > > -- Chen Song Mobile: 518-445-5096
Re: Schedule major compaction programmatically
It is an async call to the region server to request a compaction. Once the request is accepted, the call returned. There is no sync call here. The request is queued and processed by a pool of threads. Currently, there is a metric to show the queue size. But it doesn't tell how many are for major, and how many are for minor. The queue size is the number of store files pending compact. As I know, there is no work around for now. Jimmy On Thu, May 17, 2012 at 11:42 AM, Chen Song wrote: > Thanks Jimmy. Meanwhile, is there a work around for this? > > How does compact/major_compact issued from hbase shell handles this under > the hood? Is it eventually calling HBaseAdmin API or HRegion synchronous > API call? > > Thanks > Chen > > On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang wrote: > > > I am thinking to add a function to check if a table or region in > compaction > > (major or minor). > > > > I filed HBASE-6033. It won't show status of a specific compaction > request. > > Will this help? > > > > Thanks, > > Jimmy > > > > On Thu, May 17, 2012 at 11:11 AM, Chen Song > > wrote: > > > > > I would like to schedule major compaction on a region > programmatically. I > > > found the API call below which can properly achieve my goal. > > > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > > > It turns out to be an asynchronous call and there seems no call back > > > parameter that can be specified. How can I validate the compaction > result > > > (e.g., success or failure) ? > > > > > > Thanks > > > Chen > > > > > > > > > -- > Chen Song > Mobile: 518-445-5096 >
Re: Schedule major compaction programmatically
Thanks Jimmy. Meanwhile, is there a work around for this? How does compact/major_compact issued from hbase shell handles this under the hood? Is it eventually calling HBaseAdmin API or HRegion synchronous API call? Thanks Chen On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang wrote: > I am thinking to add a function to check if a table or region in compaction > (major or minor). > > I filed HBASE-6033. It won't show status of a specific compaction request. > Will this help? > > Thanks, > Jimmy > > On Thu, May 17, 2012 at 11:11 AM, Chen Song > wrote: > > > I would like to schedule major compaction on a region programmatically. I > > found the API call below which can properly achieve my goal. > > > > HBaseAdmin.majorCompact(String tableOrRegionName) > > > > It turns out to be an asynchronous call and there seems no call back > > parameter that can be specified. How can I validate the compaction result > > (e.g., success or failure) ? > > > > Thanks > > Chen > > > -- Chen Song Mobile: 518-445-5096
Re: hbase security
> On 5/15/12 2:24 AM, Harsh J wrote:
> P.s. If you're making it to HBaseCon, you may not wanna miss
> http://www.hbasecon.com/sessions/hbase-security-for-the-enterprise/
> which also includes a tutorial (from Andrew).

Given the time constraints on the material I have to present and Q&A, what I'm doing instead is bringing a ~5 minute (accelerated) video, which I may or may not have time to show, and I've posted the scripts and configuration used to set up the security-enabled demo cluster in EC2 in a public GitHub repo: https://github.com/apurtell/tm-ec2-demo

It's possible to use those GitHub scripts right away.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Schedule major compaction programmatically
I am thinking to add a function to check if a table or region in compaction (major or minor). I filed HBASE-6033. It won't show status of a specific compaction request. Will this help? Thanks, Jimmy On Thu, May 17, 2012 at 11:11 AM, Chen Song wrote: > I would like to schedule major compaction on a region programmatically. I > found the API call below which can properly achieve my goal. > > HBaseAdmin.majorCompact(String tableOrRegionName) > > It turns out to be an asynchronous call and there seems no call back > parameter that can be specified. How can I validate the compaction result > (e.g., success or failure) ? > > Thanks > Chen >
Re: EndPoint Coprocessor could be deadlocked?
> You should not let just any user run coprocessors on the server. That's > madness. > > Best regards, > >- Andy Fei Ding, I'm a little confused. Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use co-processors? You actually have an interesting problem that isn't easily solved in relational databases, but I don't think its an appropriate problem if you want to stress the use of coprocessors. Yes with Indexes you want to use coprocessors as a way to keep the index in synch with the underlying table. However beyond that... the solution is really best run as a M/R job. Considering that HBase has two different access methods. One is as part of M/R jobs, the other is a client/server model. If you wanted to, you could create a service/engine/app that would allow you to efficiently query and return result sets from your database, as well as manage indexes. In part, coprocessors make this a lot easier. If you consider the general flow of my solution earlier in this thread, you now have a really great way to implement this. Note: we're really talking about allowing someone to query data from a table using multiple indexes and index types. Think alternate table (key/value pair) , Lucene/SOLR, and GeoSpatial. You could even bench mark it against an Oracle implementation, and probably smoke it. You could also do efficient joins between tables. So yeah, I would encourage you to work on your initial problem... ;-) Just Saying... ;-) -Mike On May 16, 2012, at 8:49 PM, Andrew Purtell wrote: > On Wed, May 16, 2012 at 6:43 PM, fding hbase wrote: >>> Not coprocessors in general. The client side support for Endpoints >>> (Exec, etc.) gives the developer the fiction of addressing the cluster >>> as a range of rows, and will parallelize per-region Endpoint >>> invocations, and collect the responses, and can return them all to the >>> caller as "a single call". 
>>
>> But on the deadlock problem the Endpoint behaves the same way as an Observer.
>> Endpoints are also executed via RPC handlers of the RegionServer.
>
> Reread what I wrote. I'm not talking about the server side above.
>
> Regarding the RPC issues, yes, the behavior is the same. My other point
> was there is no RPC deadlock if you schedule your additional work
> (which issues RPCs) in some background thread or Executor and return
> to the client immediately. But that is not what you have claimed you
> want to do: you want to do some distributed indexed join, if I
> understood it correctly, *first* (via RPC) and *then* return to the
> client. That is how you would get deadlocks.
>
>> The coprocessors are written by users and any kind of
>> code may appear on the server side.
>
> You should not let just any user run coprocessors on the server. That's
> madness.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
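The "schedule the follow-up work and return immediately" pattern Andrew describes can be shown in plain Java, without any HBase APIs (names here are illustrative, not from the thread): the handler thread hands the follow-up off to a background executor instead of blocking on it, so the RPC handler pool is never exhausted waiting on other RPCs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the non-blocking handler pattern from Andrew's reply.
// The handler does its primary work, submits the follow-up (which in a
// coprocessor would be the part issuing further RPCs) to a background
// pool, and returns to the caller before the follow-up completes.
public class AsyncFollowUp {
    // Daemon threads so the JVM can exit without an explicit shutdown.
    static final ExecutorService background =
            Executors.newFixedThreadPool(2, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // Simulates an RPC handler: primary write, then async follow-up.
    static String handlePut(String row, CountDownLatch done) {
        // ... primary write for `row` would happen here ...
        background.submit(() -> {
            // follow-up RPCs (e.g. index updates) would go here;
            // they run on a different thread, so no handler is blocked
            done.countDown();
        });
        return "ok"; // returned before the follow-up finishes
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        System.out.println(handlePut("row1", done)); // ok
        done.await(); // only the demo waits; a real handler would not
        System.out.println("follow-up finished");
    }
}
```

Doing the distributed join *before* returning, as discussed above, inverts this: the handler holds its thread while waiting on RPCs served by other equally-exhausted handler pools, which is where the deadlock comes from.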
Re: Unique ID per URL
We use md5(url). That gives us a good distribution.

-Sagar

On Thu, May 17, 2012 at 1:02 AM, Amit Sela wrote:

> Hi all,
>
> One of our HBase tables holds a URL as a row key.
>
> I read that it is recommended to store the URL key as: reversed domain +
> URL ID (using a unique id per URL).
>
> I understand the part about the reversed domain, but could anyone elaborate on
> the unique id per URL, maybe give an example?
>
> Thanks.
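A minimal sketch of Sagar's suggestion in Java (class and method names are illustrative): hash the URL with MD5 and use the hex digest as the row key. Every URL maps to a fixed-length 32-character key, and the digest spreads keys uniformly across the key space — good write distribution, at the cost of losing the ability to range-scan related URLs.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Derive a fixed-length, uniformly distributed row key from a URL.
public class UrlKey {
    static String md5Key(String url) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
        // left-pad to 32 hex chars so all keys have equal length
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Key("http://example.com/page?id=1").length()); // 32
        // deterministic: the same URL always yields the same key
        System.out.println(md5Key("a").equals(md5Key("a"))); // true
    }
}
```

If you still need per-domain locality (the "reversed domain" part of the question), a common variant is to combine the two: reversed domain as a prefix plus a hash of the full URL as a fixed-length suffix.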
Consider individual RSs performance when writing records with random keys?
Hi,

1. Not sure if you've seen the HBaseWD (https://github.com/sematext/HBaseWD) project. It implements the "salt keys with a prefix" approach when writing monotonically increasing row key/timeseries data. Simplified, the idea is to add a random prefix to the row key so that writes end up on different region servers (avoiding a single-RS hotspot).

2. When writing data to HBase with salted or random keys (so that load is well distributed over the cluster), the write speed per RS is limited by the slowest RS in the cluster (since one Region is served by one RS).

Given 1 & 2, I got this crazy idea:

* write in multiple threads
* each prefix (or interval of keys, in case of completely random keys) is assigned to a particular thread, so that records with that prefix are always written by that thread
* measure how well each thread performs (e.g. write speed)
* based on each thread's performance, salt (or randomize) keys in a biased way, so that threads which perform better get more records to write

Thus we would put less load on the RSs that are "slower", and the overall load would be more or less balanced, which should give max write performance for the cluster.

I think this might only work if each thread is writing into a relatively small number of all RSs, though. Otherwise the threads will all perform more or less the same.

Am I completely crazy to be thinking about this? Does it make sense to you at all?

Alex Baranau
--
Sematext :: http://blog.sematext.com/
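The baseline salting idea from point 1 can be sketched in a few lines of Java (bucket count and naming are illustrative, not HBaseWD's actual API): a prefix derived deterministically from the key spreads sequential keys across N buckets, and therefore across region servers, while a reader can still recompute the prefix (or scan all N buckets) to get the data back.

```java
// Sketch of prefix-salting for monotonically increasing row keys.
// The salt byte is derived from the key itself, so it is deterministic:
// a reader can recompute it, and a scanner can fan out over all buckets.
public class SaltedKey {
    static final int BUCKETS = 8; // illustrative; tune to cluster size

    static byte salt(byte[] key) {
        int h = 0;
        for (byte b : key) h = 31 * h + (b & 0xff);
        return (byte) ((h & 0x7fffffff) % BUCKETS);
    }

    // Prepend the salt byte to the original key.
    static byte[] saltedKey(byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = salt(key);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] k = "20120518-event-000001".getBytes();
        byte[] s = saltedKey(k);
        System.out.println(s[0] >= 0 && s[0] < BUCKETS); // true
        System.out.println(s.length == k.length + 1);    // true
    }
}
```

The biased scheme proposed above would replace the fixed hash with a weighted choice of bucket per writer thread, at the cost of losing this read-side determinism — a reader would then have to scan every bucket to find a key.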
Re: hbase security
On 5/15/12 2:24 AM, Harsh J wrote:

> HBase 0.92 has table-level security (among other goodies). Check out
> this slide on what all it includes:
> http://www.slideshare.net/ghelmling/new-hbase-features-coprocessors-and-security
>
> There was also a good blog post earlier on how to set it up, but am
> currently unable to locate it. I'll post back in case I find an
> archive (or someone else may).
>
> P.S. If you're making it to HBaseCon, you may not want to miss
> http://www.hbasecon.com/sessions/hbase-security-for-the-enterprise/
> which also includes a tutorial (from Andrew).

Hi Harsh J and Rita,

You might be interested in a couple of blog posts from the old HBase blog (hbaseblog.com). The site is gone but you can still see them on the Internet Archive:

http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/
http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/

-Eugene
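For reference, the secure setup those posts walk through boils down to a handful of hbase-site.xml properties. This is a sketch from memory of the 0.92-era settings — verify the exact property names and classes against the documentation for your version:

```xml
<!-- Sketch of 0.92-era HBase security settings (check against the docs
     for your exact release before using). -->
<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
```

Note that the table-level access control itself is implemented as the AccessController coprocessor, which ties this thread back to the coprocessor discussion above.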
Timing problems with getScanner()
Hi,

I have a really strange timing problem with the getScanner() method. I have a cluster with 3 nodes. If I read 10 rows (using the Java API, 50 bytes per row), ~50% of the reads take 10ms and the other ~50% take more than 1000ms. I didn't find any dependency between these results and the HBase configuration or hardware. I'm pretty sure that I did all the necessary things with the HBase configuration to speed it up. Could anyone tell me what I should do to fix it?

Kamil,
--
View this message in context: http://old.nabble.com/Timming-problems-with-getScanner%28%29-tp33864310p33864310.html
Sent from the HBase User mailing list archive at Nabble.com.
Unique ID per URL
Hi all,

One of our HBase tables holds a URL as a row key.

I read that it is recommended to store the URL key as: reversed domain + URL ID (using a unique id per URL).

I understand the part about the reversed domain, but could anyone elaborate on the unique id per URL, maybe give an example?

Thanks.