Re: IndexOutOfBoundsException during retrieving region split point

2016-06-03 Thread Ted Yu
1.0.0 is quite old.

Is it possible to upgrade to 1.1 or 1.2 release ?

Thanks

On Fri, Jun 3, 2016 at 8:12 AM, Pankaj kr  wrote:

> Hi,
>
> We hit a weird scenario in our production environment: an
> IndexOutOfBoundsException is thrown while retrieving the mid key of a
> storefile after region compaction.
>
> Log Snippet :
> -
> 2016-05-30 01:41:58,484 | INFO  |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
> Completed compaction of 1 (all) file(s) in CF of
> User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.
> into eee1f433635d478197b212e2e378fce8(size=22.0 G), total size for store is
> 22.0 G. This selection was in queue for 0sec, and took 6mins, 25sec to
> execute. |
> org.apache.hadoop.hbase.regionserver.HStore.logCompactionEndMessage(HStore.java:1356)
> 2016-05-30 01:41:58,485 | INFO  |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
> Completed compaction: Request =
> regionName=User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.,
> storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
> time=295643974900644; duration=6mins, 25sec |
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:544)
> 2016-05-30 01:41:58,529 | ERROR |
> regionserver/RS-HOSTNAME/RS-IP:21302-longCompactions-1464247799749 |
> Compaction failed Request =
> regionName=User_Namespace:User_Table,100050007010803_20140126_308010717550001_756781_99_36_0_01,1464543296529.676ee7e9902c066b0e8c15745463d3c5.,
> storeName=CF, fileCount=1, fileSize=44.0 G, priority=6,
> time=295643974900644 |
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:563)
> java.lang.IndexOutOfBoundsException
> at java.nio.Buffer.checkIndex(Buffer.java:540)
> at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
> at
> org.apache.hadoop.hbase.util.ByteBufferUtils.toBytes(ByteBufferUtils.java:490)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.midkey(HFileBlockIndex.java:349)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderV2.midkey(HFileReaderV2.java:512)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.midkey(StoreFile.java:1480)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile.getFileSplitPoint(StoreFile.java:685)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.getSplitPoint(DefaultStoreFileManager.java:126)
> at
> org.apache.hadoop.hbase.regionserver.HStore.getSplitPoint(HStore.java:1986)
> at
> org.apache.hadoop.hbase.regionserver.RegionSplitPolicy.getSplitPoint(RegionSplitPolicy.java:82)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.checkSplit(HRegion.java:7914)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestSplit(CompactSplitThread.java:240)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:552)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
> -
> Observation:
> >> HFilePrettyPrinter also prints the message "Unable to retrieve the
> midkey" for the mid key.
> >> HDFS fsck reports the hfile as healthy.
>
> Though other region compactions completed successfully, a few region
> compactions failed with the same error.
>
> Has anyone faced this issue? Any help would be appreciated.
> HBase version is 1.0.0.
>
> Regards,
> Pankaj
>
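
For reference, a minimal sketch of reading the mid key of a single HFile
directly (similar to what HFilePrettyPrinter reports), assuming HBase 1.x
HFile APIs; the file path is a placeholder argument:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.io.hfile.CacheConfig;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.util.Bytes;

  public class MidkeyCheck {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      FileSystem fs = FileSystem.get(conf);
      // Path to one hfile under the region's column family directory.
      Path hfilePath = new Path(args[0]);
      HFile.Reader reader =
          HFile.createReader(fs, hfilePath, new CacheConfig(conf), conf);
      try {
        // midkey() reads the root block index; this is the call that failed
        // with IndexOutOfBoundsException in the stack trace above.
        System.out.println("midkey: " + Bytes.toStringBinary(reader.midkey()));
      } finally {
        reader.close();
      }
    }
  }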


Re: RS open hbase:meta may throw NullPointerException

2016-06-03 Thread Ted Yu
Were you referring to the following lines ?

  // See HBASE-5094. Cross check with hbase:meta if still this RS
is owning
  // the region.
  Pair<HRegionInfo, ServerName> p = MetaReader.getRegion(
  this.catalogTracker, region.getRegionName());

The above is at line 3967 at head of 0.98 branch.

If you can come up with a test which shows the problem for the head of the
0.98 branch, I suggest opening a JIRA.

Cheers

On Fri, Jun 3, 2016 at 2:16 AM, WangYQ  wrote:

> in hbase 0.98.10,
> class HRegionServer
> method openRegion
> line 3827
>
>
> when the RS opens a region, if this region is already opened by the RS, we
> check hbase:meta to see if hbase:meta is updated.
> But if the RS is opening hbase:meta itself, then MetaReader.getRegion will
> return null (we do not store hbase:meta's own data in the hbase:meta table),
> which leads to a NullPointerException.
>
>
> We can reproduce this problem as follows:
> 1. master assigns hbase:meta
> 2. RS opens hbase:meta slowly and the master times out (maybe because of
> high load or a network problem)
> 3. master sends the open region request again (the RS is still opening
> hbase:meta, or has just opened it)
> 4. RS throws a NullPointerException and the hmaster retries forever
>
>


Re: Re: pool MASTER_OPEN_REGION in hmaster is not used

2016-06-03 Thread Ted Yu
I checked head of 0.98 branch.

For the two 'new OpenedRegionHandler()' calls
in AssignmentManager.java, process() is invoked directly.

Looks like you're right.

On Thu, Jun 2, 2016 at 10:58 PM, WangYQ  wrote:

> RS_ZK_REGION_OPENED   (4, ExecutorType.MASTER_OPEN_REGION)
> There is a relation between the event type "RS_ZK_REGION_OPENED" and the
> pool "MASTER_OPEN_REGION", but in hbase 0.98.10 we do not use these pools
> anymore. For example, in class AssignmentManager, method handleRegion, in
> the RS_ZK_REGION_OPENED case we construct an OpenedRegionHandler and call
> its process method directly, not submitting this handler to the pool (in
> class AssignmentManager, line 1078).
>
>
>
>
>
>
>
>
> At 2016-06-02 21:29:09, "Ted Yu"  wrote:
> >Have you seen this line in EventType.java ?
> >
> >  RS_ZK_REGION_OPENED   (4, ExecutorType.MASTER_OPEN_REGION),
> >
> >If you follow RS_ZK_REGION_OPENED, you would see how the executor is used.
> >
> >On Thu, Jun 2, 2016 at 4:56 AM, WangYQ  wrote:
> >
> >> in hbase 0.98.10, class HMaster, method startServiceThread, the hmaster
> >> opens a thread pool with type MASTER_OPEN_REGION, but this pool is not
> >> used in any place, so it can be removed.
> >>
> >>
> >> thanks
>


Re: pool MASTER_OPEN_REGION in hmaster is not used

2016-06-02 Thread Ted Yu
Have you seen this line in EventType.java ?

  RS_ZK_REGION_OPENED   (4, ExecutorType.MASTER_OPEN_REGION),

If you follow RS_ZK_REGION_OPENED, you would see how the executor is used.

On Thu, Jun 2, 2016 at 4:56 AM, WangYQ  wrote:

> in hbase 0.98.10, class HMaster, method startServiceThread, the hmaster
> opens a thread pool with type MASTER_OPEN_REGION, but this pool is not used
> in any place, so it can be removed.
>
>
> thanks


Re: region stuck in failed close state

2016-05-30 Thread Ted Yu
There is debug log in HRegion#replayWALFlushStartMarker :

  LOG.debug(getRegionInfo().getEncodedName() + " : "
      + " Prepared flush with seqId:" + flush.getFlushSequenceNumber());

...

  LOG.debug(getRegionInfo().getEncodedName() + " : "
      + " Prepared empty flush with seqId:" + flush.getFlushSequenceNumber());

I searched for them in the log you attached to HBASE-15900 but didn't find
any occurrence.

FYI
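
The analysis quoted below argues that HRegion.replayWALFlushStartMarker can
set writestate.flushing without checking writestate.writesEnabled, so a
close() can keep waiting forever. As a toy, self-contained illustration of
why the extra check matters (names only mirror HRegion.WriteState; this is
not HBase code and not the actual patch):

  public class FlushCloseRace {
    static class WriteState {
      boolean flushing = false;
      boolean writesEnabled = true;
    }

    final WriteState writestate = new WriteState();

    // Mirrors the flushCache pattern quoted below: only start a flush while
    // writes are still enabled.
    boolean tryStartFlush() {
      synchronized (writestate) {
        if (!writestate.flushing && writestate.writesEnabled) {
          writestate.flushing = true;
          return true;
        }
        return false;
      }
    }

    void close() throws InterruptedException {
      synchronized (writestate) {
        writestate.writesEnabled = false;   // closing: disable writes first
        while (writestate.flushing) {
          writestate.wait();                // woken when a flush finishes
        }
      }
    }

    void finishFlush() {
      synchronized (writestate) {
        writestate.flushing = false;
        writestate.notifyAll();
      }
    }

    public static void main(String[] args) throws Exception {
      FlushCloseRace region = new FlushCloseRace();
      region.close();                        // nothing is flushing, returns
      // With the writesEnabled guard, a replayed flush-start can no longer
      // flip flushing back to true after close has disabled writes.
      System.out.println("flush allowed after close? " + region.tryStartFlush());
    }
  }

With the guard applied in both flush paths, a closing region cannot re-enter
the flushing state, which is exactly the gap described below for
replayWALFlushStartMarker.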

On Mon, May 30, 2016 at 2:59 AM, Heng Chen  wrote:

> I found something useful.
>
> When we do region.close, if there is a compaction or flush in progress,
> close will wait for the compaction or flush to finish.
>
> {code: title=HRegion.java}
>
> @Override
> public void waitForFlushesAndCompactions() {
>   synchronized (writestate) {
> if (this.writestate.readOnly) {
>   // we should not wait for replayed flushed if we are read only
> (for example in case the
>   // region is a secondary replica).
>   return;
> }
> boolean interrupted = false;
> try {
>   while (writestate.compacting > 0 || writestate.flushing) {
> LOG.debug("waiting for " + writestate.compacting + " compactions"
>   + (writestate.flushing ? " & cache flush" : "") + " to
> complete for region " + this);
> try {
>   writestate.wait();
> } catch (InterruptedException iex) {
>   // essentially ignore and propagate the interrupt back up
>   LOG.warn("Interrupted while waiting");
>   interrupted = true;
> }
>   }
> } finally {
>   if (interrupted) {
> Thread.currentThread().interrupt();
>   }
> }
>   }
> }
>
> {code}
>
> And writestate.flushing will be set to be true in two place:
>
> HRegion.flushCache and HRegion.replayWALFlushStartMarker
>
> {code: title=HRegion.flushCache}
>
> synchronized (writestate) {
>   if (!writestate.flushing && writestate.writesEnabled) {
> this.writestate.flushing = true;
>   } else {
> **
>   }
> }
>
> {code}
>
> {code: title=HRegion.replayWALFlushStartMarker}
>
> synchronized (writestate) {
>   try {
> **
> if (!writestate.flushing) {
>
> this.writestate.flushing = true;
> *...*
>
> * }*
>
> {code}
>
>
> Notice that in HRegion.replayWALFlushStartMarker we do not check
> writestate.writesEnabled before setting writestate.flushing to true.
>
> So if region.close wakes up from writestate.wait but the lock is acquired
> first by HRegion.replayWALFlushStartMarker, flushing will be set to true
> again, and region.close will be stuck in writestate.wait forever.
>
>
> Could this happen in a real scenario?
>
>
> 2016-05-27 10:44 GMT+08:00 Heng Chen :
>
> > Thanks guys, yesterday I restarted the related RS and the failed-close
> > region reopened successfully. But today another region has fallen into
> > this state.
> >
> > I pasted the related RS's jstack information. This time the failed close
> > region is 9368190b3ba46238534b6307702aabae
> >
> > 2016-05-26 21:50 GMT+08:00 Ted Yu :
> >
> >> Heng:
> >> Can you pastebin the complete stack trace for the region server ?
> >>
> >> Snippet from region server log may also provide more clue.
> >>
> >> Thanks
> >>
> >> On Wed, May 25, 2016 at 9:48 PM, Heng Chen 
> >> wrote:
> >>
> >> > On the master web UI, I could see region
> >> > (c371fb20c372b8edbf54735409ab5c4a)
> >> > always in the failed close state, so the balancer could not run.
> >> >
> >> >
> >> > I checked the region on the RS and found these logs about the region:
> >> >
> >> > 2016-05-26 12:42:10,490 INFO  [MemStoreFlusher.1]
> >> > regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean
> up
> >> > 'too many store files'; waited long enough... proceeding with flush of
> >> >
> >> >
> >>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> >> > 2016-05-26 12:42:20,043 INFO
> >> >  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> >> > regionserver.HRegionServer:
> >> > dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> >> > requesting flush for region
> >> >
> >> >
> >>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf

Re: exception not descriptive relationed to zookeeper.znode.parent

2016-05-28 Thread Ted Yu
Please use user@hbase for future correspondence.

Here is related code from ZooKeeperWatcher (NPE seems to have come from the
for loop):

  public List<String> getMetaReplicaNodes() throws KeeperException {
    List<String> childrenOfBaseNode = ZKUtil.listChildrenNoWatch(this, baseZNode);
    List<String> metaReplicaNodes = new ArrayList<String>(2);
    String pattern = conf.get("zookeeper.znode.metaserver","meta-region-server");
    for (String child : childrenOfBaseNode) {

ZKUtil.listChildrenNoWatch() would return null if the base znode doesn't
exist.

The error message you mentioned still exists:

hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java:
   + "There could be a mismatch with the one configured in the
master.";
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java:
   + "There could be a mismatch with the one configured in the
master.";

With zookeeper.znode.parent  properly set, do you still experience NPE with
your code ?

Thanks
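
As a minimal client-side sketch (the table name is a placeholder), setting
zookeeper.znode.parent explicitly so the client looks under the same base
znode the master writes to:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;

  public class ZnodeParentCheck {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Must match the value the cluster was started with, e.g. /hbase,
      // /hbase-secure or /hbase-unsecure depending on the distribution.
      conf.set("zookeeper.znode.parent", "/hbase-unsecure");
      try (Connection connection = ConnectionFactory.createConnection(conf);
           Admin admin = connection.getAdmin()) {
        System.out.println(admin.tableExists(TableName.valueOf("some_table")));
      }
    }
  }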

On Fri, May 27, 2016 at 5:53 AM, Pablo Leira  wrote:

> Hi,
>
> I have just updated an hbase client from version 0.96.2-hadoop2 to 1.2.1
> (hbase-client).
> With the old version, when the property zookeeper.znode.parent in
> hbase-site.xml was misconfigured (e.g. with the value "hbase" instead of
> "hbase-secure"
> or "hbase-unsecure"), I was getting the following error:
>
> “The node /hbase (/hbase-unsecure or /hbase-secure) is not in ZooKeeper. It
> should have been written by the master. Check the value configured in
> 'zookeeper.znode.parent'. There could be a mismatch with the one configured
> in the master.“
>
>
> But with the new version I was obtaining the following trace:
>
> java.lang.RuntimeException: java.lang.NullPointerException at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
> at
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794) at
>
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
> at
>
> org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:403)
> at
>
> com.denodo.connect.hadoop.hbase.HBaseConnector.doRun(HBaseConnector.java:198)
> at
>
> com.denodo.connect.hadoop.hdfs.wrapper.AbstractSecureHadoopWrapper.run(AbstractSecureHadoopWrapper.java:110)
> at com.denodo.vdb.misc.datasource.MyDataSource$1$1.doRun(Unknown Source) at
> com.denodo.vdb.engine.wrapper.raw.my.MyAccessImpl.doRun(Unknown Source) at
> com.denodo.vdb.engine.wrapper.RawAccess.run(Unknown Source) at
> com.denodo.vdb.engine.thread.g.a(Unknown Source) at
> com.denodo.vdb.engine.thread.ReusableThread.run(Unknown Source) Caused by:
> java.lang.NullPointerException at
>
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:489)
> at
>
> org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:558)
> at
>
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
> at
>
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1211)
> at
>
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1178)
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> ... 15 more
>
>
> This is my code:
>
>
> final Configuration hbaseConfig = getHBaseConfig(inputValues);
> /** Connection to the cluster. A single connection shared by all application threads. */
> Connection connection = null;
> /** A lightweight handle to a specific table. Used from a single thread. */
> Table table = null;
> try {
>   final TableName tableName =
>       TableName.valueOf(inputValues.get(ParameterNaming.CONF_TABLE_NAME));
>   connection = ConnectionFactory.createConnection(hbaseConfig);
>   Admin admin = connection.getAdmin();
>   if (!admin.tableExists(tableName)) {
>
>
> The last line is where the exception is thrown.
>
> Is there any way to obtain a more specific exception than a
> NullPointerException? In the previous version the exception was more
> descriptive. Or maybe my code is not correct.

Re: Some regions never get online after a region server crashes

2016-05-27 Thread Ted Yu
There were 7 regions Master tried to close which were opening but not yet
served.

d1c7f3f455f2529da82a2f713b5ee067 was one of them.

On Fri, May 27, 2016 at 12:47 AM, Shuai Lin  wrote:

> Here is the complete log on node6 between 13:10:47 and 13:11:47:
> http://paste.openstack.org/raw/505826/
>
> The master asked node6 to open several regions. Node6 opened the first 4
> very fast (within 1 second) and got stuck at the 5th one. But there are no
> errors at that time.
>
> On Wed, May 25, 2016 at 10:12 PM, Ted Yu  wrote:
>
>> In AssignmentManager#assign(), you should find:
>>
>>   // Send OPEN RPC. If it fails on a IOE or RemoteException,
>>   // regions will be assigned individually.
>>   long maxWaitTime = System.currentTimeMillis() +
>> this.server.getConfiguration().
>>   getLong("hbase.regionserver.rpc.startup.waittime", 6);
>>
>> BTW can you see what caused rs-node6 to not respond around 13:11:47 ?
>>
>> Cheers
>>
>> On Fri, May 20, 2016 at 6:20 AM, Shuai Lin 
>> wrote:
>>
>>> Is it because the "opening regions" rpc call sent by the master to the
>>> region server node6 timed out after 1 minute?
>>>
>>> *RPC call was sent:*
>>>
>>> 2016-04-30 13:10:47,702 INFO 
>>> org.apache.hadoop.hbase.master.AssignmentManager:
>>> Assigning 22 region(s) tors-node6.example.com,60020,1458723856883
>>>
>>> *After 1 minute:*
>>>
>>> 2016-04-30 13:11:47,780 INFO
>>> org.apache.hadoop.hbase.master.AssignmentManager: Unable to communicate
>>> with rs-node6.example.com,60020,1458723856883 in order to assign
>>> regions, java.io.IOException: Call to
>>> rs-node6.example.com/172.16.6.6:60020 failed on local exception:
>>> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=4,
>>> waitTime=60001, operationTimeout=6 expired.
>>>
>>> 2016-04-30 13:11:47,783 DEBUG
>>> org.apache.hadoop.hbase.master.AssignmentManager: Force region state
>>> offline {d1c7f3f455f2529da82a2f713b5ee067 state=PENDING_OPEN,
>>> ts=1462021847743, server=rs-node6.example.com,60020,1458723856883}
>>>
>>>
>>> I have checked the hbase source code, but I don't find any specific
>>> timeout setting for the "open region" rpc call that I can use. So I guess
>>> it's using the default "hbase.rpc.timeout", which defaults to 60 secs. And
>>> since there are 20+ regions being assigned to node6 almost at the same
>>> moment, node6 gets overloaded and can't finish opening all of them within
>>> one minute.
>>>
>>> So this looks like an hbase bug to me (regions never get online when the
>>> region server fails to handle the OpenRegionRequest before the rpc
>>> timeout), am I right?
>>>
>>>
>>> On Fri, May 20, 2016 at 12:42 PM, Ted Yu  wrote:
>>>
>>>> Looks like region d1c7f3f455f2529da82a2f713b5ee067 received CLOSE
>>>> request
>>>> when it was opening, leading to RegionAlreadyInTransitionException.
>>>>
>>>> Was there any clue in master log why the close request was sent ?
>>>>
>>>> Cheers
>>>>
>>>> On Wed, May 4, 2016 at 8:02 PM, Shuai Lin 
>>>> wrote:
>>>>
>>>> > Hi Ted,
>>>> >
>>>> > The hbase version is 1.0.0-cdh5.4.8, shipped with cloudera CDH 5.4.8.
>>>> The
>>>> > RS logs on node6 can be found here <
>>>> http://paste.openstack.org/raw/496174/
>>>> > >
>>>> >  .
>>>> >
>>>> > Thanks!
>>>> >
>>>> > Shuai
>>>> >
>>>> > On Thu, May 5, 2016 at 9:15 AM, Ted Yu  wrote:
>>>> >
>>>> > > Can you pastebin related server log w.r.t.
>>>> > d1c7f3f455f2529da82a2f713b5ee067
>>>> > > from rs-node6 ?
>>>> > >
>>>> > > Which release of hbase are you using ?
>>>> > >
>>>> > > Cheers
>>>> > >
>>>> > > On Wed, May 4, 2016 at 6:07 PM, Shuai Lin 
>>>> > wrote:
>>>> > >
>>>> > > > Hi list,
>>>> > > >
>>>> > > > Last weekend I got a region server crashed, but some regions
>>>> never got
>>>> > > > online again on other RSes. I've gone through the logs, and here
>>>> is the
>>>

Re: [ANNOUNCE] Mikhail Antonov joins the Apache HBase PMC

2016-05-26 Thread Ted Yu
Congratulations, Mikhail !

On Thu, May 26, 2016 at 11:30 AM, Andrew Purtell 
wrote:

> On behalf of the Apache HBase PMC I am pleased to announce that Mikhail
> Antonov has accepted our invitation to become a PMC member on the Apache
> HBase project. Mikhail has been an active contributor in many areas,
> including recently taking on the Release Manager role for the upcoming
> 1.3.x code line. Please join me in thanking Mikhail for his contributions
> to date and anticipation of many more contributions.
>
> Welcome to the PMC, Mikhail!
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: region stuck in failed close state

2016-05-26 Thread Ted Yu
Heng:
Can you pastebin the complete stack trace for the region server ?

Snippet from region server log may also provide more clue.

Thanks

On Wed, May 25, 2016 at 9:48 PM, Heng Chen  wrote:

> On the master web UI, I could see region (c371fb20c372b8edbf54735409ab5c4a)
> always in the failed close state, so the balancer could not run.
>
>
> I checked the region on the RS and found these logs about the region:
>
> 2016-05-26 12:42:10,490 INFO  [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up
> 'too many store files'; waited long enough... proceeding with flush of
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 2016-05-26 12:42:20,043 INFO
>  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 20753
> 2016-05-26 12:42:30,043 INFO
>  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 7057
>
>
> The related jstack information is below:
>
> Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
>   State: WAITING
>   Blocked count: 1
>   Waited count: 2
>   Waiting on
> org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:502)
>
> org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
>
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
>
>
> Our HBase cluster version is 1.1.1. I tried to compact this region, but the
> compaction is stuck at 89.58% progress:
>
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 85860221 85860221
> 89.58%
>


Re: Some regions never get online after a region server crashes

2016-05-25 Thread Ted Yu
In AssignmentManager#assign(), you should find:

  // Send OPEN RPC. If it fails on a IOE or RemoteException,
  // regions will be assigned individually.
  long maxWaitTime = System.currentTimeMillis() +
this.server.getConfiguration().
  getLong("hbase.regionserver.rpc.startup.waittime", 6);

BTW can you see what caused rs-node6 to not respond around 13:11:47 ?

Cheers

On Fri, May 20, 2016 at 6:20 AM, Shuai Lin  wrote:

> Is it because the "opening regions" rpc call sent by the master to the
> region server node6 timed out after 1 minute?
>
> *RPC call was sent:*
>
> 2016-04-30 13:10:47,702 INFO org.apache.hadoop.hbase.master.AssignmentManager:
> Assigning 22 region(s) tors-node6.example.com,60020,1458723856883
>
> *After 1 minute:*
>
> 2016-04-30 13:11:47,780 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to communicate
> with rs-node6.example.com,60020,1458723856883 in order to assign regions,
> java.io.IOException: Call to rs-node6.example.com/172.16.6.6:60020 failed
> on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=4, waitTime=60001, operationTimeout=6 expired.
>
> 2016-04-30 13:11:47,783 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Force region state
> offline {d1c7f3f455f2529da82a2f713b5ee067 state=PENDING_OPEN,
> ts=1462021847743, server=rs-node6.example.com,60020,1458723856883}
>
>
> I have checked the hbase source code, but I don't find any specific timeout
> setting for the "open region" rpc call that I can use. So I guess it's using
> the default "hbase.rpc.timeout", which defaults to 60 secs. And since there
> are 20+ regions being assigned to node6 almost at the same moment, node6
> gets overloaded and can't finish opening all of them within one minute.
>
> So this looks like an hbase bug to me (regions never get online when the
> region server fails to handle the OpenRegionRequest before the rpc
> timeout), am I right?
>
>
> On Fri, May 20, 2016 at 12:42 PM, Ted Yu  wrote:
>
>> Looks like region d1c7f3f455f2529da82a2f713b5ee067 received CLOSE request
>> when it was opening, leading to RegionAlreadyInTransitionException.
>>
>> Was there any clue in master log why the close request was sent ?
>>
>> Cheers
>>
>> On Wed, May 4, 2016 at 8:02 PM, Shuai Lin  wrote:
>>
>> > Hi Ted,
>> >
>> > The hbase version is 1.0.0-cdh5.4.8, shipped with cloudera CDH 5.4.8.
>> The
>> > RS logs on node6 can be found here <
>> http://paste.openstack.org/raw/496174/
>> > >
>> >  .
>> >
>> > Thanks!
>> >
>> > Shuai
>> >
>> > On Thu, May 5, 2016 at 9:15 AM, Ted Yu  wrote:
>> >
>> > > Can you pastebin related server log w.r.t.
>> > d1c7f3f455f2529da82a2f713b5ee067
>> > > from rs-node6 ?
>> > >
>> > > Which release of hbase are you using ?
>> > >
>> > > Cheers
>> > >
>> > > On Wed, May 4, 2016 at 6:07 PM, Shuai Lin 
>> > wrote:
>> > >
>> > > > Hi list,
>> > > >
>> > > > Last weekend I got a region server crashed, but some regions never
>> got
>> > > > online again on other RSes. I've gone through the logs, and here is
>> the
>> > > > timeline about some of the events:
>> > > >
>> > > > * 13:03:50 on of the region server, rs-node7, died because of a disk
>> > > > failure. Master started to split rs-node7's WALs
>> > > >
>> > > >
>> > > > 2016-04-30 13:03:50,953 INFO
>> > > > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
>> Splitting
>> > > > logs for rs-node7.example.com,60020,1458724695128 before
>> assignment;
>> > > > region
>> > > > count=133
>> > > > 2016-04-30 13:03:50,966 DEBUG
>> > > > org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of
>> > logs
>> > > to
>> > > > split
>> > > > 2016-04-30 13:03:50,966 INFO
>> > > > org.apache.hadoop.hbase.master.SplitLogManager: started splitting 33
>> > logs
>> > > > in [hdfs://nameservice1/hbase/WALs/rs-node7.example.com
>> > > > ,60020,1458724695128-splitting]
>> > > > for [rs-node7.example.com,60020,1458724695128]
>> > > >
>> > > > * 13:10:47 WAL splits done, master began to re-assign regions
>> > > >
>> > >

Re: File descriptor leak, possibly new in CDH5.7.0

2016-05-23 Thread Ted Yu
Have you taken a look at HBASE-9393 ?

On Mon, May 23, 2016 at 9:55 AM, Bryan Beaudreault  wrote:

> Hey everyone,
>
> We are noticing a file descriptor leak that is only affecting nodes in our
> cluster running 5.7.0, not those still running 5.3.8. I ran an lsof against
> an affected regionserver, and noticed that there were 10k+ unix sockets
> that are just called "socket", as well as another 10k+ of the form
> "/dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-_1_". The
> 2 seem related based on how closely the counts match.
>
> We are in the middle of a rolling upgrade from CDH5.3.8 to CDH5.7.0 (we
> handled the namenode upgrade separately).  The 5.3.8 nodes *do not*
> experience this issue. The 5.7.0 nodes *do. *We are holding off upgrading
> more regionservers until we can figure this out. I'm not sure if any
> intermediate versions between the 2 have the issue.
>
> We traced the root cause to a hadoop job running against a basic table:
>
> 'my-table-1', {TABLE_ATTRIBUTES => {MAX_FILESIZE => '107374182400',
> MEMSTORE_FLUSHSIZE => '67108864'}, {NAME => '0', VERSIONS => '50',
> BLOOMFILTER => 'NONE', COMPRESSION => 'LZO', METADATA =>
> {'COMPRESSION_COMPACT' => 'LZO', 'ENCODE_ON_DISK' => 'true'}}
>
> This is very similar to all of our other tables (we have many). However,
> its regions are getting up there in size, 40+ GB per region, compressed.
> This has not been an issue for us previously.
>
> The hadoop job is a simple TableMapper job with no special parameters,
> though we haven't updated our client yet to the latest (will do that once
> we finish the server side). The hadoop job runs on a separate hadoop
> cluster, remotely accessing the HBase cluster. It does not do any other
> reads or writes, outside of the TableMapper scans.
>
> Moving the regions off of an affected server, or killing the hadoop job,
> causes the file descriptors to gradually go back down to normal.
>
> Any ideas?
>
> Thanks,
>
> Bryan
>


Re: Some regions never get online after a region server crashes

2016-05-19 Thread Ted Yu
Looks like region d1c7f3f455f2529da82a2f713b5ee067 received CLOSE request
when it was opening, leading to RegionAlreadyInTransitionException.

Was there any clue in master log why the close request was sent ?

Cheers

On Wed, May 4, 2016 at 8:02 PM, Shuai Lin  wrote:

> Hi Ted,
>
> The hbase version is 1.0.0-cdh5.4.8, shipped with cloudera CDH 5.4.8. The
> RS logs on node6 can be found here <http://paste.openstack.org/raw/496174/
> >
>  .
>
> Thanks!
>
> Shuai
>
> On Thu, May 5, 2016 at 9:15 AM, Ted Yu  wrote:
>
> > Can you pastebin related server log w.r.t.
> d1c7f3f455f2529da82a2f713b5ee067
> > from rs-node6 ?
> >
> > Which release of hbase are you using ?
> >
> > Cheers
> >
> > On Wed, May 4, 2016 at 6:07 PM, Shuai Lin 
> wrote:
> >
> > > Hi list,
> > >
> > > Last weekend I got a region server crashed, but some regions never got
> > > online again on other RSes. I've gone through the logs, and here is the
> > > timeline about some of the events:
> > >
> > > * 13:03:50 on of the region server, rs-node7, died because of a disk
> > > failure. Master started to split rs-node7's WALs
> > >
> > >
> > > 2016-04-30 13:03:50,953 INFO
> > > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
> > > logs for rs-node7.example.com,60020,1458724695128 before assignment;
> > > region
> > > count=133
> > > 2016-04-30 13:03:50,966 DEBUG
> > > org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of
> logs
> > to
> > > split
> > > 2016-04-30 13:03:50,966 INFO
> > > org.apache.hadoop.hbase.master.SplitLogManager: started splitting 33
> logs
> > > in [hdfs://nameservice1/hbase/WALs/rs-node7.example.com
> > > ,60020,1458724695128-splitting]
> > > for [rs-node7.example.com,60020,1458724695128]
> > >
> > > * 13:10:47 WAL splits done, master began to re-assign regions
> > >
> > > 2016-04-30 13:10:47,655 INFO
> > > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
> Reassigning
> > > 133 region(s) that rs-node7.example.com,60020,1458724695128 was
> carrying
> > > (and 0 regions(s) that were opening on this server)
> > > 2016-04-30 13:10:47,665 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 133
> > > region(s) across 6 server(s), round-robin=true
> > > 2016-04-30 13:10:47,667 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22
> region(s)
> > to
> > > rs-node1.example.com,60020,1458720625688
> > > 2016-04-30 13:10:47,667 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22
> region(s)
> > to
> > > rs-node2.example.com,60020,1458721110988
> > > 2016-04-30 13:10:47,667 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22
> region(s)
> > to
> > > rs-node3.example.com,60020,1458721713906
> > > 2016-04-30 13:10:47,679 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 23
> region(s)
> > to
> > > rs-node4.example.com,60020,1458722335527
> > > 2016-04-30 13:10:47,691 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22
> region(s)
> > to
> > > rs-node5.example.com,60020,1458722992296
> > > 2016-04-30 13:10:47,702 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22
> region(s)
> > to
> > > rs-node6.example.com,60020,1458723856883
> > >
> > > * 13:11:47 the opening regions rpc sent by master to region servers
> timed
> > > out after 1 minutes
> > >
> > > 2016-04-30 13:11:47,780 INFO
> > > org.apache.hadoop.hbase.master.AssignmentManager: Unable to communicate
> > > with rs-node3.example.com,60020,1458721713906 in order to assign
> regions
> > > java.io.IOException: Call to rs-node3.example.com/172.16.6.3:60020
> > failed
> > > on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
> > Call
> > > id=4, waitTime=60001, operationTimeout=6 expired.
> > > 2016-04-30 13:11:47,782 INFO
> > > org.apache.hadoop.hbase.master.GeneralBulkAssigner: Failed assigning 22
> > > regions to server rs-node6.example.com,60020,1458723856883,
> reassigning
> > > them
> > > 2016-04-30 13:11:47,782 INFO
> > > org.apache.hadoop.hbase.master.GeneralBulkAssigner: Failed assigning 23
> > > regions to server rs-node4.example.com,60020,145872

Re: Re: how to make a safe use of hbase

2016-05-17 Thread Ted Yu
Please read the following:

http://hbase.apache.org/book.html#security

On Tue, May 17, 2016 at 8:28 AM, WangYQ  wrote:

> I have 2 goals:
> 1. protect the 60010 web page
> 2. control access to hbase, such as read and write
> For example, whoever wants to access hbase must have the correct password.
>
>
> thanks
>
>
> On 2016-05-17 22:04 , Ted Yu  Wrote:
>
> Is your goal to protect web page access ?
>
> Take a look at HBASE-5291.
>
> If I didn't understand your use case, please elaborate.
>
> Use user@hbase in the future.
>
> On Tue, May 17, 2016 at 4:02 AM, WangYQ  wrote:
>
>> in hbase, if we know the zookeeper address, we can write to and read from
>> hbase; if we know the hmaster's address, we can see the 60010 page.
>>
>>
>> How can we make hbase safe to use?
>> For example, to see 60010 or to write/read hbase, we must have the
>> correct password, like on linux.
>
>
>
>
>


Re: how to make a safe use of hbase

2016-05-17 Thread Ted Yu
Is your goal to protect web page access ?

Take a look at HBASE-5291.

If I didn't understand your use case, please elaborate.

Use user@hbase in the future.

On Tue, May 17, 2016 at 4:02 AM, WangYQ  wrote:

> in hbase, if we know the zookeeper address, we can write to and read from
> hbase; if we know the hmaster's address, we can see the 60010 page.
>
>
> How can we make hbase safe to use?
> For example, to see 60010 or to write/read hbase, we must have the
> correct password, like on linux.


Re: Splitting causes HBase to crash

2016-05-13 Thread Ted Yu
bq. 2016-05-13 11:56:52,763 WARN
org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs
in
[hdfs://ip-172-31-50-109.ec2.internal:8020/hbase/WALs/ip-
172-31-54-241.ec2.internal,60020,1463123941413-splitting]
installed = 1 but only 0 done

Looks like WAL splitting was slow or stalled.
Please check region server log to see why.

Cheers

On Fri, May 13, 2016 at 8:45 AM, Gunnar Tapper 
wrote:

> Some more info.
>
> I removed /hbase using "hbase zkcli rmr /hbase". The log messages I provided
> occurred after that. This is an HA configuration with two HMasters.
>
> After sitting in an initializing state for a long time, I end up with:
>
> hbase(main):001:0> list
> TABLE
>
>
> ERROR: Can't get master address from ZooKeeper; znode data == null
>
> Here is some help for this command:
> List all tables in hbase. Optional regular expression parameter could
> be used to filter the output. Examples:
>
>   hbase> list
>   hbase> list 'abc.*'
>   hbase> list 'ns:abc.*'
>   hbase> list 'ns:.*'
>
>
> HMaster log node 1:
>
> 2016-05-13 11:56:36,646 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned
> = 0
>
> tasks={/hbase/splitWAL/WALs%2Fip-172-31-54-241.ec2.internal%2C60020%2C1463123941413-splitting%2Fip-172-31-54-241.ec2.internal%252C60020%252C1463123941413.null0.1463123949331=last_update
> = 1463140497694 last_version = 11 cur_worker_name =
> ip-172-31-54-241.ec2.internal,60020,1463139946671 status = in_progress
> incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-61-36.ec2.internal%2C60020%2C1463123940830-splitting%2Fip-172-31-61-36.ec2.internal%252C60020%252C1463123940830.null0.1463123949164=last_update
> = 1463140498292 last_version = 9 cur_worker_name =
> ip-172-31-54-241.ec2.internal,60020,1463139946671 status = in_progress
> incarnation = 1 resubmits = 0 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-53-252.ec2.internal%2C60020%2C1463123940875-splitting%2Fip-172-31-53-252.ec2.internal%252C60020%252C1463123940875.null0.1463123949155=last_update
> = 1463140498292 last_version = 8 cur_worker_name =
> ip-172-31-53-252.ec2.internal,60020,1463139946203 status = in_progress
> incarnation = 1 resubmits = 0 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-50-109.ec2.internal%2C60020%2C1463123941361-splitting%2Fip-172-31-50-109.ec2.internal%252C60020%252C1463123941361.null0.1463123949342=last_update
> = 1463140497663 last_version = 8 cur_worker_name =
> ip-172-31-50-109.ec2.internal,60020,1463139946412 status = in_progress
> incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0}
> 2016-05-13 11:56:41,647 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned
> = 0
>
> tasks={/hbase/splitWAL/WALs%2Fip-172-31-54-241.ec2.internal%2C60020%2C1463123941413-splitting%2Fip-172-31-54-241.ec2.internal%252C60020%252C1463123941413.null0.1463123949331=last_update
> = 1463140497694 last_version = 11 cur_worker_name =
> ip-172-31-54-241.ec2.internal,60020,1463139946671 status = in_progress
> incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-61-36.ec2.internal%2C60020%2C1463123940830-splitting%2Fip-172-31-61-36.ec2.internal%252C60020%252C1463123940830.null0.1463123949164=last_update
> = 1463140498292 last_version = 9 cur_worker_name =
> ip-172-31-54-241.ec2.internal,60020,1463139946671 status = in_progress
> incarnation = 1 resubmits = 0 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-53-252.ec2.internal%2C60020%2C1463123940875-splitting%2Fip-172-31-53-252.ec2.internal%252C60020%252C1463123940875.null0.1463123949155=last_update
> = 1463140498292 last_version = 8 cur_worker_name =
> ip-172-31-53-252.ec2.internal,60020,1463139946203 status = in_progress
> incarnation = 1 resubmits = 0 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-50-109.ec2.internal%2C60020%2C1463123941361-splitting%2Fip-172-31-50-109.ec2.internal%252C60020%252C1463123941361.null0.1463123949342=last_update
> = 1463140497663 last_version = 8 cur_worker_name =
> ip-172-31-50-109.ec2.internal,60020,1463139946412 status = in_progress
> incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0}
> 2016-05-13 11:56:47,647 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned
> = 0
>
> tasks={/hbase/splitWAL/WALs%2Fip-172-31-54-241.ec2.internal%2C60020%2C1463123941413-splitting%2Fip-172-31-54-241.ec2.internal%252C60020%252C1463123941413.null0.1463123949331=last_update
> = 1463140497694 last_version = 11 cur_worker_name =
> ip-172-31-54-241.ec2.internal,60020,1463139946671 status = in_progress
> incarnation = 1 resubmits = 1 batch = installed = 1 done = 0 error = 0,
>
> /hbase/splitWAL/WALs%2Fip-172-31-61-36.ec2.internal%2C60020%2C1463123940830-splitting%2Fip-172-31-61-36.ec2.internal%252C60020%252C1463123940830.null

Re: Splitting causes HBase to crash

2016-05-13 Thread Ted Yu
bq. Unable to list children of znode /hbase/region-in-transition

Looks like there might be some problem with zookeeper quorum.

Can you check zookeeper server logs ?

Cheers

On Fri, May 13, 2016 at 12:17 AM, Gunnar Tapper 
wrote:

> Hi,
>
> I'm doing some development testing with Apache Trafodion running
> HBase Version 1.0.0-cdh5.4.5.
>
> All of a sudden, HBase has started to crash. First, it could not be
> recovered until I changed hbase_master_distributed_log_splitting to false.
> At that point, HBase restarted and sat happily idling for 1 hour. Then, I
> started Trafodion letting it sit idling for 1 hour.
>
> I then started a workload and all RegionServers came crashing down. Looking
> at the log files, I suspected ZooKeeper issues so I restarted ZooKeeper and
> then HBase. Now, the HMaster fails with:
>
> 2016-05-13 07:13:52,521 INFO org.apache.hadoop.hbase.master.RegionStates:
> Transition {a33adb83f77095913adb4701b01c09a0 state=PENDING_OPEN,
> ts=146312157, server=ip-172-31-50-109.ec2.internal,60020,1463122925684}
> to {a33adb83f77095913adb4701b01c09a0 state=OPENING, ts=1463123632517,
> server=ip-172-31-50-109.ec2.internal,60020,1463122925684}
> 2016-05-13 07:13:52,527 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:6-0x354a8eaea3e007d,
>
> quorum=ip-172-31-53-252.ec2.internal:2181,ip-172-31-54-241.ec2.internal:2181,ip-172-31-61-36.ec2.internal:2181,
> baseZNode=/hbase Unable to list children of znode
> /hbase/region-in-transition
> java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1466)
> at
>
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:296)
> at
>
> org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:518)
> at
>
> org.apache.hadoop.hbase.master.AssignmentManager$5.run(AssignmentManager.java:1420)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2016-05-13 07:13:52,527 INFO
> org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager:
> stop: server shutting down.
> 2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer:
> Stopping server on 6
> 2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer:
> RpcServer.listener,port=6: stopping
> 2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer:
> RpcServer.responder: stopped
> 2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer:
> RpcServer.responder: stopping
> 2016-05-13 07:13:52,532 ERROR org.apache.zookeeper.ClientCnxn: Error while
> calling watcher
> java.util.concurrent.RejectedExecutionException: Task
> java.util.concurrent.FutureTask@33d4a2bd rejected from
> java.util.concurrent.ThreadPoolExecutor@4d0840e0[Terminated, pool size =
> 0,
> active threads = 0, queued tasks = 0, completed tasks = 38681]
> at
>
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at
>
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
> at
>
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
> at
>
> org.apache.hadoop.hbase.master.AssignmentManager.zkEventWorkersSubmit(AssignmentManager.java:1285)
> at
>
> org.apache.hadoop.hbase.master.AssignmentManager.handleAssignmentEvent(AssignmentManager.java:1479)
> at
>
> org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:1244)
> at
>
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:458)
> at
>
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-05-13 07:13:52,533 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node
> /hbase/rs/ip-172-31-50-109.ec2.internal,6,1463122925543 already
> deleted, retry=false
> 2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x354a8eaea3e007d closed
> 2016-05-13 07:13:52,534 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> ip-172-31-50-109.ec2.internal,6,1463122925543; zookeeper connection
> closed.
> 2016-05-13 07:13:52,534 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> master/ip-172-31-50-109.ec2.internal/172.31.50.109:6 exiting
> 2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
>
> Suggestions on how to move fo

Re: Hbase scaning for couple Terabytes data

2016-05-11 Thread Ted Yu
TableInputFormatBase is abstract.

Most likely you would use TableInputFormat for the scan.

See javadoc of getSplits():

   * Calculates the splits that will serve as input for the map tasks. The

   * number of splits matches the number of regions in a table.


FYI
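
As a sketch of what such a scan job looks like (the table name and the
row-counting mapper below are placeholders, not your schema), using
TableInputFormat via TableMapReduceUtil so one map task is created per
region:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class LogScanJob {
    static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
      @Override
      protected void map(ImmutableBytesWritable row, Result value, Context ctx)
          throws IOException, InterruptedException {
        ctx.getCounter("scan", "rows").increment(1);   // count rows, emit nothing
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = Job.getInstance(conf, "scan-logs");
      job.setJarByClass(LogScanJob.class);
      Scan scan = new Scan();
      scan.setCaching(500);         // larger caching helps sequential scans
      scan.setCacheBlocks(false);   // don't churn the block cache from MR scans
      TableMapReduceUtil.initTableMapperJob("logs", scan, RowCountMapper.class,
          NullWritable.class, NullWritable.class, job);
      job.setNumReduceTasks(0);
      job.setOutputFormatClass(NullOutputFormat.class);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The number of map tasks equals the number of regions, so for a multi-terabyte
table the cost scales with region count; the bundled
org.apache.hadoop.hbase.mapreduce.RowCounter job is built the same way.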

On Wed, May 11, 2016 at 6:05 PM, Yi Jiang  wrote:

> Hi, Guys
> Recently we have been debating using hbase as the destination for our data
> pipeline job.
> Basically, we want to save our logs into hbase, and our pipeline can
> generate 2-4 terabytes of data every day, but our IT department thinks it is
> not a good idea to scan hbase at that scale; it will cause performance and
> memory issues.
> They ask us to keep just 15 minutes' worth of data in hbase for real-time
> analysis.
> For now, I am using a Hive external table over hbase, but what I am
> wondering is: for a map reduce job, what kind of mapper is used to scan the
> data from hbase? Is it TableInputFormatBase? And how many mappers will Hive
> use to scan hbase? Is it efficient or not? Will it cause performance issues
> if we have a couple of terabytes or more of data?
> I am also trying to index some columns that we might use to query. But I
> am not sure if it is a good idea to keep so much historical data in hbase
> for querying.
> Thank you
> Jacky
>
>


Re: Unable to connect to Hbase when Kerberos Enabled

2016-05-10 Thread Ted Yu
Looks like there were pictures in the second email which didn't go through.

Please paste text.

Cheers

On Tue, May 10, 2016 at 12:13 AM, horaamit  wrote:

> After making a few changes to my code
>
>
>
> I am getting an exception; please find the stack trace below.
>
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Unable-to-connect-to-Hbase-when-Kerberos-Enabled-tp4079897p4079901.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: issue in Hbase Master

2016-05-09 Thread Ted Yu
HMaster is in hbase-server-xx.jar

Was it on the classpath ?

Please consider pastebin'ning master log if you need further help.

Cheers

On Mon, May 9, 2016 at 2:25 AM, Raghuveera Ramamoorthi 
wrote:

> Dear team,
>
> While starting the HBase master from a newly installed HDP 2.4 we are
> getting "Error: Could not find or load main class
> org.hadoop.hbase.master.HMaster".
>
> Please advise on this.
>
> Thanks in advance!
>


Re: hbase doubts

2016-05-06 Thread Ted Yu
For #2, see the following in CompactSplitThread - there is a config
parameter for merge threads as well:

  // Configuration key for the large compaction threads.
  public final static String LARGE_COMPACTION_THREADS =
      "hbase.regionserver.thread.compaction.large";
  public final static int LARGE_COMPACTION_THREADS_DEFAULT = 1;

  // Configuration key for the small compaction threads.
  public final static String SMALL_COMPACTION_THREADS =
      "hbase.regionserver.thread.compaction.small";
  public final static int SMALL_COMPACTION_THREADS_DEFAULT = 1;

  // Configuration key for split threads
  public final static String SPLIT_THREADS =
      "hbase.regionserver.thread.split";
  public final static int SPLIT_THREADS_DEFAULT = 1;

On Thu, May 5, 2016 at 6:55 PM, Ted Yu  wrote:

> For #3, we already have the following in 1.1 release:
>
> HBASE-10201 Port 'Make flush decisions per column family' to trunk
>
> On Thu, May 5, 2016 at 6:36 PM, Shushant Arora 
> wrote:
>
>> 1. Why is it better to have a single file per region than multiple files
>> for read performance? Why can't multiple threads read multiple files and
>> give better performance?
>>
>> 2. Does an hbase regionserver have a single thread for compactions and
>> splits for all the regions it is holding? Why wouldn't a single thread per
>> region work better than sequential compactions/splits for all regions in a
>> regionserver?
>>
>> 3. Why does hbase flush and compact all memstores of all the families of a
>> table at the same time, irrespective of their size, when even one memstore
>> reaches the threshold?
>>
>> Thanks
>> Shushant
>>
>
>


Re: Simple user/pass authentication

2016-05-06 Thread Ted Yu
Please take a look at:

http://hbase.apache.org/book.html#_server_side_configuration_for_simple_user_access_operation

On Fri, May 6, 2016 at 10:51 AM, Mohit Anchlia 
wrote:

> Is there a way to implement a simple user/pass authentication in HBase
> instead of using a Kerberos? Are the coprocessor the right way of
> implementing such authentication?
>


Re: hbase doubts

2016-05-05 Thread Ted Yu
For #3, we already have the following in 1.1 release:

HBASE-10201 Port 'Make flush decisions per column family' to trunk

On Thu, May 5, 2016 at 6:36 PM, Shushant Arora 
wrote:

> 1. Why is it better to have a single file per region than multiple files
> for read performance? Why can't multiple threads read multiple files and
> give better performance?
>
> 2. Does an hbase regionserver have a single thread for compactions and
> splits for all the regions it is holding? Why wouldn't a single thread per
> region work better than sequential compactions/splits for all regions in a
> regionserver?
>
> 3. Why does hbase flush and compact all memstores of all the families of a
> table at the same time, irrespective of their size, when even one memstore
> reaches the threshold?
>
> Thanks
> Shushant
>


Re: Export HBase snapshot to S3 creates empty root directory (prefix)

2016-05-04 Thread Ted Yu
Lex:
Please also see this thread about s3n versus s3a:

http://search-hadoop.com/m/uOzYtE1Fy22eEWfe1&subj=Re+S3+Hadoop+FileSystems

On Wed, May 4, 2016 at 9:01 PM, Matteo Bertozzi 
wrote:

> never seen that problem before, but a couple of suggestions you can try.
>
> Instead of the old s3 driver, you can use s3n or s3a if you have it
> available (those are the ones I tested)
> and instead of using hbase.rootdir, use -copy-from
>
> ExportSnapshot -snapshot SNAPSHOT_NAME -copy-to s3a://BUCKET/NAMESPACE
> ExportSnapshot -snapshot SNAPSHOT_NAME -copy-from s3a://BUCKET/NAMESPACE
> -copy-to hdfs://HOST/hbase
>
> you can take a look at some in-progress doc about s3 and snapshot here:
> https://issues.apache.org/jira/browse/HBASE-15646
>
> Matteo
>
>
> On Wed, May 4, 2016 at 8:53 PM, Lex Toumbourou  wrote:
>
> > Hi all,
> >
> > I'm having a couple of problems with exporting HBase snapshots to S3. I
> am
> > running HBase version 1.2.0.
> >
> > I have a table called "domain"
> >
> > And I have created a snapshot for it:
> >
> > hbase(main):003:0> snapshot 'domain', 'domain-aws-test'
> > 0 row(s) in 0.3310 seconds
> >
> > ---
> >
> > I am attempting to export it to S3 using the following command:
> >
> > hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
> > "domain-aws-test" -copy-to s3://my-hbase-snapshots/domain-aws-test
> >
> > Now, when I view the snapshot metadata in the S3 bucket, there's nothing
> > there:
> >
> > > aws s3 ls my-hbase-snapshots/domain-aws-snapshots
> >
> > But there is data under:
> >
> > > aws s3 ls my-hbase-snapshots/\/domain-aws-test/
> >PRE .hbase-snapshot/
> > 2016-05-05 13:38:12  1 .hbase-snapshot
> >
> > It seems what's happening is, HBase is creating a directory/prefix with
> no
> > name and placing the snapshot data under there.
> >
> > That wouldn't be a problem, except that when I try to import the snapshot
> > on my destination cluster, it seems unable to deal with the empty
> > directory:
> >
> > sudo -u hbase hbase snapshot export -D
> > hbase.rootdir=s3://my-hbase-snapshots/\/domain-aws-test -snapshot
> > my-aws-test -copy-to hdfs://hbaseClusterDNSName:8020/user/hbase -mappers
> 2
> >
> > Caused by: java.io.FileNotFoundException: No such file or directory
> > 's3://my-hbase-snapshots/domain-aws-test/.hbase-snapshot/domain-aws-test'
> >
> > ---
> >
> > Any one come across this before?
> >
> > Lex
> >
>


Re: Some regions never get online after a region server crashes

2016-05-04 Thread Ted Yu
Can you pastebin related server log w.r.t. d1c7f3f455f2529da82a2f713b5ee067
from rs-node6 ?

Which release of hbase are you using ?

Cheers

On Wed, May 4, 2016 at 6:07 PM, Shuai Lin  wrote:

> Hi list,
>
> Last weekend one of my region servers crashed, but some regions never got
> online again on other RSes. I've gone through the logs, and here is the
> timeline of some of the events:
>
> * 13:03:50 one of the region servers, rs-node7, died because of a disk
> failure. The master started to split rs-node7's WALs
>
>
> 2016-04-30 13:03:50,953 INFO
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
> logs for rs-node7.example.com,60020,1458724695128 before assignment;
> region
> count=133
> 2016-04-30 13:03:50,966 DEBUG
> org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to
> split
> 2016-04-30 13:03:50,966 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: started splitting 33 logs
> in [hdfs://nameservice1/hbase/WALs/rs-node7.example.com
> ,60020,1458724695128-splitting]
> for [rs-node7.example.com,60020,1458724695128]
>
> * 13:10:47 WAL splits done, master began to re-assign regions
>
> 2016-04-30 13:10:47,655 INFO
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning
> 133 region(s) that rs-node7.example.com,60020,1458724695128 was carrying
> (and 0 regions(s) that were opening on this server)
> 2016-04-30 13:10:47,665 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 133
> region(s) across 6 server(s), round-robin=true
> 2016-04-30 13:10:47,667 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22 region(s) to
> rs-node1.example.com,60020,1458720625688
> 2016-04-30 13:10:47,667 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22 region(s) to
> rs-node2.example.com,60020,1458721110988
> 2016-04-30 13:10:47,667 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22 region(s) to
> rs-node3.example.com,60020,1458721713906
> 2016-04-30 13:10:47,679 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 23 region(s) to
> rs-node4.example.com,60020,1458722335527
> 2016-04-30 13:10:47,691 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22 region(s) to
> rs-node5.example.com,60020,1458722992296
> 2016-04-30 13:10:47,702 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning 22 region(s) to
> rs-node6.example.com,60020,1458723856883
>
> * 13:11:47 the opening regions rpc sent by master to region servers timed
> out after 1 minutes
>
> 2016-04-30 13:11:47,780 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to communicate
> with rs-node3.example.com,60020,1458721713906 in order to assign regions
> java.io.IOException: Call to rs-node3.example.com/172.16.6.3:60020 failed
> on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call
> id=4, waitTime=60001, operationTimeout=6 expired.
> 2016-04-30 13:11:47,782 INFO
> org.apache.hadoop.hbase.master.GeneralBulkAssigner: Failed assigning 22
> regions to server rs-node6.example.com,60020,1458723856883, reassigning
> them
> 2016-04-30 13:11:47,782 INFO
> org.apache.hadoop.hbase.master.GeneralBulkAssigner: Failed assigning 23
> regions to server rs-node4.example.com,60020,1458722335527, reassigning
> them
> 2016-04-30 13:11:47,782 INFO
> org.apache.hadoop.hbase.master.GeneralBulkAssigner: Failed assigning 22
> regions to server rs-node3.example.com,60020,1458721713906, reassigning
> them
> 2016-04-30 13:11:47,783 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Force region state
> offline {a65660e421f114e93862194f7cc35644 state=OPENING, ts=1462021907753,
> server=rs-node6.example.com,60020,1458723856883}
>
>
> * After that, part of the regions (40 out of 130 regions) never got online,
> and the following lines were logged repeatly in master log:
>
> 2016-04-30 13:12:37,188 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: update
> {d1c7f3f455f2529da82a2f713b5ee067 state=PENDING_OPEN, ts=1462021957087,
> server=rs-node6.example.com,60020,1458723856883} the timestamp.
> 2016-04-30 13:12:37,188 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Region is already in
> transition; waiting up to 10668ms
>
> $ grep 'AssignmentManager: update {d1c7f3f455f2529da82a2f713b5ee067'
> /var/log/hbase/hbase-cmf-hbase-MASTER-head.example.com.log.out.1|wc -l
> 484
>
>
> I've searched the mailing list archives and the HBase JIRA, but didn't find any
> similar situation. The most similar one is HBASE-14407
> <https://issues.apache.org/jira/browse/HBASE-14407>, but after reading its
> discussion I don't think that's the same problem.
>
> Anyone have a clue? Thanks!
>
> Regards,
> Shuai
>


Re: Does Scan API guarantee key order?

2016-05-04 Thread Ted Yu
Yes, key order is guaranteed.
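
For reference, a minimal sketch (HBase 1.x client API; the table name comes from the command line, so it is purely illustrative) of a full table scan that checks the rows really do come back in ascending row-key order, region after region:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScanInOrder {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf(args[0]))) {
      Scan scan = new Scan();      // no start/stop row: full table scan
      scan.setCaching(500);        // fetch rows in batches to cut down RPCs
      byte[] prev = null;
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          byte[] row = r.getRow();
          // Regions are scanned in ascending order and rows are sorted
          // within each region, so this check should never trip.
          if (prev != null && Bytes.compareTo(prev, row) >= 0) {
            throw new IllegalStateException("out of order at " + Bytes.toStringBinary(row));
          }
          prev = row;
        }
      }
    }
  }
}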

On Wed, May 4, 2016 at 3:20 PM, Dave Birdsall 
wrote:

> Hi,
>
>
>
> Suppose I have an HBase table with many regions, and possibly many rows in
> the memstore from recent additions.
>
>
>
> Suppose I have a program that opens a Scan on the table, from start to
> finish. Full table scan.
>
>
>
> Does HBase guarantee that rows are returned in key order? Or might it jump
> around, say, read one region first, then maybe another (and not necessarily
> in region order)?
>
>
>
> Thanks,
>
>
>
> Dave
>


Re: How can i get hbase table memory used? Why hdfs size of hbase table double when i use bulkload?

2016-05-02 Thread Ted Yu
For #1, consider increasing hfile.block.cache.size (assuming majority of
your reads are not point gets).

FYI
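
For reference, hfile.block.cache.size is a region-server-side fraction of the heap (0.4 is the usual default), so it belongs in hbase-site.xml on each region server and needs a rolling restart to take effect. A hedged example with an illustrative value; the sum of this fraction and the global memstore fraction has to stay at or below 0.8, otherwise the region server refuses to start:

  <property>
    <name>hfile.block.cache.size</name>
    <!-- example only: raise the block cache above the 0.4 default for a read-mostly load -->
    <value>0.5</value>
  </property>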

On Mon, May 2, 2016 at 6:41 PM, Jone Zhang  wrote:

> For #1
> My workload is read heavy.
> I use bulkload to  write data once a day.
>
> Thanks.
>
> 2016-04-30 1:13 GMT+08:00 Ted Yu :
>
> > For #1, can you clarify whether your workload is read heavy, write heavy
> or
> > mixed load of read and write ?
> >
> > For #2, have you run major compaction after the second bulk load ?
> >
> > On Thu, Apr 28, 2016 at 9:16 PM, Jone Zhang 
> > wrote:
> >
> > > *1、How can i get hbase table memory used?*
> > > *2、Why hdfs size of hbase table  double  when i use bulkload*
> > >
> > > bulkload file to qimei_info
> > >
> > > 101.7 G  /user/hbase/data/default/qimei_info
> > >
> > > bulkload the same file to qimei_info again
> > >
> > > 203.3 G  /user/hbase/data/default/qimei_info
> > >
> > > hbase(main):001:0> describe 'qimei_info'
> > > DESCRIPTION
> > >
> > >
> > >  'qimei_info', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER
> > =>
> > > 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '
> > >
> > >  1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => '2147483647',
> > > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> > >
> > >   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> > >
> > >
> > > 1 row(s) in 1.4170 seconds
> > >
> > >
> > > *Best wishes.*
> > > *Thanks.*
> > >
> >
>


Re: hbase architecture doubts

2016-05-01 Thread Ted Yu
For #1, in branch-1, please take a look at DefaultMemStore.java where you
would see:

  // MemStore.  Use a CellSkipListSet rather than SkipListSet because of the
  // better semantics.  The Map will overwrite if passed a key it already had
  // whereas the Set will not add new Cell if key is same though value might be
  // different.  Value is not important -- just make sure always same
  // reference passed.
  volatile CellSkipListSet cellSet;

  // Snapshot of memstore.  Made for flusher.
  volatile CellSkipListSet snapshot;

For #2, MemStoreFlusher#flushRegion() -> HRegion#flush() ->
HRegion#internalFlushCacheAndCommit()

Please take a look at HStore#commit().
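
For reference, the CellSkipListSet above wraps a ConcurrentSkipListMap internally, so the answer to #1 is effectively yes. The Map-versus-Set distinction in that comment can be seen with the plain JDK classes; the real memstore keys are Cells ordered by a KeyValue comparator rather than strings, so this is only an illustration:

import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;

public class SkipListSemantics {
  public static void main(String[] args) {
    // Sorted, lock-free map: putting the same key again replaces the entry,
    // which is the "overwrite" behaviour the DefaultMemStore comment relies on.
    ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
    map.put("row1/cf:q/ts=5", "old");
    map.put("row1/cf:q/ts=5", "new");               // same key: old value replaced
    System.out.println(map);                        // {row1/cf:q/ts=5=new}

    // The Set built on the same structure keeps the first element and
    // silently ignores the second add, which is why the memstore wraps
    // the Map instead of using the Set directly.
    ConcurrentSkipListSet<String> set = new ConcurrentSkipListSet<>();
    System.out.println(set.add("row1/cf:q/ts=5"));  // true
    System.out.println(set.add("row1/cf:q/ts=5"));  // false, nothing replaced
  }
}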

On Sun, May 1, 2016 at 3:36 AM, Shushant Arora 
wrote:

> 1.Does Hbase uses ConcurrentskipListMap(CSLM) to store data in memstore?
>
> 2.When mwmstore is flushed to HDFS- does it dump the memstore
> Concurrentskiplist as Hfile2? Then How does it calculates blocks out of
> CSLM and dmp them in HDFS.
>
> 3.After dumping the inmemory CSLM of memstore to HFILe does memstore
> content is discarded and if while dumping memstore any read request comes
> will it be responded by copy of memstore or discard of memstore will be
> blocked until read request is completed?
>
> 4.When a read request comes does it look in inmemory CSLM and then in
> HFile? And what is LogStructuredMerge tree and its usage in Hbase.
>
> Thanks!
>


Re: Major Compaction Strategy

2016-04-29 Thread Ted Yu
Interesting.

When compiling against hbase 1.1.2, I got:

http://pastebin.com/NfUjva9R

FYI

On Fri, Apr 29, 2016 at 3:29 PM, Frank Luo  wrote:

> Saad,
>
> Will all your tables/regions be used 24/7, or at any time, just a part of
> regions used and others are running ideal?
>
> If the latter, I developed a tool to launch major-compact in a "smart" way,
> because I am facing a similar issue.
> https://github.com/jinyeluo/smarthbasecompactor.
>
> It looks at every RegionServer, finds the non-hot regions with the most store
> files, and starts compacting. It just continues until time is up. Just
> to be clear, it doesn't perform the MC itself, which would be a scary thing to do,
> but tells the region servers to do the MC.
>
> We have it running in our cluster for about 10 hours a day and it has
> virtually no impact to applications and the cluster is doing far better
> than when using default scheduled MC.
>
>
> -Original Message-
> From: Saad Mufti [mailto:saad.mu...@gmail.com]
> Sent: Friday, April 29, 2016 1:51 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> We have more issues now, after testing this in dev, in our production
> cluster which has tons of data (60 region servers and around 7000
> regions), we tried to do rolling compaction and most regions that were
> around 6-7 GB in size were taking 4-5 minutes to finish. Based on this we
> estimated it would take something like 20 days for a single run to finish,
> which doesn't seem reasonable.
>
> So is it more reasonable to aim for doing major compaction across all
> region servers at once but within a RS one region at a time? That would cut
> it down to around 8 hours which is still very long. Or is it better to
> compact all regions on one region server, then move to the next?
>
> The goal of all this is to maintain decent write performance while still
> doing compaction. We don't have a good very low load period for our cluster
> so trying to find a way to do this without cluster downtime.
>
> Thanks.
>
> 
> Saad
>
>
> On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti  wrote:
>
> > Thanks for the pointer. Working like a charm.
> >
> > 
> > Saad
> >
> >
> > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu  wrote:
> >
> >> Please use the following method of HBaseAdmin:
> >>
> >>   public CompactionState getCompactionStateForRegion(final byte[]
> >> regionName)
> >>
> >> Cheers
> >>
> >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a large HBase 1.x cluster in AWS and have disabled
> >> > automatic
> >> major
> >> > compaction as advised. We were running our own code for compaction
> >> > daily around midnight which calls
> >> > HBaseAdmin.majorCompactRegion(byte[]
> >> > regionName) in a rolling fashion across all regions.
> >> >
> >> > But we missed the fact that this is an asynchronous operation, so
> >> > in practice this causes major compaction to run across all regions,
> >> > at
> >> least
> >> > those not already major compacted (for example because previous
> >> > minor compactions got upgraded to major ones).
> >> >
> >> > We don't really have a suitable low load period, so what is a
> >> > suitable
> >> way
> >> > to make major compaction run in a rolling fashion region by region?
> >> > The
> >> API
> >> > above provides no return value for us to be able to wait for one
> >> compaction
> >> > to finish before moving to the next.
> >> >
> >> > Thanks.
> >> >
> >> > 
> >> > Saad
> >> >
> >>
> >
> >
>


Re: How can i get hbase table memory used? Why hdfs size of hbase table double when i use bulkload?

2016-04-29 Thread Ted Yu
For #1, can you clarify whether your workload is read heavy, write heavy or
mixed load of read and write ?

For #2, have you run major compaction after the second bulk load ?
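
For reference, a minimal sketch of triggering that major compaction from the 1.x Java API (the table name is the one from the thread; `major_compact 'qimei_info'` in the hbase shell does the same thing). With VERSIONS => '1', the rewrite should drop the duplicate cells left by the second bulk load and reclaim most of the extra space:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Asynchronous: this only queues the request; the region servers
      // rewrite the store files in the background.
      admin.majorCompact(TableName.valueOf("qimei_info"));
    }
  }
}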

On Thu, Apr 28, 2016 at 9:16 PM, Jone Zhang  wrote:

> *1、How can i get hbase table memory used?*
> *2、Why hdfs size of hbase table  double  when i use bulkload*
>
> bulkload file to qimei_info
>
> 101.7 G  /user/hbase/data/default/qimei_info
>
> bulkload the same file to qimei_info again
>
> 203.3 G  /user/hbase/data/default/qimei_info
>
> hbase(main):001:0> describe 'qimei_info'
> DESCRIPTION
>
>
>  'qimei_info', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER =>
> 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '
>
>  1', COMPRESSION => 'LZO', MIN_VERSIONS => '0', TTL => '2147483647',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>
>   IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>
>
> 1 row(s) in 1.4170 seconds
>
>
> *Best wishes.*
> *Thanks.*
>


Re: load Sharing across the regions

2016-04-29 Thread Ted Yu
Presplitting table allows load to spread across region servers.

Can you be more specific about what you observed ?

Which workload(s) did you run ?

Thanks
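
For reference, a minimal sketch of a pre-split table creation with the 1.x Admin API. The table name 'usertable' and family 'family' are assumed to match the YCSB setup, and the start/end keys and region count are placeholders; the split points only spread the load if they match the real row-key distribution of the workload:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("usertable"));
      desc.addFamily(new HColumnDescriptor("family"));
      // 30 regions with split points spread evenly between the two keys;
      // only an even key distribution turns this into an even load.
      admin.createTable(desc, Bytes.toBytes("user0"), Bytes.toBytes("user9"), 30);
    }
  }
}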

On Thu, Apr 28, 2016 at 10:03 PM, prabhu Mahendran 
wrote:

> i have created the cluster with 3 region servers and i used YCSB to test the
> performance of the cluster.
>
> https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
>
> Created the table as mentioned in the link with equal splits (pre-split) in
> the regions, but I noted that the load isn't equally shared among the regions.
>
> Please guide me to achieve equal load across Regionserver.
>
> Best,
> Mahe
>


Re: No server address listed in hbase:meta for region SYSTEM.CATALOG

2016-04-28 Thread Ted Yu
Here is sample scan output from a working cluster

 hbase:namespace,,146075636.acc7841bcbaca column=info:regioninfo,
timestamp=1460756360969, value={ENCODED =>
acc7841bcbacafacf336e48bb14794de, NAME => 'hbase:namespace,,14607
 facf336e48bb14794de.
5636.acc7841bcbacafacf336e48bb14794de.', STARTKEY => '', ENDKEY => ''}
 hbase:namespace,,146075636.acc7841bcbaca column=info:seqnumDuringOpen,
timestamp=1461104489067, value=\x00\x00\x00\x00\x00\x00\x00\x0C
 facf336e48bb14794de.
 hbase:namespace,,146075636.acc7841bcbaca column=info:server,
timestamp=1461104489067, value=x.com:16020
 facf336e48bb14794de.
 hbase:namespace,,146075636.acc7841bcbaca column=info:serverstartcode,
timestamp=1461104489067, value=1461104478409

Note the info:server column.

How did you deploy hbase and Phoenix.

Mind pastebinning master log and meta server log ?

Cheers
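
For reference, a minimal sketch (1.x client API) that walks hbase:meta and prints which regions carry an info:server cell and which are missing it, which is what the reported error complains about:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaServerCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    byte[] info = Bytes.toBytes("info");
    byte[] server = Bytes.toBytes("server");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME)) {
      Scan scan = new Scan();
      scan.addFamily(info);                         // regioninfo, server, ...
      try (ResultScanner rs = meta.getScanner(scan)) {
        for (Result r : rs) {
          byte[] location = r.getValue(info, server);
          System.out.println(Bytes.toStringBinary(r.getRow()) + " -> "
              + (location == null ? "MISSING info:server" : Bytes.toString(location)));
        }
      }
    }
  }
}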

On Thu, Apr 28, 2016 at 1:33 AM, Manisha Sethi  wrote:

> Hi
>
> I have my hbase:meta table entries as :
> SYSTEM.CATALOG,,1461831992343.a6daf63bd column=info:regioninfo,
> timestamp=1461831993549, value={ENCODED =>
> a6daf63bde1f1456ca4acee228b8f5fe, NAME => 'SYSTEM
> e1f1456ca4acee228b8f5fe.
> .CATALOG,,1461831992343.a6daf63bde1f1456ca4acee228b8f5fe.', STARTKEY => '',
> ENDKEY => ''}
> hbase:namespace,,1461821579649.7edd6a09 column=info:regioninfo,
> timestamp=1461821581196, value={ENCODED =>
> 7edd6a099dc3612b7dafa52f380ac3e6, NAME => 'hbase:
> 9dc3612b7dafa52f380ac3e6.
>  namespace,,1461821579649.7edd6a099dc3612b7dafa52f380ac3e6.', STARTKEY =>
> '', ENDKEY => ''}
> hbase:namespace,,1461821579649.7edd6a09 column=info:seqnumDuringOpen,
> timestamp=1461831928239, value=\x00\x00\x00\x00\x00\x00\x00\x1C
> 9dc3612b7dafa52f380ac3e6.
>
> While I do scan 'SYSTEM.CATALOG' I get exception:
>
> No server address listed in hbase:meta for region
> SYSTEM.CATALOG,.
>
> My aim is to connect to hbase throufh phoenix, but even hbase shell scan
> not working. I can see entries in meta table,, I tried flush and compact
> for meta also. But no progress...
>
> I am using Hbase 1.2
>
> Thanks
> Manisha Sethi
>
>
> 
>
>


Re: Slow sync cost

2016-04-27 Thread Ted Yu
There might be a typo:

bq. After the Evacuation phase, Eden and Survivor To are devoid of live
data and reclaimed.

From the graph below it, it seems Survivor From is reclaimed, not Survivor
To.

FYI

On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault  wrote:

> We have 6 production clusters and all of them are tuned differently, so I'm
> not sure there is a setting I could easily give you. It really depends on
> the usage.  One of our devs wrote a blog post on G1GC fundamentals
> recently. It's rather long, but could be worth a read:
>
> http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
>
> We will also have a blog post coming out in the next week or so that talks
> specifically to tuning G1GC for HBase. I can update this thread when that's
> available.
>
> On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti  wrote:
>
> > That is interesting. Would it be possible for you to share what GC
> settings
> > you ended up on that gave you the most predictable performance?
> >
> > Thanks.
> >
> > 
> > Saad
> >
> >
> > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <
> > bbeaudrea...@hubspot.com> wrote:
> >
> > > We were seeing this for a while with our CDH5 HBase clusters too. We
> > > eventually correlated it very closely to GC pauses. Through heavily
> > tuning
> > > our GC we were able to drastically reduce the logs, by keeping most
> GC's
> > > under 100ms.
> > >
> > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti 
> wrote:
> > >
> > > > From what I can see in the source code, the default is actually even
> > > lower
> > > > at 100 ms (can be overridden with
> hbase.regionserver.hlog.slowsync.ms
> > ).
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <
> > kevin.bowl...@kev009.com
> > > >
> > > > wrote:
> > > >
> > > > > I see similar log spam while system has reasonable performance.
> Was
> > > the
> > > > > 250ms default chosen with SSDs and 10ge in mind or something?  I
> > guess
> > > > I'm
> > > > > surprised a sync write several times through JVMs to 2 remote
> > datanodes
> > > > > would be expected to consistently happen that fast.
> > > > >
> > > > > Regards,
> > > > >
> > > > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti  >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're
> > constantly
> > > > > seeing
> > > > > > the following messages in the region server logs:
> > > > > >
> > > > > > 2016-04-25 14:02:55,178 INFO
> > > > > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost:
> > 258
> > > > ms,
> > > > > > current pipeline:
> > > > > > [DatanodeInfoWithStorage[10.99.182.165:50010
> > > > > > ,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.236:50010
> > > > > > ,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.195:50010
> > > > > > ,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > > > > >
> > > > > >
> > > > > > These happen regularly while HBase appear to be operating
> normally
> > > with
> > > > > > decent read and write performance. We do have occasional
> > performance
> > > > > > problems when regions are auto-splitting, and at first I thought
> > this
> > > > was
> > > > > > related but now I se it happens all the time.
> > > > > >
> > > > > >
> > > > > > Can someone explain what this means really and should we be
> > > concerned?
> > > > I
> > > > > > tracked down the source code that outputs it in
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > > > > >
> > > > > > but after going through the code I think I'd need to know much
> more
> > > > about
> > > > > > the code to glean anything from it or the associated JIRA ticket
> > > > > > https://issues.apache.org/jira/browse/HBASE-11240.
> > > > > >
> > > > > > Also, what is this "pipeline" the ticket and code talks about?
> > > > > >
> > > > > > Thanks in advance for any information and/or clarification anyone
> > can
> > > > > > provide.
> > > > > >
> > > > > > 
> > > > > >
> > > > > > Saad
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Slow sync cost

2016-04-27 Thread Ted Yu
Bryan:
w.r.t. gc_log_visualizer, is there plan to open source it ?

bq. while backend throughput will be better/cheaper with ParallelGC.

Does the above mean that hbase servers are still using ParallelGC ?

Thanks

On Wed, Apr 27, 2016 at 7:39 AM, Bryan Beaudreault  wrote:

> We have 6 production clusters and all of them are tuned differently, so I'm
> not sure there is a setting I could easily give you. It really depends on
> the usage.  One of our devs wrote a blog post on G1GC fundamentals
> recently. It's rather long, but could be worth a read:
>
> http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
>
> We will also have a blog post coming out in the next week or so that talks
> specifically to tuning G1GC for HBase. I can update this thread when that's
> available.
>
> On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti  wrote:
>
> > That is interesting. Would it be possible for you to share what GC
> settings
> > you ended up on that gave you the most predictable performance?
> >
> > Thanks.
> >
> > 
> > Saad
> >
> >
> > On Tue, Apr 26, 2016 at 11:56 AM, Bryan Beaudreault <
> > bbeaudrea...@hubspot.com> wrote:
> >
> > > We were seeing this for a while with our CDH5 HBase clusters too. We
> > > eventually correlated it very closely to GC pauses. Through heavily
> > tuning
> > > our GC we were able to drastically reduce the logs, by keeping most
> GC's
> > > under 100ms.
> > >
> > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti 
> wrote:
> > >
> > > > From what I can see in the source code, the default is actually even
> > > lower
> > > > at 100 ms (can be overridden with
> hbase.regionserver.hlog.slowsync.ms
> > ).
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > On Tue, Apr 26, 2016 at 3:13 AM, Kevin Bowling <
> > kevin.bowl...@kev009.com
> > > >
> > > > wrote:
> > > >
> > > > > I see similar log spam while system has reasonable performance.
> Was
> > > the
> > > > > 250ms default chosen with SSDs and 10ge in mind or something?  I
> > guess
> > > > I'm
> > > > > surprised a sync write several times through JVMs to 2 remote
> > datanodes
> > > > > would be expected to consistently happen that fast.
> > > > >
> > > > > Regards,
> > > > >
> > > > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti  >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're
> > constantly
> > > > > seeing
> > > > > > the following messages in the region server logs:
> > > > > >
> > > > > > 2016-04-25 14:02:55,178 INFO
> > > > > > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost:
> > 258
> > > > ms,
> > > > > > current pipeline:
> > > > > > [DatanodeInfoWithStorage[10.99.182.165:50010
> > > > > > ,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.236:50010
> > > > > > ,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> > > > > > DatanodeInfoWithStorage[10.99.182.195:50010
> > > > > > ,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
> > > > > >
> > > > > >
> > > > > > These happen regularly while HBase appear to be operating
> normally
> > > with
> > > > > > decent read and write performance. We do have occasional
> > performance
> > > > > > problems when regions are auto-splitting, and at first I thought
> > this
> > > > was
> > > > > > related but now I se it happens all the time.
> > > > > >
> > > > > >
> > > > > > Can someone explain what this means really and should we be
> > > concerned?
> > > > I
> > > > > > tracked down the source code that outputs it in
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
> > > > > >
> > > > > > but after going through the code I think I'd need to know much
> more
> > > > about
> > > > > > the code to glean anything from it or the associated JIRA ticket
> > > > > > https://issues.apache.org/jira/browse/HBASE-11240.
> > > > > >
> > > > > > Also, what is this "pipeline" the ticket and code talks about?
> > > > > >
> > > > > > Thanks in advance for any information and/or clarification anyone
> > can
> > > > > > provide.
> > > > > >
> > > > > > 
> > > > > >
> > > > > > Saad
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Re: Re: question on "drain region servers"

2016-04-26 Thread Ted Yu
Please see HBASE-4298 where this feature was introduced.

On Tue, Apr 26, 2016 at 5:12 AM, WangYQ  wrote:

> yes, there is a tool, graceful_stop.sh, to gracefully stop a regionserver, and it
> can move the regions back to the rs after the rs comes back.
> but i can not find the relation with drain region servers...
>
>
> i think the drain region servers function is good, but cannot come up with a
> practical use case
>
>
>
>
>
>
>
>
> At 2016-04-26 16:01:55, "Dejan Menges"  wrote:
> >One of use cases we use it is graceful stop of regionserver - you unload
> >regions from the server before you restart it. Of course, after restart
> you
> >expect HBase to move regions back.
> >
> >Now I'm not really remembering correctly, but I kinda remember that one of
> >the features was at least that it will move back regions which were
> already
> >there, hence not destroy too much block locality.
> >
> >On Tue, Apr 26, 2016 at 8:15 AM WangYQ  wrote:
> >
> >> thanks
> >> in hbase 0.99.0,  I find the rb file: draining_servers.rb
> >>
> >>
> >> i have some suggestions on this tool:
> >> 1. if I add rs hs1 to draining_servers, when hs1 restart, the zk node
> >> still exists in zk, but hmaster will not treat hs1 as draining_servers
> >> i think when we add a hs to draining_servers, we do not need to
> store
> >> the start code in zk, just store the hostName and port
> >> 2.  we add hs1 to draining_servers, but if hs1 always restart, we will
> >> need to add hs1 several times
> >>   when we need to delete the draining_servers info of hs1, we  will
> need
> >> to delete hs1 several times
> >>
> >>
> >>
> >> finally, what is the original motivation of this tool, some scenario
> >> descriptions are good.
> >>
> >>
> >>
> >>
> >>
> >>
> >> At 2016-04-26 11:33:10, "Ted Yu"  wrote:
> >> >Please take a look at:
> >> >bin/draining_servers.rb
> >> >
> >> >On Mon, Apr 25, 2016 at 8:12 PM, WangYQ 
> >> wrote:
> >> >
> >> >> in hbase,  I find there is a "drain regionServer" feature
> >> >>
> >> >>
> >> >> if a rs is added to drain regionServer in ZK, then regions will not
> be
> >> >> move to on these regionServers
> >> >>
> >> >>
> >> >> but, how can a rs be added to drain regionServer? do we add it manually
> >> >> or will the rs add itself automatically?
> >>
>


Re: question on "drain region servers"

2016-04-25 Thread Ted Yu
Please take a look at:
bin/draining_servers.rb

On Mon, Apr 25, 2016 at 8:12 PM, WangYQ  wrote:

> in hbase,  I find there is a "drain regionServer" feature
>
>
> if a rs is added to drain regionServer in ZK, then regions will not be
> move to on these regionServers
>
>
> but, how can a rs be added to drain regionServer? do we add it manually or will
> the rs add itself automatically?


Re: Slow sync cost

2016-04-25 Thread Ted Yu
w.r.t. the pipeline, please see this description:

http://itm-vm.shidler.hawaii.edu/HDFS/ArchDocUseCases.html

On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti  wrote:

> Hi,
>
> In our large HBase cluster based on CDH 5.5 in AWS, we're constantly seeing
> the following messages in the region server logs:
>
> 2016-04-25 14:02:55,178 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms,
> current pipeline:
> [DatanodeInfoWithStorage[10.99.182.165:50010
> ,DS-281d4c4f-23bd-4541-bedb-946e57a0f0fd,DISK],
> DatanodeInfoWithStorage[10.99.182.236:50010
> ,DS-f8e7e8c9-6fa0-446d-a6e5-122ab35b6f7c,DISK],
> DatanodeInfoWithStorage[10.99.182.195:50010
> ,DS-3beae344-5a4a-4759-ad79-a61beabcc09d,DISK]]
>
>
> These happen regularly while HBase appear to be operating normally with
> decent read and write performance. We do have occasional performance
> problems when regions are auto-splitting, and at first I thought this was
> related but now I se it happens all the time.
>
>
> Can someone explain what this means really and should we be concerned? I
> tracked down the source code that outputs it in
>
>
> hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
>
> but after going through the code I think I'd need to know much more about
> the code to glean anything from it or the associated JIRA ticket
> https://issues.apache.org/jira/browse/HBASE-11240.
>
> Also, what is this "pipeline" the ticket and code talks about?
>
> Thanks in advance for any information and/or clarification anyone can
> provide.
>
> 
>
> Saad
>


Re: current leaseholder is trying to recreate file error with ProcedureV2

2016-04-25 Thread Ted Yu
Can you pastebin more of the master log ?

Which version of hadoop are you using ?

Log snippet from namenode w.r.t. state-0073.log may also
provide some more clue.

Thanks

On Mon, Apr 25, 2016 at 12:56 PM, donmai  wrote:

> Hi all,
>
> I'm getting a strange error during table creation / disable in HBase 1.1.2:
>
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> failed to create file /hbase/MasterProcWALs/state-0073.log
> for DFSClient_NONMAPREDUCE_87753856_1 for client 10.66.102.192 because
> current leaseholder is trying to recreate file.
>
> Looks somewhat related to HBASE-14234 - what's the root cause behind this
> error message?
>
> Full stack trace below:
>
> -
>
> 2016-04-25 15:24:01,356 WARN
>  [B.defaultRpcServer.handler=7,queue=7,port=6] wal.WALProcedureStore:
> failed to create log file with id=73
>
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
> failed to create file /hbase/MasterProcWALs/state-0073.log
> for DFSClient_NONMAPREDUCE_87753856_1 for client 10.66.102.192 because
> current leaseholder is trying to recreate file.
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2988)
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2737)
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2632)
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2519)
>
> at
>
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:566)
>
> at
>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
>
> at
>
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>
> at
>
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:635)
>
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1468)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>
> at
>
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:241)
>
> at com.sun.proxy.$Proxy31.create(Unknown Source)
>
> at
>
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>
> at
>
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>
> at com.sun.proxy.$Proxy32.create(Unknown Source)
>
> at
>
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1738)
>
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1671)
>
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1596)
>
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
>
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
>
> at
>
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
>
> at
>
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:912)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:893)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:790)
>
> at
>
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:706)
>
> at
>
> org.apache.hadoop.hbase.procedure2

Re: Hbase shell script from java

2016-04-24 Thread Ted Yu
Please see

https://blog.art-of-coding.eu/executing-operating-system-commands-from-java/
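
For reference, a minimal sketch along those lines: it shells out to `hbase shell <script>` with ProcessBuilder and relays the output. The `hbase` launcher being on the PATH and the script path passed as the argument are assumptions:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class RunHBaseShellScript {
  public static void main(String[] args) throws Exception {
    // args[0] is a shell script such as /tmp/tuning.rb (placeholder path).
    ProcessBuilder pb = new ProcessBuilder("hbase", "shell", args[0]);
    pb.redirectErrorStream(true);                  // fold stderr into stdout
    Process p = pb.start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        System.out.println(line);                  // relay the shell output
      }
    }
    System.exit(p.waitFor());                      // propagate the exit code
  }
}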

> On Apr 23, 2016, at 10:18 PM, Saurabh Malviya (samalviy)  
> wrote:
> 
> Hi,
> 
> Is there any way to run hbase shell script from Java. Also mentioned this 
> question earlier in below url earlier.
> 
> As we are having bunch of scripts and need to change frequently for 
> performance tuning.
> 
> http://grokbase.com/p/hbase/user/161ezbnk11/run-hbase-shell-script-from-java
> 
> 
> -Saurabh


Re: Region server crashes after GC pause time

2016-04-22 Thread Ted Yu
Jayesh:
Is it possible for you to share the JVM parameters ?

Thanks

On Fri, Apr 22, 2016 at 7:48 AM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> Karthik,
>
> Yes, tuning can help - but the biggest help is to give "sufficient" memory
> to the regionserver.
> And "sufficient" is relative - e.g. we have a 75GB heap (our increases
> were like this - 8, 12, 16, 24, 32, 45, 60 and 75 GB)
> Note that we had started off with 45 GB RAM and are now running on 148 GB
> RAM servers.
>
> Now we are very stable at 75GB and with appropriate JVM tuning our GCs are
> well contained (sub-100 ms).
>
> Also, although I like G1, it did not fare well for us at 75 GB - with the
> best tuning we could get upto 20-25 second pauses (HBase 1.0).
> CMS seems to work best for us.
>
> Jayesh
>
> -Original Message-
> From: karthi keyan [mailto:karthi93.san...@gmail.com]
> Sent: Friday, April 22, 2016 7:04 AM
> To: user@hbase.apache.org; dingdongc...@baidu.com
> Subject: Re: Region server crashes after GC pause time
>
> Hi Ding,
>
> I have increased the heap to 2G and am still getting an out of memory exception.
> Actually I write the data to HBase at 40K writes/sec.
> Is there any parameter to tune? To my knowledge, I have tuned
> "-XX:CMSInitiatingOccupancyFraction=N" in HBase.
> Is there any other parameter required to resolve this issue?
>
> Best,
> Karthik
>
> On Thu, Apr 14, 2016 at 12:21 PM, Ding,Dongchao 
> wrote:
>
> > Dump the JVM heap, analyze the heap, and find which query(s) cost
> > so much memory.
> > In a bad case I once had, the RS crashed from long GC pauses because of a big
> > batch Get query.
> >
> >
> > In addition, I think you can increase the memory of the JVM; 512m is very small
> > for an RS.
> >
> >
> >
> > On 16/4/14 at 14:00, "karthi keyan"  wrote:
> >
> > >Hi ,
> > >
> > >i got this issue in HBase while at peak time handling more requests .
> > >can any one pls guide me to resolve the Long GC pauses in hbase .
> > >
> > >JDK-7 , JVM heap 512m
> > >
> > >HBase 0.98.13
> > >
> > >
> > > INFO  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM
> > >or host machine (eg GC): pause of approximately 1466ms GC pool
> > >'ConcurrentMarkSweep' had collection(s): count=1 time=1967ms  INFO
> > >[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
> > >machine (eg GC): pause of approximately 2304ms GC pool
> > >'ConcurrentMarkSweep' had collection(s): count=1 time=2775ms  INFO
> > >[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host
> > >machine (eg GC): pause of approximately 2287ms GC pool
> > >'ConcurrentMarkSweep' had collection(s): count=1 time=2775ms
> > >
> > > INFO  [RS:0;0:0:0:0:0:0:0:0:44037-SendThread(:2181)]
> > >zookeeper.ClientCnxn: Client session timed out, have not heard from
> > >server in 6819ms for sessionid 0x1540ab48b280004, closing socket
> > >connection and attempting reconnect  INFO
> > >[SplitLogWorker-,44037,1460468489645-SendThread(:2181)]
> > >zookeeper.ClientCnxn: Client session timed out, have not heard from
> > >server in 6819ms for sessionid 0x1540ab48b280005, closing socket
> > >connection and attempting reconnect
> > >
> > >Once after this HBase Region Server moved to Dead state.
> > >
> > >Best,
> > >Karthik
> >
> >
>
>
>
>
>


Re: Can not connect local java client to a remote Hbase

2016-04-21 Thread Ted Yu
Are you using hbase 1.0 or 1.1 ?

I assume you have verified that hbase master is running normally on
master-sigma.
Are you able to use hbase shell on that node ?

If you check master log, you would see which node hosts hbase:meta
On that node, do you see anything interesting in region server log ?

Cheers
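
For reference, the "hostname=localhost,16020" in the stack trace suggests the region server hosting hbase:meta registered itself as localhost, which a remote client can never reach, so that likely needs fixing on the cluster side (hostname / /etc/hosts / DNS) regardless of the client settings. A minimal client-side sketch with the ZooKeeper quorum set explicitly; the host name and port below are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RemoteConnectionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Placeholders: point these at the real quorum of the remote cluster.
    conf.set("hbase.zookeeper.quorum", "master-sigma");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      System.out.println("live region servers: "
          + admin.getClusterStatus().getServersSize());
      System.out.println("pentaho_mappings exists: "
          + admin.tableExists(TableName.valueOf("pentaho_mappings")));
    }
  }
}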

On Thu, Apr 21, 2016 at 10:41 AM, SOUFIANI Mustapha | السفياني مصطفى <
s.mustaph...@gmail.com> wrote:

> Hi all,
> I'm trying to connect my local java client (pentaho) to a remote Hbase but
> every time I get a TimeOut error telleing me that this connection couldn't
> be established.
>
> herer is the full message error:
>
>
> ***
>
> java.io.IOException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> com.pentaho.big.data.bundles.impl.shim.hbase.table.HBaseTableImpl.exists(HBaseTableImpl.java:71)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.mapping.MappingAdmin.getMappedTables(MappingAdmin.java:502)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.setupMappedTableNames(HBaseOutputDialog.java:818)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.access$900(HBaseOutputDialog.java:88)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog$7.widgetSelected(HBaseOutputDialog.java:398)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at
>
> org.pentaho.big.data.kettle.plugins.hbase.output.HBaseOutputDialog.open(HBaseOutputDialog.java:603)
>
> at
>
> org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:125)
>
> at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8783)
>
> at
> org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3072)
>
> at
>
> org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:755)
>
> at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
>
> at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
>
> at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1347)
>
> at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7989)
>
> at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9269)
>
> at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:662)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>
> at java.lang.reflect.Method.invoke(Unknown Source)
>
> at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
>
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> after attempts=36, exceptions:
> Wed Apr 20 10:32:43 WEST 2016, null, java.net.SocketTimeoutException:
> callTimeout=6, callDuration=75181: row 'pentaho_mappings,,' on table
> 'hbase:meta' at region=hbase:meta,,1.1588230740,
> hostname=localhost,16020,1461071963695, seqNum=0
>
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:270)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:225)
>
> at
>
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:63)
>
> at
>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:314)
>
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:289)
>
> at
>
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:161)
>
> at
> org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:156)
>
> at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)
>
> at
>
> org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)
>
> at
>
> org.apache.hadoop.hbase.MetaTableAcce

Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.

2016-04-21 Thread Ted Yu
; java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>   at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
> 2016-04-21 11:07:45,876 [myid:3] - WARN  
> [RecvWorker:1:QuorumCnxManager$RecvWorker@813] - Interrupting SendWorker
> 2016-04-21 11:07:45,888 [myid:3] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@727] - Interrupted while waiting 
> for message on queue
> java.lang.InterruptedException
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095)
>   at 
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389)
>   at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
>   at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
>   at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
> 2016-04-21 11:07:45,889 [myid:3] - WARN  
> [SendWorker:1:QuorumCnxManager$SendWorker@736] - Send worker leaving thread
> 2016-04-21 11:07:58,881 [myid:3] - INFO  
> [/192.168.1.4:3889:QuorumCnxManager$Listener@541] - Received connection 
> request /192.168.1.2:51099
> 2016-04-21 11:07:58,915 [myid:3] - INFO  
> [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message 
> format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LEADING (my state)
> 2016-04-21 11:07:59,150 [myid:3] - INFO  
> [LearnerHandler-/192.168.1.2:35263:LearnerHandler@329] - Follower sid: 1 : 
> info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@4a7baf7d
> 2016-04-21 11:07:59,155 [myid:3] - INFO  
> [LearnerHandler-/192.168.1.2:35263:LearnerHandler@384] - Synchronizing with 
> Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
> 2016-04-21 11:07:59,155 [myid:3] - INFO  
> [LearnerHandler-/192.168.1.2:35263:LearnerHandler@458] - Sending SNAP
> 2016-04-21 11:07:59,156 [myid:3] - INFO  
> [LearnerHandler-/192.168.1.2:35263:LearnerHandler@482] - Sending snapshot 
> last zxid of peer is 0x0  zxid of leader is 0x2sent zxid of db as 
> 0x2
> 2016-04-21 11:07:59,212 [myid:3] - INFO  
> [LearnerHandler-/192.168.1.2:35263:LearnerHandler@518] - Received 
> NEWLEADER-ACK message from 1
> Eric Gao
> Keep on going never give up.
> Blog:
> http://gaoqiang.blog.chinaunix.net/
> http://gaoqiangdba.blog.163.com/
> 
> 
>  
> From: Ted Yu
> Date: 2016-04-20 22:31
> To: Eric Gao
> CC: user
> Subject: Re: Re: ERROR [main] 
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not in 
> ZooKeeper.
> bq. dataDir=/tmp/zookeeper
>  
> When machine restarts, you would lose the data, right ?
> Please change to other directory.
>  
> Was zookeeper.out from slave2 ?
>  
> Please check port  on 192.168.1.3 <http://192.168.1.3:/>
>  
> On Wed, Apr 20, 2016 at 6:22 AM, Eric Gao  wrote:
>  
> > Dear Ted,
> > Thank you for your kind attention.
> > I'm a complete novice at Hadoop. ^___^
> >
> >
> >
> > I have 3 host---1 master server and 2 data node,and zookeeper is installed 
> > on each server.
> > Zookeeper and Hbase are still unable to start.
> >
> > Each server has the same status:
> >
> > [root@master data]# /opt/zookeeper/bin/zkServer.sh start
> > ZooKeeper JMX enabled by default
> > Using config: /opt/zookeeper/bin/../conf/zoo.cfg
> > Starting zookeeper ... date
> > STARTED
> > [root@slave2 ~]# /opt/zookeeper/bin/zkServer.sh status
> > ZooKeeper JMX enabled by default
> > Using config: /opt/zookeeper/bin/../conf/zoo.cfg
> > Error contacting service. It is probably not running.
> >
> >
> > Here are some of my configuration infomation:
> >
> >
> > *hbase-site.xml:*
> > 
> > 
> > hbase.rootdir
> > hdfs://master:9000/hbase/data
> > 
> > 
> > hbase.cluster.distributed
> > true
> > 
> >
> > 
> > zookeeper.znode.parent
> > /hbase
> > Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
> > files that are configured with a relative path will go under this node.
> > By default, all of HBase's ZooKeeper file path are configured with a
> > relative path, so they will all go under this directory unless changed.
> > 
> > 
> >
> >
> > 
&g

Re: Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.

2016-04-20 Thread Ted Yu
actPlainSocketImpl.java:182)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:579)
>
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
>
> at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:426)
>
> at 
> org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:822)
>
>
>
>
> *core-site.xml:*
> 
> 
> hadoop.tmp.dir
> /opt/hadoop/tmp
> A base for other temporary directories.
> 
> 
> 
> fs.default.name
> hdfs://master:9000
> 
> 
>
> Please help to check the problem. Thanks a lot!
>
> --
> *Eric Gao*
> Keep on going never give up.
> *Blog:*
> http://gaoqiang.blog.chinaunix.net/
> http://gaoqiangdba.blog.163.com/
>
>
>
> *From:* Ted Yu 
> *Date:* 2016-04-16 23:21
> *To:* user@hbase.apache.org
> *Subject:* Re: Re: ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper.
> Can you verify that hbase is running by logging onto master node and check
> the Java processes ?
>
> If master is running, can you do a listing of the zookeeper znode (using
> zkCli) and pastebin the result ?
>
> Thanks
>
> On Sat, Apr 16, 2016 at 8:14 AM, Eric Gao  wrote:
>
> > Yes,I have seen your reply.Thanks very much for your kindness.
> >
> > This is my hbase-site.xml:
> > 
> > 
> > hbase.rootdir
> > hdfs://master:9000/hbase/data
> > 
> > 
> > hbase.cluster.distributed
> > true
> > 
> >
> > 
> > zookeeper.znode.parent
> > /hbase
> > Root ZNode for HBase in ZooKeeper. All of HBase's
> > ZooKeeper
> >   files that are configured with a relative path will go under this
> > node.
> >   By default, all of HBase's ZooKeeper file path are configured with
> a
> >   relative path, so they will all go under this directory unless
> > changed.
> > 
> >   
> >
> >
> > 
> > hbase.zookeeper.quorum
> > master,slave1,slave2
> > Comma separated list of servers in the ZooKeeper Quorum. For
> > example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By
> > default this is set to localhost for local and pseudo-distributed modes
> of
> > operation. For a fully-distributed setup, this should be set to a full
> list
> > of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
> > this is the list of servers which we will start/stop ZooKeeper on.
> > 
> > 
> > 
> > hbase.zookeeper.property.dataDir
> > /opt/zookeeper/data
> > Property from ZooKeeper's config zoo.cfg. The directory
> where
> > the snapshot is stored. 
> > 
> > 
> >
> > This is my hbase-env.sh:
> >
> >
> > [root@master ~]# cat /opt/hbase/conf/hbase-env.sh
> > #
> > #/**
> > # * Licensed to the Apache Software Foundation (ASF) under one
> > # * or more contributor license agreements.  See the NOTICE file
> > # * distributed with this work for additional information
> > # * regarding copyright ownership.  The ASF licenses this file
> > # * to you under the Apache License, Version 2.0 (the
> > # * "License"); you may not use this file except in compliance
> > # * with the License.  You may obtain a copy of the License at
> > # *
> > # * http://www.apache.org/licenses/LICENSE-2.0
> > # *
> > # * Unless required by applicable law or agreed to in writing, software
> > # * distributed under the License is distributed on an "AS IS" BASIS,
> > # * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > # * See the License for the specific language governing permissions and
> > # * limitations under the License.
> > # */
> >
> > # Set environment variables here.
> >
> > # This script sets variables multiple times over the course of starting
> an
> > hbase process,
> > # so try to keep things idempotent unless you want to take an even deeper
> > look
> > # into the startup scripts (bin/hbase, etc.)
> >
> > # The java implementation to use.  Java 1.7+ required.
> > export JAVA_HOME=/usr
> >
> > # Extra Java CLASSPATH elements.  Optional.
> >  export HBASE_CLASSPATH=/opt/hadoop
> >
> > # The maximum amount of heap to use. 

Re: Re: single thread threadpool for master_table_operation

2016-04-20 Thread Ted Yu
bq. all cp of all tables

What does cp mean ? Do you mean column family ?

bq. will never create/enable tables in the future

Then what would the cluster be used for ?
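
If "cp" does mean coprocessors, a rough sketch of the disable / modify / enable loop being discussed, written against the 0.98-era HBaseAdmin API (1.x clients would usually go through ConnectionFactory instead); this is only an outline under those assumptions, not a drop-in tool:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class StripCoprocessors {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HBaseAdmin admin = new HBaseAdmin(conf)) {
      for (HTableDescriptor desc : admin.listTables()) {
        List<String> cps = desc.getCoprocessors();   // snapshot of the cp classes
        if (cps.isEmpty()) {
          continue;                                  // nothing to strip here
        }
        for (String cp : cps) {
          desc.removeCoprocessor(cp);                // edit the descriptor in memory
        }
        TableName name = desc.getTableName();
        admin.disableTable(name);                    // offline schema change
        admin.modifyTable(name, desc);               // push the stripped descriptor
        admin.enableTable(name);
      }
    }
  }
}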

On Wed, Apr 20, 2016 at 6:16 AM, WangYQ  wrote:

> i want to remove all cp of all tables in hbase
> so i disable tables concurrently, and then modify tables
> for hundreds of tables, removing all cp costs 30 minutes, which is not so fast
>
>
> so, i want to speed up the whole process, and will never create/enable tables
> in the future
>
>
> after examining the code, i want to increase the pool size for
> master_table_operation
>
>
> but not sure if there are any problems.
>
>
> thanks..
>
> At 2016-04-20 21:07:06, "Ted Yu"  wrote:
> >Adding subject.
> >
> >Adding back user@hbase
> >
> >But the master wouldn't know what next action admin is going to perform,
> >right ?
> >
> >On Wed, Apr 20, 2016 at 5:59 AM, WangYQ 
> wrote:
> >
> >> if i just disable tables concurrently,
> >> and will never enable, modify, create table
> >>
> >> i think is ok, right?
> >>
> >>
> >>
> >>
> On 2016-04-20 at 20:53, Ted Yu wrote:
> >>
> >>
> >> Have you seen the comment above that line ?
> >>
> >>// We depend on there being only one instance of this executor
> running
> >>// at a time.  To do concurrency, would need fencing of
> enable/disable
> >> of
> >>// tables.
> >>// Any time changing this maxThreads to > 1, pls see the comment at
> >>// AccessController#postCreateTableHandler
> >>
> >> BTW in the future, please send queries to user@hbase
> >>
> >> On Wed, Apr 20, 2016 at 5:50 AM, WangYQ 
> wrote:
> >>
> >>> hbase 0.98.10
> >>>
> >>> in class hmaster
> >>> line 1298, the threadpool size for master_table_operation is 1, can not
> >>> be set
> >>>
> >>> are there any problems if i disable tables in concurrently
> >>>
> >>> thanks
> >>
> >>
> >>
> >>
> >>
>


Re: single thread threadpool for master_table_operation

2016-04-20 Thread Ted Yu
Adding subject.

Adding back user@hbase

But the master wouldn't know what next action admin is going to perform,
right ?

On Wed, Apr 20, 2016 at 5:59 AM, WangYQ  wrote:

> if i just disable tables concurrently,
> and will never enable, modify, create table
>
> i think is ok, right?
>
>
>
>
> On 2016-04-20 at 20:53, Ted Yu wrote:
>
>
> Have you seen the comment above that line ?
>
>// We depend on there being only one instance of this executor running
>// at a time.  To do concurrency, would need fencing of enable/disable
> of
>// tables.
>// Any time changing this maxThreads to > 1, pls see the comment at
>// AccessController#postCreateTableHandler
>
> BTW in the future, please send queries to user@hbase
>
> On Wed, Apr 20, 2016 at 5:50 AM, WangYQ  wrote:
>
>> hbase 0.98.10
>>
>> in class hmaster
>> line 1298, the threadpool size for master_table_operation is 1, can not
>> be set
>>
>> are there any problems if i disable tables in concurrently
>>
>> thanks
>
>
>
>
>


Re: (no subject)

2016-04-20 Thread Ted Yu
Have you seen the comment above that line ?

   // We depend on there being only one instance of this executor running
   // at a time.  To do concurrency, would need fencing of enable/disable of
   // tables.
   // Any time changing this maxThreads to > 1, pls see the comment at
   // AccessController#postCreateTableHandler

BTW in the future, please send queries to user@hbase

On Wed, Apr 20, 2016 at 5:50 AM, WangYQ  wrote:

> hbase 0.98.10
>
> in class hmaster
> line 1298, the threadpool size for master_table_operation is 1, can not be
> set
>
> are there any problems if i disable tables in concurrently
>
> thanks


Re: Major Compaction Strategy

2016-04-19 Thread Ted Yu
Please use the following method of HBaseAdmin:

  public CompactionState getCompactionStateForRegion(final byte[]
regionName)

Cheers
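
For reference, a minimal sketch of the rolling pattern built on that call: request a major compaction for one region, poll until the region reports no compaction in progress, then move to the next. The CompactionState enum lives in different packages across releases, so it is compared by name here, and the short initial sleep guards against polling before the request has been picked up:

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RollingMajorCompact {
  public static void main(String[] args) throws Exception {
    TableName table = TableName.valueOf(args[0]);
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      List<HRegionInfo> regions = admin.getTableRegions(table);
      for (HRegionInfo region : regions) {
        byte[] name = region.getRegionName();
        admin.majorCompactRegion(name);            // asynchronous request
        Thread.sleep(5000L);                       // let the region server pick it up
        while (!"NONE".equals(admin.getCompactionStateForRegion(name).name())) {
          Thread.sleep(10000L);                    // still compacting, keep waiting
        }
      }
    }
  }
}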

On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti  wrote:

> Hi,
>
> We have a large HBase 1.x cluster in AWS and have disabled automatic major
> compaction as advised. We were running our own code for compaction daily
> around midnight which calls HBaseAdmin.majorCompactRegion(byte[]
> regionName) in a rolling fashion across all regions.
>
> But we missed the fact that this is an asynchronous operation, so in
> practice this causes major compaction to run across all regions, at least
> those not already major compacted (for example because previous minor
> compactions got upgraded to major ones).
>
> We don't really have a suitable low load period, so what is a suitable way
> to make major compaction run in a rolling fashion region by region? The API
> above provides no return value for us to be able to wait for one compaction
> to finish before moving to the next.
>
> Thanks.
>
> 
> Saad
>


Re: Processing rows in parallel with MapReduce jobs.

2016-04-19 Thread Ted Yu
From the error, you need to provide an argumentless ctor for
MyTableInputFormat.
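
One way to get that constructor: the nested MyTableInputFormat in the quoted code below is a non-static inner class, so its implicit constructor takes the enclosing SimpleRowCounter instance and the framework's reflective instantiation cannot call it. Declaring it static (or moving it to its own top-level file) is the usual fix; a rough sketch of the changed declaration, reusing the imports the class already has:

  // static nested class: now has the no-argument constructor reflection needs
  public static class MyTableInputFormat extends TableInputFormat {
    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException {
      List<InputSplit> splits = super.getSplits(context);   // one split per region
      System.out.printf("created %d splits%n", splits.size());
      return splits;
    }
  }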

On Tue, Apr 19, 2016 at 12:12 AM, Ivan Cores gonzalez 
wrote:

>
> Hi Ted,
>
> Sorry, I forgot to write the error. In runtime I have the next exception:
>
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.NoSuchMethodException:
> simplerowcounter.SimpleRowCounter$MyTableInputFormat.()
>
> the program works fine if I don't use "MyTableInputFormat" modifying the
> call to initTableMapperJob:
>
> TableMapReduceUtil.initTableMapperJob(tableName, scan,
> RowCounterMapper.class,
> ImmutableBytesWritable.class, Result.class, job);   // -->
> works fine without MyTableInputFormat
>
> That's why I asked If you see any problem in the code. Because maybe I
> forgot override some method or something is missing.
>
> Best,
> Iván.
>
>
> - Mensaje original -
> > De: "Ted Yu" 
> > Para: user@hbase.apache.org
> > Enviados: Martes, 19 de Abril 2016 0:22:05
> > Asunto: Re: Processing rows in parallel with MapReduce jobs.
> >
> > Did you see the "Message to log?" log ?
> >
> > Can you pastebin the error / exception you got ?
> >
> > On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez <
> ivan.co...@inria.fr>
> > wrote:
> >
> > >
> > >
> > > Hi Ted,
> > > So, If I understand the behaviour of getSplits(), I can create
> "virtual"
> > > splits overriding the getSplits function.
> > > I was performing some tests, but my code crash in runtime and I cannot
> > > found the problem.
> > > Any help? I didn't find examples.
> > >
> > >
> > > public class SimpleRowCounter extends Configured implements Tool {
> > >
> > >   static class RowCounterMapper extends
> > > TableMapper {
> > > public static enum Counters { ROWS }
> > > @Override
> > > public void map(ImmutableBytesWritable row, Result value, Context
> > > context) {
> > >   context.getCounter(Counters.ROWS).increment(1);
> > > try {
> > > Thread.sleep(3000); //Simulates work
> > > } catch (InterruptedException name) { }
> > > }
> > >   }
> > >
> > >   public class MyTableInputFormat extends TableInputFormat {
> > > @Override
> > > public List getSplits(JobContext context) throws
> > > IOException {
> > > //Just to detect if this method is being called ...
> > > List splits = super.getSplits(context);
> > > System.out.printf("Message to log? \n" );
> > > return splits;
> > > }
> > >   }
> > >
> > >   @Override
> > >   public int run(String[] args) throws Exception {
> > > if (args.length != 1) {
> > >   System.err.println("Usage: SimpleRowCounter ");
> > >   return -1;
> > > }
> > > String tableName = args[0];
> > >
> > > Scan scan = new Scan();
> > > scan.setFilter(new FirstKeyOnlyFilter());
> > > scan.setCaching(500);
> > > scan.setCacheBlocks(false);
> > >
> > > Job job = new Job(getConf(), getClass().getSimpleName());
> > > job.setJarByClass(getClass());
> > >
> > > TableMapReduceUtil.initTableMapperJob(tableName, scan,
> > > RowCounterMapper.class,
> > > ImmutableBytesWritable.class, Result.class, job, true,
> > > MyTableInputFormat.class);
> > >
> > > job.setNumReduceTasks(0);
> > > job.setOutputFormatClass(NullOutputFormat.class);
> > > return job.waitForCompletion(true) ? 0 : 1;
> > >   }
> > >
> > >   public static void main(String[] args) throws Exception {
> > > int exitCode = ToolRunner.run(HBaseConfiguration.create(),
> > > new SimpleRowCounter(), args);
> > > System.exit(exitCode);
> > >   }
> > > }
> > >
> > > Thanks so much,
> > > Iván.
> > >
> > >
> > >
> > >
> > > - Mensaje original -
> > > > De: "Ted Yu" 
> > > > Para: user@hbase.apache.org
> > > > Enviados: Martes, 12 de Abril 2016 17:29:52
> > > > Asunto: Re: Processing rows in parallel with MapReduce jobs.
> > > >
> > > > Please take a look at Ta

Re: Processing rows in parallel with MapReduce jobs.

2016-04-18 Thread Ted Yu
Did you see the "Message to log?" log ?

Can you pastebin the error / exception you got ?

On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez 
wrote:

>
>
> Hi Ted,
> So, if I understand the behaviour of getSplits(), I can create "virtual"
> splits by overriding the getSplits function.
> I was performing some tests, but my code crashes at runtime and I cannot
> find the problem.
> Any help? I didn't find examples.
>
>
> public class SimpleRowCounter extends Configured implements Tool {
>
>   static class RowCounterMapper extends
> TableMapper<ImmutableBytesWritable, Result> {
> public static enum Counters { ROWS }
> @Override
> public void map(ImmutableBytesWritable row, Result value, Context
> context) {
>   context.getCounter(Counters.ROWS).increment(1);
> try {
> Thread.sleep(3000); //Simulates work
> } catch (InterruptedException name) { }
> }
>   }
>
>   public class MyTableInputFormat extends TableInputFormat {
> @Override
> public List<InputSplit> getSplits(JobContext context) throws
> IOException {
> //Just to detect if this method is being called ...
> List<InputSplit> splits = super.getSplits(context);
> System.out.printf("Message to log? \n" );
> return splits;
> }
>   }
>
>   @Override
>   public int run(String[] args) throws Exception {
> if (args.length != 1) {
>   System.err.println("Usage: SimpleRowCounter <tablename>");
>   return -1;
> }
> String tableName = args[0];
>
> Scan scan = new Scan();
> scan.setFilter(new FirstKeyOnlyFilter());
> scan.setCaching(500);
> scan.setCacheBlocks(false);
>
> Job job = new Job(getConf(), getClass().getSimpleName());
> job.setJarByClass(getClass());
>
> TableMapReduceUtil.initTableMapperJob(tableName, scan,
> RowCounterMapper.class,
> ImmutableBytesWritable.class, Result.class, job, true,
> MyTableInputFormat.class);
>
> job.setNumReduceTasks(0);
> job.setOutputFormatClass(NullOutputFormat.class);
> return job.waitForCompletion(true) ? 0 : 1;
>   }
>
>   public static void main(String[] args) throws Exception {
> int exitCode = ToolRunner.run(HBaseConfiguration.create(),
> new SimpleRowCounter(), args);
> System.exit(exitCode);
>   }
> }
>
> Thanks so much,
> Iván.
>
>
>
>
> - Original Message -
> > From: "Ted Yu" 
> > To: user@hbase.apache.org
> > Sent: Tuesday, April 12, 2016 17:29:52
> > Subject: Re: Processing rows in parallel with MapReduce jobs.
> >
> > Please take a look at TableInputFormatBase#getSplits() :
> >
> >* Calculates the splits that will serve as input for the map tasks.
> The
> >
> >* number of splits matches the number of regions in a table.
> >
> > Each mapper would be reading one of the regions.
> >
> > On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez <
> ivan.co...@inria.fr>
> > wrote:
> >
> > > Hi Ted,
> > > Yes, I mean same region.
> > >
> > > I wasn't using the getSplits() function. I'm trying to add it to my
> code
> > > but I'm not sure how I have to do it. Is there any example in the
> website?
> > > I can not find anything. (By the way, I'm using TableInputFormat, not
> > > InputFormat)
> > >
> > > But just to confirm, with the getSplits() function, Are mappers
> processing
> > > rows in the same region executed in parallel? (assuming that there are
> > > empty
> > > processors/cores)
> > >
> > > Thanks,
> > > Ivan.
> > >
> > >
> > > - Original Message -
> > > > From: "Ted Yu" 
> > > > To: user@hbase.apache.org
> > > > Sent: Monday, April 11, 2016 15:10:29
> > > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > > >
> > > > bq. if they are located in the same split?
> > > >
> > > > Probably you meant same region.
> > > >
> > > > Can you show the getSplits() for the InputFormat of your MapReduce
> job ?
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez <
> > > ivan.co...@inria.fr>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a small question regarding the MapReduce jobs behaviour with
> > > HBase.
> > > > &g

Re: maven dependency resources for hbase .98 and apache-spark

2016-04-18 Thread Ted Yu
The referenced link is from a specific vendor.

Mind posting on the vendor's mailing list ?

On Mon, Apr 18, 2016 at 12:45 PM, Colin Kincaid Williams 
wrote:

> I would like to insert some data from Spark and/or Spark Streaming
> into HBase, on v0.98. I found this section of the book which shows
> examples of using the apis:
> https://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#spark .
>
> However I'm unsure what dependency sections I need to add to my maven
> manifest. Does anybody have any references?
>


Re: To Store Large Number of Video and Image files

2016-04-16 Thread Ted Yu
There was HBASE-15370 for backport but it was decided not to backport the
feature.

FYI

On Sat, Apr 16, 2016 at 7:26 PM, Ascot Moss  wrote:

> Hi,
>
> About HBase-11339,
> "The size of the MOB data could not be very large, it better to keep the
> MOB size within 100KB and 10MB. Since MOB cells are written into the
> memstore before flushing, large MOB cells stress the memory in region
> servers."
>
> Can this be resolved if we provide more RAM in the region servers? For
> instance, each server in the cluster has 768GB RAM + 14 x 6TB HDD.
>
> regards
>
>
>
> On Sun, Apr 17, 2016 at 9:56 AM, Ascot Moss  wrote:
>
> > Thanks Ted!
> >
> > Just visited HBASE-11339; its status is "resolved", however it is for
> > "Fix Version: 2.0.0".
> > How can I patch it into the current HBase stable version (v1.1.4)?
> >
> > About fault tolerance at the datacenter level, I am thinking of using HBase
> > replication to replicate HBase tables to another (backup) cluster. Is there
> > any real-world reference about the replication performance, for instance
> > if the bandwidth is 100MB/s?
> >
> > Regards
> >
> >
> >
> >
> >
> >
> >-
> >
> >
> > On Sun, Apr 17, 2016 at 9:40 AM, Ted Yu  wrote:
> >
> >> Have you taken a look at HBASE-11339 (HBase MOB) ?
> >>
> >> Note: this feature does not handle 10GB objects well. Consider storing
> >> GB-sized images on HDFS.
> >>
> >> Cheers
> >>
> >> On Sat, Apr 16, 2016 at 6:21 PM, Ascot Moss 
> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have a project that needs to store a large number of image and video
> >> > files. The file size varies from 10MB to 10GB; the initial number of files
> >> > will be 0.1 billion and would grow to over 1 billion. What would be the
> >> > practical recommendations to store and view these files?
> >> >
> >> >
> >> >
> >> > #1 One cluster, store the HDFS URL in HBase and store the actual file
> in
> >> > HDFS? (block_size as 128MB and replication factor as 3)
> >> >
> >> >
> >> > #2 One cluster, Store small files in HBase directly and use #1 for
> large
> >> > files? (block_size as 128MB and replication factor as 3)
> >> >
> >> >
> >> > #3 Multiple Hadoop/HBase clusters, each with different block_size
> >> settings?
> >> >
> >> >
> >> >  e.g. cluster 1 (small): block_size as 128MB and replication
> factor
> >> as
> >> > 3, store all files in HBase if their file size is smaller than 128MB
> >> >
> >> > cluster 2 (large): bigger block_size, say 4GB, replication
> >> > factor as 3, store the HDFS URL in HBase and store the actual file in
> >> HDFS
> >> >
> >> >
> >> >
> >> > #4 Use Hadoop Federation for large number of files?
> >> >
> >> >
> >> > About fault tolerance, we need to consider four types of failures:
> >> > driver, host, rack, and datacenter failures.
> >> >
> >> >
> >> > Regards
> >> >
> >>
> >
> >
>


Re: To Store Large Number of Video and Image files

2016-04-16 Thread Ted Yu
Have you taken a look at HBASE-11339 (HBase MOB) ?

Note: this feature does not handle 10GB objects well. Consider storing
GB-sized images on HDFS.

Cheers
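For reference, a sketch of declaring a MOB column family with the Java admin API, assuming a release that carries HBASE-11339 (2.0.0); the table and family names here are illustrative:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class CreateMobTable {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("media"));
          HColumnDescriptor cf = new HColumnDescriptor("content");
          cf.setMobEnabled(true);        // store large cells as MOB files
          cf.setMobThreshold(102400L);   // cells above ~100KB go to the MOB path
          desc.addFamily(cf);
          admin.createTable(desc);
        }
      }
    }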

On Sat, Apr 16, 2016 at 6:21 PM, Ascot Moss  wrote:

> Hi,
>
> I have a project that needs to store a large number of image and video files.
> The file size varies from 10MB to 10GB; the initial number of files will be
> 0.1 billion and would grow to over 1 billion. What would be the practical
> recommendations to store and view these files?
>
>
>
> #1 One cluster, store the HDFS URL in HBase and store the actual file in
> HDFS? (block_size as 128MB and replication factor as 3)
>
>
> #2 One cluster, Store small files in HBase directly and use #1 for large
> files? (block_size as 128MB and replication factor as 3)
>
>
> #3 Multiple Hadoop/HBase clusters, each with different block_size settings?
>
>
>  e.g. cluster 1 (small): block_size as 128MB and replication factor as
> 3, store all files in HBase if their file size is smaller than 128MB
>
> cluster 2 (large): bigger block_size, say 4GB, replication
> factor as 3, store the HDFS URL in HBase and store the actual file in HDFS
>
>
>
> #4 Use Hadoop Federation for large number of files?
>
>
> About fault tolerance, we need to consider four types of failures: driver,
> host, rack, and datacenter failures.
>
>
> Regards
>


Re: Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.

2016-04-16 Thread Ted Yu
And this is my env:
> [hadoop@master ~]$ env
> XDG_SESSION_ID=1
> HOSTNAME=master
> SHELL=/bin/bash
> TERM=xterm
> HADOOP_HOME=/opt/hadoop
> HISTSIZE=1000
> USER=hadoop
>
> LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
> MAIL=/var/spool/mail/hadoop
>
> PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/hadoop/bin:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/bin:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre/bin:/usr/bin:/home/hadoop/.local/bin:/home/hadoop/bin
> PWD=/home/hadoop
> JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64
> LANG=en_US.UTF-8
> HISTCONTROL=ignoredups
> SHLVL=1
> HOME=/home/hadoop
> LOGNAME=hadoop
>
> CLASSPATH=.::/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/lib:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75-2.5.4.2.el7_0.x86_64/jre/lib
> LESSOPEN=||/usr/bin/lesspipe.sh %s
> _=/bin/env
>
>
>
> Eric Gao
> Keep on going never give up.
> Blog:
> http://gaoqiang.blog.chinaunix.net/
> http://gaoqiangdba.blog.163.com/
>
>
>
> From: Ted Yu
> Date: 2016-04-16 22:59
> To: user@hbase.apache.org
> Subject: Re: ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper.
> Have you seen my reply ?
>
> http://search-hadoop.com/m/q3RTtJHewi1jOgc21
>
> The actual value for zookeeper.znode.parent could be /hbase-secure (just an
> example).
>
> Make sure the correct hbase-site.xml is in the classpath for the hbase shell.
>
> On Sat, Apr 16, 2016 at 7:53 AM, Eric Gao  wrote:
>
> > Dear expert,
> >   I have encountered a problem: when I run the hbase command "status", it shows:
> >
> > hbase(main):001:0> status
> > 2016-04-16 13:03:02,333 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:02,538 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:02,843 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:03,348 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:04,355 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:06,369 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> > 2016-04-16 13:03:10,414 ERROR [main]
> > client.ConnectionManager$HConnectionImplementation: The node /hbase is
> not
> > in ZooKeeper. It should have been written by the master. Check the value
> > configured in 'zookeeper.znode.parent'. There could be a mismatch with
> the
> > one configured in the master.
> >
> > How can I solve the problem?
> > Thanks very much
> >
> >
> >
> > Eric Gao
> > Keep on going never give up.
> > Blog:
> > http://gaoqiang.blog.chinaunix.net/
> > http://gaoqiangdba.blog.163.com/
> >
> >
> >
>


Re: ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper.

2016-04-16 Thread Ted Yu
Have you seen my reply ?

http://search-hadoop.com/m/q3RTtJHewi1jOgc21

The actual value for zookeeper.znode.parent could be /hbase-secure (just an
example).

Make sure the correct hbase-site.xml is in the classpath for the hbase shell.
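For reference, a minimal sketch (assuming a 1.x client; the znode value is only an example) that prints which zookeeper.znode.parent the client actually picked up from the hbase-site.xml on its classpath, and optionally overrides it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class ZnodeParentCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // loads hbase-site.xml from the classpath
        System.out.println("zookeeper.znode.parent = " + conf.get("zookeeper.znode.parent"));
        // Uncomment to match whatever parent znode the master actually uses, e.g. /hbase-secure:
        // conf.set("zookeeper.znode.parent", "/hbase-secure");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          System.out.println("tables: " + admin.listTableNames().length);
        }
      }
    }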

On Sat, Apr 16, 2016 at 7:53 AM, Eric Gao  wrote:

> Dear expert,
>   I have encountered a problem: when I run the hbase command "status", it shows:
>
> hbase(main):001:0> status
> 2016-04-16 13:03:02,333 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:02,538 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:02,843 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:03,348 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:04,355 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:06,369 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
> 2016-04-16 13:03:10,414 ERROR [main]
> client.ConnectionManager$HConnectionImplementation: The node /hbase is not
> in ZooKeeper. It should have been written by the master. Check the value
> configured in 'zookeeper.znode.parent'. There could be a mismatch with the
> one configured in the master.
>
> How can I solve the problem?
> Thanks very much
>
>
>
> Eric Gao
> Keep on going never give up.
> Blog:
> http://gaoqiang.blog.chinaunix.net/
> http://gaoqiangdba.blog.163.com/
>
>
>


Re: Read HFile from Local file system for studying and testing

2016-04-14 Thread Ted Yu
Please take a look at:
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java

There're various tests under hbase-server/src/test which demonstrate the
usage.

FYI
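For reference, a minimal sketch of reading an HFile copied to the local file system, assuming a 1.x client on the classpath. Using FileSystem.getLocal(conf) gives a local file system that is already initialized with a Configuration, which is what the bare RawLocalFileSystem in the code quoted below is missing (hence the NullPointerException in FileSystem.open):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFile;
    import org.apache.hadoop.hbase.io.hfile.HFileScanner;

    public class LocalHFileCount {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem localFs = FileSystem.getLocal(conf);  // initialized local file system
        Path path = new Path(args[0]);                   // e.g. /tmp/testHFile

        HFile.Reader reader = HFile.createReader(localFs, path, new CacheConfig(conf), conf);
        reader.loadFileInfo();
        HFileScanner scanner = reader.getScanner(false, false);
        int count = 0;
        if (scanner.seekTo()) {          // false when the file holds no cells
          do {
            count++;
          } while (scanner.next());
        }
        System.out.println("cells: " + count);
        reader.close();
      }
    }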

On Thu, Apr 14, 2016 at 8:03 PM, Bin Wang  wrote:

> Hi there,
>
> I have a HFile that I copied from HDFS down to my laptop.
>
> I am trying to use HFile.createReader to read in that HFile and play
> with it. However, it was really hard to figure out how to read it from
> the local file system.
>
> public class MyHbase {
> public static void main(String[] args) throws IOException {
>
> Path p = new Path("/Desktop/testHFile");
> Configuration conf = new Configuration();
> //  FileSystem localfs = FileSystem.newInstanceLocal(conf);
> //  FileSystem localfs =
> FileSystem.getLocal(conf).getRawFileSystem();
> //  FileSystem localfs = p.getFileSystem(conf);
>  RawLocalFileSystem localfs = new RawLocalFileSystem();
> System.out.println(localfs.toString());
> HFile.Reader reader = HFile.createReader(localfs, p, new
> CacheConfig(conf), conf);
> reader.loadFileInfo();
>
> HFileScanner scanner = reader.getScanner(false, false);
> scanner.seekTo();
> int count = 0;
> do {
> count++;
> } while (scanner.next());
> System.out.println(count);
> reader.close();
> }
> }
>
> And it is giving the error message:
>
> LocalFS
> 16/04/13 14:54:25 INFO hfile.CacheConfig: Allocating LruBlockCache
> size=1.42 GB, blockSize=64 KB
> 16/04/13 14:54:25 INFO hfile.CacheConfig:
> blockCache=LruBlockCache{blockCount=0, currentSize=1567264,
> freeSize=1525578848, maxSize=1527146112, heapSize=1567264,
> minSize=1450788736, minFactor=0.95, multiSize=725394368,
> multiFactor=0.5, singleSize=362697184, singleFactor=0.25},
> cacheDataOnRead=true, cacheDataOnWrite=false,
> cacheIndexesOnWrite=false, cacheBloomsOnWrite=false,
> cacheEvictOnClose=false, cacheDataCompressed=false,
> prefetchOnOpen=false
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
> at
> org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.(FSDataInputStreamWrapper.java:98)
> at
> org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.(FSDataInputStreamWrapper.java:79)
> at
> org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:519)
> at myhbase.MyHbase.main(MyHbase.java:23)
>
>
> I asked this question on Stack Overflow (
> http://stackoverflow.com/questions/36609428/trying-to-read-hfile-using-hfile-createreader-from-local-file-system
> ) and it seems like there are not that many hbase people there.
>
> Can anyone on this mailing-list help me on that?
>
> Best regards,
>
> Bin
>


Re: Append Visibility Labels?

2016-04-13 Thread Ted Yu
There is currently no API for appending Visibility Labels.

checkAndPut() only allows you to compare value, not labels.
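One workaround, sketched below purely as an assumption about how the application could be structured (it has to remember the expression it last applied, since a normal client does not read a cell's visibility expression back), is to rewrite the cell with a combined expression; the labels set on a Put simply replace whatever was on the cell before:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.security.visibility.CellVisibility;

    public class LabelAppender {
      /**
       * Rewrites the cell with the old expression OR'ed with the new label.
       * existingExpr is whatever the application recorded when it last wrote the cell.
       */
      public static void addLabel(Table table, byte[] row, byte[] cf, byte[] qual,
                                  byte[] value, String existingExpr, String newLabel)
          throws IOException {
        Put put = new Put(row);
        put.addColumn(cf, qual, value);
        put.setCellVisibility(new CellVisibility("(" + existingExpr + ")|(" + newLabel + ")"));
        table.put(put);
      }
    }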

On Wed, Apr 13, 2016 at 8:12 AM, 
wrote:

> We sell data. A product can be defined as a permission to access data (at
> a cell level). Visibility Labels look like a very good candidate for
> implementing this model.
>
> The implementation works well until we create a new product over old data.
> We can set the visibility label for the new product but, whoops, by
> applying it to the relevant cells we've overwritten all the existing labels
> on those cells, destroying the permissioning of our older products. What to
> do?
>
> One answer would be to append the new visibility label to the existing
> label expressions on the cells with an 'OR'. But I'm not sure that's
> possible .. yet?
>
> Thanks,
>
> Ben
>
> 
>
>


Re: Processing rows in parallel with MapReduce jobs.

2016-04-12 Thread Ted Yu
Please take a look at TableInputFormatBase#getSplits() :

   * Calculates the splits that will serve as input for the map tasks. The

   * number of splits matches the number of regions in a table.

Each mapper would be reading one of the regions.
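To run more than one mapper per region, one option, shown only as an untested sketch against the 1.x classes, is to override getSplits() and cut each region's split into smaller key ranges (here: in half); the first and last region, whose boundary keys are empty, are left as a single split:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableSplit;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;

    public class HalvingTableInputFormat extends TableInputFormat {
      @Override
      public List<InputSplit> getSplits(JobContext context) throws IOException {
        List<InputSplit> out = new ArrayList<InputSplit>();
        for (InputSplit split : super.getSplits(context)) {
          TableSplit ts = (TableSplit) split;
          byte[] start = ts.getStartRow();
          byte[] end = ts.getEndRow();
          if (start.length == 0 || end.length == 0) {
            out.add(ts);                        // no safe midpoint for open-ended ranges
            continue;
          }
          byte[][] keys = Bytes.split(start, end, 1);  // {start, midpoint, end}
          if (keys == null || keys.length != 3) {
            out.add(ts);
            continue;
          }
          out.add(new TableSplit(ts.getTable(), start, keys[1], ts.getRegionLocation()));
          out.add(new TableSplit(ts.getTable(), keys[1], end, ts.getRegionLocation()));
        }
        return out;
      }
    }

The class would be passed via the initTableMapperJob overload that accepts a custom input format class.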

On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez 
wrote:

> Hi Ted,
> Yes, I mean same region.
>
> I wasn't using the getSplits() function. I'm trying to add it to my code
> but I'm not sure how I have to do it. Is there any example in the website?
> I can not find anything. (By the way, I'm using TableInputFormat, not
> InputFormat)
>
> But just to confirm, with the getSplits() function, Are mappers processing
> rows in the same region executed in parallel? (assuming that there are
> empty
> processors/cores)
>
> Thanks,
> Ivan.
>
>
> - Mensaje original -
> > De: "Ted Yu" 
> > Para: user@hbase.apache.org
> > Enviados: Lunes, 11 de Abril 2016 15:10:29
> > Asunto: Re: Processing rows in parallel with MapReduce jobs.
> >
> > bq. if they are located in the same split?
> >
> > Probably you meant same region.
> >
> > Can you show the getSplits() for the InputFormat of your MapReduce job ?
> >
> > Thanks
> >
> > On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez <
> ivan.co...@inria.fr>
> > wrote:
> >
> > > Hi all,
> > >
> > > I have a small question regarding the MapReduce jobs behaviour with
> HBase.
> > >
> > > I have a HBase test table with only 8 rows. I splitted the table with
> the
> > > hbase shell
> > > split command into 2 splits. So now there are 4 rows in every split.
> > >
> > > I create a MapReduce job that only prints the row key in the log files.
> > > When I run the MapReduce job, every row is processed by 1 mapper. But
> the
> > > mappers
> > > in the same split are executed sequentially (inside the same
> container).
> > > That means,
> > > the first four rows are processed sequentially by 4 mappers. The system
> > > has cores
> > > that are free, so is it possible to process rows in parallel if they
> are
> > > located
> > > in the same split?
> > >
> > > The only way I found to have 8 mappers executed in parallel is split
> the
> > > table
> > > in 8 splits (1 split per row). But obviously this is not the best
> solution
> > > for
> > > big tables ...
> > >
> > > Thanks,
> > > Ivan.
> > >
> >
>


Re: unstable cluster

2016-04-11 Thread Ted Yu
From the region server log:

2016-04-11 03:11:51,589 WARN org.apache.zookeeper.ClientCnxnSocket:
Connected to an old server; r-o mode will be unavailable
2016-04-11 03:11:51,589 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x52ee1452fec5ac has expired,
closing socket connection

From the zookeeper log:

2016-04-11 03:11:27,323 - INFO  [CommitProcessor:0:NIOServerCnxn@1435] -
Closed socket connection for client /172.20.67.19:58404 which had sessionid
0x52ee1452fec71f
2016-04-11 03:11:53,301 - INFO  [CommitProcessor:0:NIOServerCnxn@1435] -
Closed socket connection for client /172.20.67.13:32946 which had sessionid
0x52ee1452fec6ea

Note the 26 second gap.

What do you see in the logs of the other two zookeeper servers ?

Thanks

On Mon, Apr 11, 2016 at 5:08 PM, Ted Tuttle  wrote:

> Hello -
>
> We've started experiencing regular failures of our HBase cluster.  For the
> last week we've had nightly failures about 1hr after a heavy batch process
> starts.
>
> In the logs below we see the failure starting at 2016-04-11 03:11 in
> zookeeper, master and region server logs:
>
> zookeeper:  http://pastebin.com/kf7ja22K
>
> region server: http://pastebin.com/tduJgKqq
>
> master:  http://pastebin.com/0szhi0bJ
>
> The master log seems most interesting.  Here we see problems connecting to
> Zookeeper then a number of region servers dying in quick succession.  From
> the log evidence it appears Zookeeper is not responding rather than the
> more typical GC causing isolated RS to abort.
>
> Any insights on what may be happening here?
>
> Best,
> Ted
>


Re: How to serialize Result in HBase 1.0

2016-04-11 Thread Ted Yu
It seems your code didn't go through.

Please take a look at ResultSerialization and related classes.

Cheers
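For reference, a sketch of what that amounts to in job setup (TableMapReduceUtil.initTableMapperJob normally registers the serialization for you), so a Writable implementation is no longer needed to pass Result as a map output value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
    import org.apache.hadoop.mapreduce.Job;

    public class HBaseSerializationSetup {
      /** Lets Result be used as a map output value without implementing Writable. */
      public static void register(Job job) {
        Configuration conf = job.getConfiguration();
        // initTableMapperJob() adds this entry automatically when you use it.
        conf.setStrings("io.serializations",
            conf.get("io.serializations"),
            ResultSerialization.class.getName());
        job.setMapOutputValueClass(Result.class);
      }
    }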

On Mon, Apr 11, 2016 at 5:29 PM, 乔彦克  wrote:

> Hi, all
> Recently we upgraded our HBase cluster from cdh-0.94 to cdh-1.0. In 0.94
> we used Result.java (which implements Writable) as the map output value.
> [image: pasted1]
> But in the newer HBase version Result.java has changed; it can't be
> serialized any more. Is there any alternative method to use Result as
> MapReduce input/output?
> Thanks for any response.
>


Re: Question about open table

2016-04-11 Thread Ted Yu
bq. I am using hbase 2.x

2.0 has not been released yet.
Probably you meant 1.x ?

On Mon, Apr 11, 2016 at 6:48 AM, Yi Jiang  wrote:

> Thanks
> I am using hbase 2.x, so I create the connection only once in my project.
> According to Ted, getTable is not expensive, so I am able to get and
> close the table in each request.
> Jacky
>
> -Original Message-
> From: Yu Li [mailto:car...@gmail.com]
> Sent: Monday, April 11, 2016 12:05 AM
> To: user@hbase.apache.org
> Subject: Re: Question about open table
>
> If using HBase 1.x, please make sure to reuse the connection since
> creating a connection is expensive. More specifically, don't call
> {{ConnectionFactory.createConnection}} (the 1.x recommended way) multiple
> times but only once and share it among threads, while
> {{HConnectionManager#getConnection}} (the <=0.98 way) would just be ok. FYI.
>
> Best Regards,
> Yu
>
> On 10 April 2016 at 14:08, Yi Jiang  wrote:
>
> > Hi, Ted
> > Thanks for the help. getConnection is just a simple getter method:
> > public Connection getConnection() { return connection; }
> > I have configured my connection in the constructor.
> > Jacky
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Saturday, April 09, 2016 11:03 AM
> > To: user@hbase.apache.org
> > Subject: Re: Question about open table
> >
> > Can you show the body of getConnection() ?
> >
> > getTable() itself is not expensive - assuming the same underlying
> > Connection.
> >
> > Cheers
> >
> > On Sat, Apr 9, 2016 at 7:18 AM, Yi Jiang  wrote:
> >
> > > Hi, Guys
> > > I just have a question, I am trying to save the data into table in
> > > HBase I am using
> > >
> > > Table table =
> > > getConnection().getTable(TableName.valueOf(tableName));
> > > ...
> > > ...
> > > Table.close
> > >
> > > My question is: is "getTable" expensive?
> > > Shall I get and close the table in each save?
> > > Or should I just get the table at the beginning, save all the data, and
> > > then close the table after all the saving is done?
> > > From the aspect of code design, I will save data into different tables
> > > on the fly. So, if it is expensive, I would like to rewrite the design.
> > > Thank you
> > > Jacky
> > >
> >
>


Re: Processing rows in parallel with MapReduce jobs.

2016-04-11 Thread Ted Yu
bq. if they are located in the same split?

Probably you meant same region.

Can you show the getSplits() for the InputFormat of your MapReduce job ?

Thanks

On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez 
wrote:

> Hi all,
>
> I have a small question regarding the MapReduce jobs behaviour with HBase.
>
> I have an HBase test table with only 8 rows. I split the table with the
> hbase shell
> split command into 2 splits. So now there are 4 rows in every split.
>
> I create a MapReduce job that only prints the row key in the log files.
> When I run the MapReduce job, every row is processed by 1 mapper. But the
> mappers
> in the same split are executed sequentially (inside the same container).
> That means,
> the first four rows are processed sequentially by 4 mappers. The system
> has cores
> that are free, so is it possible to process rows in parallel if they are
> located
> in the same split?
>
> The only way I found to have 8 mappers executed in parallel is split the
> table
> in 8 splits (1 split per row). But obviously this is not the best solution
> for
> big tables ...
>
> Thanks,
> Ivan.
>


Re: HBase TTL

2016-04-11 Thread Ted Yu
Have you looked at :

http://hbase.apache.org/book.html#ttl

Please describe your use case.

Thanks
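For reference, a sketch of the cell-level TTL that exists today, using the 1.x admin API (the table and family names are illustrative); cells whose timestamp is older than the TTL stop being returned and are physically removed at major compaction. A per-row-key TTL would have to be built on top of this, e.g. by keeping all of a row's cells in a family that carries the desired TTL.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class CreateTableWithTtl {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("events"));
          HColumnDescriptor cf = new HColumnDescriptor("d");
          cf.setTimeToLive(30 * 24 * 3600);  // seconds: cells expire after ~30 days
          desc.addFamily(cf);
          admin.createTable(desc);
        }
      }
    }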

On Mon, Apr 11, 2016 at 2:11 AM, hsdcl...@163.com  wrote:

> hi,
>
> I want to know the principle behind HBase TTL. I would like to use the same
> principle to develop a row-key level TTL; HBase TTL currently applies only to cells.
>
> thanks!
>
> hsdcl...@163.com
>


Re: HBase 1.2 master CPU usage spin

2016-04-11 Thread Ted Yu
Can you look at the master log during this period to see what procedure was
retried?

Turn on DEBUG logging if necessary and pastebin the relevant portion of the
master log.

Thanks

> On Apr 11, 2016, at 1:11 AM, Kevin Bowling  wrote:
> 
> Hi,
> 
> I'm running HBase 1.2.0 on FreeBSD via the ports system (
> http://www.freshports.org/databases/hbase/), and it is generally working
> well.  However, in an HA setup, the HBase master spins at 200% CPU usage
> when it is active and this follows the active master and disappears when
> standby.  Since this cluster is fairly idle, and an older production Linux
> one uses much less master CPU, I am curious what is going on.
> 
> Using visualvm, I can see that the ProcedureExecutor threads are quite
> busy.  Using dtrace, I can see that there is tons of native umtx activity.
> I am guessing that some procedure is failing, and continuously retrying as
> fast as the procedure dispatch allows.  Using dtrace, I can also see that a
> lot of time seems to be spent in the JVM's 'libzip.so' native library.  I'm
> wondering if it's a classloader run amok or something.
> 
> I need to rebuild the JVM with debugging to get more out of dtrace, but the
> JVM doesn't implement a dtrace ustack helper for FreeBSD like it does on
> Solarish.  Hopefully then I can get some idea of what is going on.
> 
> Does this speak to anyone for ideas to look into?  Other than noticing the
> CPU usage in top, the master seems to function fine and is responsive.
> 
> Here's a sample thread dump, this doesn't really jump out to me, but I
> don't know if any of my captures are at the right moment either:
> 
> "ProcedureExecutor-7" - Thread t@116
>   java.lang.Thread.State: WAITING
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for <4653d04c> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>at
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.poll(MasterProcedureScheduler.java:145)
>at
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.poll(MasterProcedureScheduler.java:139)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:803)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:494)
> 
>   Locked ownable synchronizers:
>- None
> 
> "ProcedureExecutor-5" - Thread t@114
>   java.lang.Thread.State: WAITING
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for <4653d04c> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>at
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.poll(MasterProcedureScheduler.java:145)
>at
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.poll(MasterProcedureScheduler.java:139)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:803)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:75)
>at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:494)
> 
>   Locked ownable synchronizers:
>- None
> 
> "ProcedureExecutor-4" - Thread t@113
>   java.lang.Thread.State: WAITING
>at sun.misc.Unsafe.park(Native Method)
>- parking to wait for <4653d04c> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>at
> java.util.concurrent.locks.AbstractQueuedSync

Re: choosing between hbase-spark / spark-hbase

2016-04-10 Thread Ted Yu
For 2.0.0-SNAPSHOT version, you should build trunk by yourself.

For 1.2.0-cdh5.7.0 , please contact cdh mailing list.

On Sun, Apr 10, 2016 at 7:09 PM, yeshwanth kumar 
wrote:

> Thank you for the reply,
>
> I am having trouble finding the dependency in the Maven repository; the
> only dependency I can find is
>
>   
> org.apache.hbase
> hbase-spark
> 1.2.0-cdh5.7.0
> 
>
> from cloudera maven repository,
>
> the dependency specified on this page could not be resolved:
>
> http://hbase.apache.org/hbase-spark/dependency-info.html
>
> do i need to build it from the trunk?
>
> please let me know
>
> Thanks,
> Yeshwanth
>
> -Yeshwanth
> Can you Imagine what I would do if I could do all I can - Art of War
>
> On Tue, Apr 5, 2016 at 5:30 PM, Ted Yu  wrote:
>
> > There are some outstanding bug fixes, e.g. HBASE-15333, for hbase-spark
> > module.
> >
> > FYI
> >
> > On Tue, Apr 5, 2016 at 2:36 PM, Nkechi Achara 
> > wrote:
> >
> > > So Hbase-spark is a continuation of the spark on hbase project, but
> > within
> > > the Hbase project.
> > > There are no significant differences apart from the fact that Spark on
> > > HBase is not updated.
> > > Depending on the version you are using, it would be more beneficial to
> > > use HBase-Spark.
> > >
> > > Kay
> > > On 5 Apr 2016 9:12 pm, "yeshwanth kumar" 
> wrote:
> > >
> > > > i have cloudera cluster,
> > > > i am exploring spark with HBase,
> > > >
> > > > after going through this blog
> > > >
> > > >
> > > >
> > >
> >
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> > > >
> > > > i found two options for using Spark with HBase,
> > > >
> > > > Cloudera's Spark on HBase or
> > > > Apache hbase-spark.
> > > >
> > > > do they have significance difference?
> > > > which one should i use,
> > > >
> > > > can someone please point me out to their API documentation.
> > > > i did searched for documentation, but couldn't find it.
> > > >
> > > >
> > > > Thanks,
> > > > -Yeshwanth
> > > > Can you Imagine what I would do if I could do all I can - Art of War
> > > >
> > >
> >
>


Re: Question about open table

2016-04-09 Thread Ted Yu
Can you show the body of getConnection() ?

getTable() itself is not expensive - assuming the same underlying
Connection.

Cheers
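A sketch of the usual pattern with the 1.x API (class and method names below are illustrative): one Connection created up front and shared across threads, with a lightweight Table obtained and closed per request:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;

    public class TablePerRequest {
      private final Connection connection;  // expensive: create once, share across threads

      public TablePerRequest() throws IOException {
        this.connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
      }

      /** getTable() is cheap, so getting and closing a Table per request is fine. */
      public void save(String tableName, byte[] row, byte[] cf, byte[] qual, byte[] value)
          throws IOException {
        try (Table table = connection.getTable(TableName.valueOf(tableName))) {
          Put put = new Put(row);
          put.addColumn(cf, qual, value);
          table.put(put);
        }
      }

      public void close() throws IOException {
        connection.close();  // only when the application shuts down
      }
    }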

On Sat, Apr 9, 2016 at 7:18 AM, Yi Jiang  wrote:

> Hi, Guys
> I just have a question. I am trying to save data into a table in HBase,
> and I am using
>
> Table table = getConnection().getTable(TableName.valueOf(tableName));
> ...
> ...
> Table.close
>
> My question is: is "getTable" expensive?
> Shall I get and close the table in each save?
> Or should I just get the table at the beginning, save all the data, and then
> close the table after all the saving is done?
> From the aspect of code design, I will save data into different tables on
> the fly. So, if it is expensive, I would like to rewrite the design.
> Thank you
> Jacky
>


Re: Column qualifier name in byte array in hbase

2016-04-09 Thread Ted Yu
I guess you have read:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily

where example for column family mapping is given.

If you need to map column qualifier, probably poll hive mailing list on
syntax.

FYI

On Sat, Apr 9, 2016 at 12:30 AM, Viswanathan J 
wrote:

> Hi,
>
> I'm trying to create an external Hive table to map an HBase table, and I'm
> facing an issue in the column mapping because the column qualifier is a byte
> array, as below. How do I define/map this?
>
> eg.,
>
> column=cf:\x00\x00\x00\x00\x00\x00\x00\x00, timestamp=1455990558044
>
> --
> Regards,
> Viswa.J
>


Re: "Some problems after upgrade from 0.98.6 to 0.98.17"

2016-04-07 Thread Ted Yu
The 'No StoreFiles for' is logged at DEBUG level.

Was 9151f75eaa7d00a81e5001f4744b8b6a among the regions which didn't finish
split ?

Can you pastebin more of the master log during this period ?

Any other symptom that you observed ?

Cheers

On Thu, Apr 7, 2016 at 12:59 AM, Heng Chen  wrote:

> This is the log about one region that I grepped from the master log
>
> 2016-04-07 12:20:53,984 INFO  [AM.ZK.Worker-pool2-t145]
> master.RegionStates: Transition null to {9151f75eaa7d00a81e5001f4744b8b6a
> state=SPLITTING_NEW, ts=1460002853984,
> server=dx-ape-regionserver40-online,60020,1459998494013}
> 2016-04-07 12:20:55,326 INFO  [AM.ZK.Worker-pool2-t147]
> master.RegionStates: Transition {9151f75eaa7d00a81e5001f4744b8b6a
> state=SPLITTING_NEW, ts=1460002855326,
> server=dx-ape-regionserver40-online,60020,1459998494013} to
> {9151f75eaa7d00a81e5001f4744b8b6a state=OPEN, ts=1460002855326,
> server=dx-ape-regionserver40-online,60020,1459998494013}
> 2016-04-07 12:20:55,326 INFO  [AM.ZK.Worker-pool2-t147]
> master.RegionStates: Onlined 9151f75eaa7d00a81e5001f4744b8b6a on
> dx-ape-regionserver40-online,60020,1459998494013
> 2016-04-07 12:20:55,328 INFO  [AM.ZK.Worker-pool2-t147]
> master.AssignmentManager: Handled SPLIT event;
>
> parent=apolo_pdf,\x00\x00\x00\x00\x01\x94\xC0\xA8,1457693428562.3410ea47a97d0aefb12ec62e8e89b605.,
> daughter
>
> a=apolo_pdf,\x00\x00\x00\x00\x01\x94\xC0\xA8,1460002853961.9151f75eaa7d00a81e5001f4744b8b6a.,
> daughter
>
> b=apolo_pdf,\x00\x00\x00\x00\x01\xA2\x0D\x96,1460002853961.a7d6d735ccbf47e0a9d3016b8fef181a.,
> on dx-ape-regionserver40-online,60020,1459998494013
> 2016-04-07 12:21:44,083 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/16b7f857eb6741a5bcaaa5516034929f.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=16b7f857eb6741a5bcaaa5516034929f
> 2016-04-07 12:21:44,123 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/1a3f77a7588a4ad38d34ed97f6c095be.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=1a3f77a7588a4ad38d34ed97f6c095be
> 2016-04-07 12:21:44,132 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/544552889a4c4de99d52814b3c229c30.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=544552889a4c4de99d52814b3c229c30
> 2016-04-07 12:21:44,138 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/fcf658954ff84e998eb71ee6477c2ebe.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=fcf658954ff84e998eb71ee6477c2ebe
> 2016-04-07 12:21:44,138 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/16b7f857eb6741a5bcaaa5516034929f.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=16b7f857eb6741a5bcaaa5516034929f
> 2016-04-07 12:21:44,177 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/1a3f77a7588a4ad38d34ed97f6c095be.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=1a3f77a7588a4ad38d34ed97f6c095be
> 2016-04-07 12:21:44,179 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/544552889a4c4de99d52814b3c229c30.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=544552889a4c4de99d52814b3c229c30
> 2016-04-07 12:21:44,181 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.StoreFileInfo: reference
>
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/fcf658954ff84e998eb71ee6477c2ebe.3410ea47a97d0aefb12ec62e8e89b605'
> to region=3410ea47a97d0aefb12ec62e8e89b605
> hfile=fcf658954ff84e998eb71ee6477c2ebe
> 2016-04-07 12:21:44,184 DEBUG
> [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> regionserver.HRegionFileSystem: No StoreFiles for:
>
> hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/m
> 2016-04-07 12:23:54,259 DEBUG [region-location-4]
> regionserver.StoreFileInfo: refe

Re: HBase region pre-split and store evenly

2016-04-07 Thread Ted Yu
Dropping user@hadoop.

Can you tell us more about your schema design ?

Was the behavior below happening even when you have pre-split your table ?

Cheers

On Thu, Apr 7, 2016 at 2:48 AM, Viswanathan J 
wrote:

> Hi Ted,
>
> When we insert data into HBase it utilizes only a single region server,
> not the other region servers, and we have 3 to 4 region servers.
>
> On Thu, Apr 7, 2016 at 3:27 AM, Ted Yu  wrote:
>
>> Please take a look at:
>> http://hbase.apache.org/book.html#disable.splitting
>>
>> especially the section titled:
>> Determine the Optimal Number of Pre-Split Regions
>>
>> For writing data evenly across the cluster, can you tell us some more
>> about your use case(s) ?
>>
>> Thanks
>>
>> On Tue, Apr 5, 2016 at 11:48 PM, Viswanathan J <
>> jayamviswanat...@gmail.com> wrote:
>>
>>> Hi,
>>>
> >>> Please help on region pre-splitting and writing data evenly across all the
> >>> region servers.
>>>
>>> --
>>> Regards,
>>> Viswa.J
>>>
>>
>>
>
>
> --
> Regards,
> Viswa.J
>


Re: HBase region pre-split and store evenly

2016-04-06 Thread Ted Yu
Please take a look at:
http://hbase.apache.org/book.html#disable.splitting

especially the section titled:
Determine the Optimal Number of Pre-Split Regions

For writing data evenly across the cluster, can you tell us some more about
your use case(s) ?

Thanks
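As an illustration only (the table name, family and the one-byte salt scheme are assumptions), pre-splitting at creation time gives the table several regions from the start; writes then spread across region servers provided the row keys themselves are distributed, e.g. by prefixing a hashed or salted byte rather than a monotonically increasing value:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class PreSplitExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
          desc.addFamily(new HColumnDescriptor("f1"));
          // 15 split keys -> 16 regions, evenly spaced on the first byte of the row key.
          byte[][] splitKeys = new byte[15][];
          for (int i = 1; i <= 15; i++) {
            splitKeys[i - 1] = new byte[] { (byte) (i * 16) };
          }
          admin.createTable(desc, splitKeys);
        }
      }
    }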

On Tue, Apr 5, 2016 at 11:48 PM, Viswanathan J 
wrote:

> Hi,
>
> Please help on region pre-splitting and writing data evenly across all the
> region servers.
>
> --
> Regards,
> Viswa.J
>


Re: choosing between hbase-spark / spark-hbase

2016-04-05 Thread Ted Yu
There are some outstanding bug fixes, e.g. HBASE-15333, for hbase-spark
module.

FYI

On Tue, Apr 5, 2016 at 2:36 PM, Nkechi Achara 
wrote:

> So Hbase-spark is a continuation of the spark on hbase project, but within
> the Hbase project.
> There are no significant differences apart from the fact that Spark on
> HBase is not updated.
> Depending on the version you are using, it would be more beneficial to use
> HBase-Spark.
>
> Kay
> On 5 Apr 2016 9:12 pm, "yeshwanth kumar"  wrote:
>
> > i have cloudera cluster,
> > i am exploring spark with HBase,
> >
> > after going through this blog
> >
> >
> >
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> >
> > i found two options for using Spark with HBase,
> >
> > Cloudera's Spark on HBase or
> > Apache hbase-spark.
> >
> > do they have significance difference?
> > which one should i use,
> >
> > can someone please point me out to their API documentation.
> > i did searched for documentation, but couldn't find it.
> >
> >
> > Thanks,
> > -Yeshwanth
> > Can you Imagine what I would do if I could do all I can - Art of War
> >
>


Re: delete hbase data in the previous month

2016-04-05 Thread Ted Yu
Have you considered setting TTL ?

bq. HBase will not be easily removed

Can you clarify the above ?

Cheers

On Tue, Apr 5, 2016 at 12:03 AM, hsdcl...@163.com  wrote:

>
> If I want to delete HBase data from the previous month, how do I do it? To
> avoid errors, we would need to block insert operations while deleting. Can
> HBase do that?
> The documentation seems to say that historical data cannot be easily removed
> from HBase, so can what I need be achieved?
>
>
> hsdcl...@163.com
>


Re: hbase custom scan

2016-04-04 Thread Ted Yu
How many regions does your table have ?

After sorting, is there a chance that the top N rows come from distinct
regions ?

On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora 
wrote:

> Hi
>
> I have a requirement to scan an HBase table based on insertion timestamp.
> I need to fetch the keys sorted by insertion timestamp, not by key.
>
> I can't make the timestamp a prefix of the key, to avoid hot spotting.
> Is there any efficient way to meet this requirement?
>
> Thanks!
>


Re: Compacting same table

2016-04-03 Thread Ted Yu
bq. I have been informed

Can you disclose the source of such information ?

For hbase.hstore.compaction.kv.max , hbase-default.xml has:

The maximum number of KeyValues to read and then write in a batch when
flushing or
  compacting. Set this lower if you have big KeyValues and problems
with Out Of Memory
  Exceptions Set this higher if you have wide, small rows.

Is the above description not clear ?

Thanks
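For what it's worth, a sketch (1.x Admin API; the table name comes from the command line) of how a manual major compaction is usually driven. Note that majorCompact() only queues the request and returns immediately, so timing measurements should poll the compaction state rather than time the call itself:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

    public class CompactAndWait {
      public static void main(String[] args) throws Exception {
        TableName table = TableName.valueOf(args[0]);
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          admin.majorCompact(table);   // asynchronous: only queues the compaction
          // Poll until the region servers report that compaction has finished.
          while (admin.getCompactionState(table) != CompactionState.NONE) {
            Thread.sleep(10000);
          }
        }
      }
    }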

On Sun, Apr 3, 2016 at 4:32 AM, Sumit Nigam 
wrote:

> Hi,
> I have been informed that manually compacting the same hbase table takes the
> same amount of time even when done in quick succession. This seems
> counter-intuitive because an already compacted table should not take the
> same amount of time.
> Also, what is the use of the hbase.hstore.compaction.kv.max setting in
> compaction? I am unable to determine its implications for other compaction
> tuning factors. The default of 10 seems too low.
> Thanks,
> Sumit


Re: Back up HBase tables before hadoop upgrade

2016-04-01 Thread Ted Yu
bq. copy whole hbase directory to my local disk

I doubt your local disk has enough space for all your data.
Plus, what if some part of the local disk goes bad ?
With hdfs, the chance of data loss is very low.

w.r.t. hbase snapshot, you can refer to
http://hbase.apache.org/book.html#ops.snapshots

For Export, see http://hbase.apache.org/book.html#export

Note, hbase snapshot doesn't generate sequence file.

On Fri, Apr 1, 2016 at 7:10 AM, Chathuri Wimalasena 
wrote:

> Thank you. What if I stop HBase and copy whole hbase directory to my local
> disk ? Will that work if something went wrong with the upgrade ?
>
> Also could you please tell me what's the difference between export and
> snapshot ?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 30, 2016 at 10:01 AM, Ted Yu  wrote:
>
> > You can also snapshot each of the 647 tables.
> > In case something goes unexpected, you can restore any of them.
> >
> > FYI
> >
> > On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena <
> kamalas...@gmail.com
> > >
> > wrote:
> >
> > > Hi All,
> > >
> > > We have production system using hadoop 2.5.1 and HBase 0.94.23. We have
> > > nearly around 200 TB of data in HDFS and we are planning to upgrade to
> > > newer hadoop version 2.7.2. In HBase we have roughly 647 tables. Before
> > the
> > > upgrade, we want to back up HBase tables in case of data loss or
> > corruption
> > > while upgrade. We are thinking of using export and import functionality
> > to
> > > export each table. Is there any other recommended way to back up hbase
> > > tables ?
> > >
> > > Thanks,
> > > Chathuri
> > >
> >
>


Re: build error

2016-04-01 Thread Ted Yu
In refguide, I don't see -Dsnappy mentioned.
I didn't find snappy in pom.xml either.

Have you tried building without this -D ?

On Fri, Apr 1, 2016 at 12:40 AM, Micha  wrote:

> Hi,
>
> this is my first maven build, thought this should just work :-)
>
> after calling:
>
> MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
> -Dhadoop-two.version=2.7.2   -Dsnappy
>
>
> I get:
>
> Downloading:
>
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> Downloading:
>
> http://repository.apache.org/snapshots/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> [WARNING] The POM for org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT
> is missing, no dependency information available
>
>
> this leads to:
>
> ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on
> project hbase: failed to get report for
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
> project hbase-server: Could not resolve dependencies for project
> org.apache.hbase:hbase-server:jar:1.1.4: Could not find artifact
> org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT in apache release
> (https://repository.apache.org/content/repositories/releases/) -> [Help 1]
> [ERROR]
>
>
>
> how to fix this?
>
> thanks,
>  Michael
>
>


Re: difference between dus and df output

2016-03-31 Thread Ted Yu
Have you performed major compaction lately ?

Are there non-expired hbase snapshots ?

Cheers

On Thu, Mar 31, 2016 at 2:50 PM, Ted Tuttle  wrote:

> This is very interesting, Ted. Thank you.
>
> We are only running HBase on hdfs.
>
> Does this mostly empty block appending behavior make sense for HBase-only
> usage?
>
> If this is, in fact, unused storage how do we get it back?
>
> Currently df shows 75% filled while du shows 25%.  The former is prompting
> us to consider more hardware.  If in fact we are 25% we don't need to.
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Thursday, March 31, 2016 1:13 PM
> To: user@hbase.apache.org
> Subject: difference between dus and df output
>
> Have you seen this thread ?
>
> http://search-hadoop.com/m/uOzYtatlmAcqgzM
>
> On Thu, Mar 31, 2016 at 11:58 AM, Ted Tuttle  wrote:
>
> > Hello-
> >
> > We are running v0.94.9 cluster.
> >
> > I am seeing that 'fs -dus' reports 24TB used and 'fs -df' reports
> > 74.2TB used.
> >
> > Does anyone know why these do not reconcile? Our replication factor is
> > 2 so that is not a likely explanation.
> >
> > Shown below are results from my cluster (doctored to TB for ease of
> > reading):
> >
> > bash-4.1$ hadoop fs -dus /hbase
> > hdfs://host/hbase  24.5TB
> >
> > bash-4.1$ hadoop fs -df /hbase
> > Filesystem  SizeUsedAvail   Use%
> > /hbase  103.8TB 74.2TB 24.3TB  71%
> >
>


difference between dus and df output

2016-03-31 Thread Ted Yu
Have you seen this thread ?

http://search-hadoop.com/m/uOzYtatlmAcqgzM

On Thu, Mar 31, 2016 at 11:58 AM, Ted Tuttle  wrote:

> Hello-
>
> We are running v0.94.9 cluster.
>
> I am seeing that 'fs -dus' reports 24TB used and 'fs -df' reports 74.2TB
> used.
>
> Does anyone know why these do not reconcile? Our replication factor is 2
> so that is not a likely explanation.
>
> Shown below are results from my cluster (doctored to TB for ease of
> reading):
>
> bash-4.1$ hadoop fs -dus /hbase
> hdfs://host/hbase  24.5TB
>
> bash-4.1$ hadoop fs -df /hbase
> Filesystem  SizeUsedAvail   Use%
> /hbase  103.8TB 74.2TB 24.3TB  71%
>


Re: find size of each table in the cluster

2016-03-31 Thread Ted Yu
bq. COMPRESSION => 'LZ4',

The answer is given by above attribute :-)

On Thu, Mar 31, 2016 at 10:41 AM, marjana  wrote:

> Sure, here's describe of one table:
>
> Table RAWHITS_AURORA-COM is ENABLED
> RAWHITS_AURORA-COM
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'f1', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
> REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZ4',
> MIN_VERSIONS => '0', TTL => '2160000 SECONDS (25 DAYS)',
> KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '262144', IN_MEMORY
> =>
> 'false', BLOCKCACHE => 'true'}
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/find-size-of-each-table-in-the-cluster-tp4078899p4078927.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: Re: Could not initialize all stores for the region

2016-03-31 Thread Ted Yu
Can you check server log on node 106 around 19:19:20 to see if there was
more clue ?

bq. be informed somehow of the events which happen during their absence?

Did you mean after nodeA came back online ?

Cheers

On Thu, Mar 31, 2016 at 9:57 AM, Zheng Shen  wrote:

> Hi Ted,
>
> Thank you very much for your reply!
>
> We do have multiple HMaster nodes; one of them is on the offline node
> (let's call it nodeA). Another is on a node which is always online (nodeB).
>
> I scanned the audit log, and found that during nodeA offline, the nodeB
> HDFS auditlog shows:
>
> hdfs-audit.log:2016-03-31 19:19:24,158 INFO FSNamesystem.audit:
> allowed=true ugi=hbase (auth:SIMPLE) ip=/192.168.1.106 cmd=delete
> src=/hbase/archive/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
> dst=null perm=null proto=rpc
>
> where (192.168.1.106) is the IP of nodeB.
>
> So it looks like nodeB deleted this file during nodeA's offline. However,
> should'nt services on nodeA (like HMaster and namenode) being informed by
> the events which happens during their absent somehow?
>
> Although we have only 5 nodes in this cluster, we do perform HA at every
> level of the HBase service stack. So yes, there are multiple instances of
> every service as long as it's possible or necessary (e.g. we have 3
> HMasters, 2 namenodes, 3 journal nodes).
>
> Thanks,
> Zheng
>
> 
> zhengshe...@outlook.com
>
> From: Ted Yu<mailto:yuzhih...@gmail.com>
> Date: 2016-04-01 00:00
> To: user@hbase.apache.org<mailto:user@hbase.apache.org>
> Subject: Re: Could not initialize all stores for the region
> bq. File does not exist: /hbase/data/default/vocabulary/
> 2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
>
> Can you search in namenode audit log to see which node initiated the delete
> request of the above file ?
> Then you can search in that node's region server log to get more clue.
>
> bq. hosts the HDFS namenode and datanode, Cloudera Manager, as well as
> HBase master and region server
>
> Can you separate some daemons off this node (e.g. HBase master) ?
> I assume you have second HBase master running somewhere else. Otherwise
> this node becomes the weak point of the cluster.
>
> On Thu, Mar 31, 2016 at 7:58 AM, Zheng Shen 
> wrote:
>
> > Hi,
> >
> > Our HBase cannot perform any write operation, while read operations
> > are fine. I found the following error in the region server log:
> >
> >
> > Could not initialize all stores for the
> >
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.
> >
> > Failed open of
> >
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.,
> > starting to roll back the global memstore size.
> > java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
> > File does not exist:
> >
> /hbase/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)
> > at
> >
> >
> > Opening of region {ENCODED => 19faeb6e4da0b1873f68da271b0f5788, NAME =>
> >
> 'vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.',
> > STARTKEY => '576206_6513944', ENDKEY => '599122_6739914'} failed,
> > transitioning from OPENING to FAILED_OPEN in ZK, expecting version 22
> >
> >
> > We are using Cloudera CDH 5.4.7, the HBase version is 1.0.0-cdh_5.4.7,
> > with HDFS HA enabled (one of the namenodes is running on the server that was
> > shut down). Our HBase cluster experienced an expected node shutdown today
> > for about 4 hours. The node which was shut down hosts the HDFS namenode and
> > datanode, Cloudera Manager, as w

Re: Could not initialize all stores for the region

2016-03-31 Thread Ted Yu
bq. File does not exist: /hbase/data/default/vocabulary/
2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d

Can you search in namenode audit log to see which node initiated the delete
request of the above file ?
Then you can search in that node's region server log to get more clue.

bq. hosts the HDFS namenode and datanode, Cloudera Manager, as well as
HBase master and region server

Can you separate some daemons off this node (e.g. HBase master) ?
I assume you have second HBase master running somewhere else. Otherwise
this node becomes the weak point of the cluster.

On Thu, Mar 31, 2016 at 7:58 AM, Zheng Shen  wrote:

> Hi,
>
> Our HBase cannot perform any write operation while the read operations
> are fine. I found the following error in the region server log
>
>
> Could not initialize all stores for the
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.
>
> Failed open of
> region=vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.,
> starting to roll back the global memstore size.
> java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
> File does not exist:
> /hbase/data/default/vocabulary/2639c4d082646bb4a4fa2d8119f9aaef/cnt/2dc367d0e1c24a3b848c68d3b171b06d
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:559)
> at
>
>
> Opening of region {ENCODED => 19faeb6e4da0b1873f68da271b0f5788, NAME =>
> 'vocabulary,576206_6513944,1459420417369.19faeb6e4da0b1873f68da271b0f5788.',
> STARTKEY => '576206_6513944', ENDKEY => '599122_6739914'} failed,
> transitioning from OPENING to FAILED_OPEN in ZK, expecting version 22
>
>
> We are using Cloudera CDH 5.4.7, the HBase version is 1.0.0-cdh_5.4.7,
> with HDFS HA enabled (one of the namenodes is running on the server that was
> shut down). Our HBase cluster experienced an expected node shutdown today
> for about 4 hours. The node which was shut down hosts the HDFS namenode and
> datanode, Cloudera Manager, as well as the HBase master and a region server (5
> nodes in total in our small cluster).  While the node was down,
> besides the services running on that node, the other HDFS namenode,
> failover server, and 2 of 3 journal nodes were also down. After the node was
> recovered, we restarted the whole CDH cluster, and then it ended up like this
> one...
>
> The HDFS check "hdfs fsck" does not report any corrupted blocks.
>
> Any suggestion about where we should look for this problem?
>
> Thanks!
> Zheng
>
> 
> zhengshe...@outlook.com
>


Re: find size of each table in the cluster

2016-03-31 Thread Ted Yu
bq. data is distributed on node servers,

Data is on hdfs, i.e. the Data Nodes.

bq. it gets propagated to all data nodes,

If I understand correctly, the -du command queries the namenode.

bq. Is this size compressed or uncompressed?

Can you show us the table description (output of describe command in hbase
shell) ?

On Thu, Mar 31, 2016 at 8:38 AM, marjana  wrote:

> Thanks all on your replies.
> This is clustered env, with 2 master nodes and 4 data nodes. Master nodes
> have these components installed (as shown in Ambari UI):
> active hbase master
> history server
> name node
> resource manager
> zookeeper server
> metrics monitor
>
> Node server has these components:
> Data Node
> region server
> metrics monitor
> node manager
>
> So I looked on my node server for the hbase.rootdir, and it points to my
> hdfs://hbasmaserserver:8020//apps/hbase/data.
> Now this is confusing to me as I thought data is distributed on node
> servers, where region servers are.
> I sshed to my masterserver and looked under this dir and did see all my
> tables in my default namespace. Example:
> $ hadoop fs -du -s -h /apps/hbase/data/data/default/RAWHITS_AURORA-COM
> 2.0 G  /apps/hbase/data/data/default/RAWHITS_AURORA-COM
>
> So when I run this command on hbmaster, it gets propagated to all data
> nodes, correct? Is this size compressed or uncompressed?
>
> Many thanks!
> Marjana
>
>
>
>
>


Re: Region Server Crash On Upsert Query Execution

2016-03-31 Thread Ted Yu
The attachments you mentioned didn't go through.

For #2, please adjust:

hbase.client.scanner.timeout.period
hbase.rpc.timeout

Both have a default value of 60 seconds.
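
For example, a minimal client-side sketch; the 120000 ms values are arbitrary illustrations, not recommendations, and if the client goes through Phoenix the same keys can instead be placed in the client's hbase-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class LongerTimeouts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // both default to 60000 ms; raise them before creating the connection
        conf.setInt("hbase.rpc.timeout", 120000);
        conf.setInt("hbase.client.scanner.timeout.period", 120000);
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
          // tables obtained from this connection inherit the longer timeouts
        }
      }
    }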

If possible, please pastebin server log snippet before it crashed.

Thanks

On Thu, Mar 31, 2016 at 3:04 AM, Amit Shah  wrote:

> A couple of observations
>
> 1. I could see GC pauses in the logs but I do not think that could be
> causing the jvm exits. I have configured the region server heap to be 2
> GB. The jconsole indicates that it hardly reaches 1.5 GB. Kindly find some
> graphs attached.
>
> 2. On another run the phoenix client failed with a socket timeout
> operation. The error is pasted here - http://pastebin.com/mAWwiE2J
> Are there client timeouts for each region server? It looks like the
> default is 1 min. How can I increase them?
>
> Thanks,
> Amit.
>
> On Thu, Mar 31, 2016 at 2:39 PM, Samir Ahmic 
> wrote:
>
>> Hi Amit,
>>
>> Check regionserver logs; usual suspects are long running GC and HDFS
>> client
>> related issues. Check the compaction queue.
>>
>> Regards
>> Samir
>>
>> On Thu, Mar 31, 2016 at 10:48 AM, Amit Shah  wrote:
>>
>> > Hi,
>> >
>> > We have been experimenting with HBase (version 1.0) and Phoenix (version 4.6)
>> > for our OLAP workload. In order to precalculate aggregates we have been
>> > executing an upsert phoenix query that aggregates raw data (over 10 mil
>> > records) to generate an OLAP cube.
>> >
>> > While executing the query, one of the region servers in a cluster of 3
>> RS
>> > crashes. I am trying to figure out what could be causing the region
>> server
>> > to crash.
>> > The server shows high disk operations before the jvm crashed. Kindly
>> find
>> > the disk and other stats attached.
>> >
>> > Any suggestions on where could I look into would be helpful.
>> >
>> > The upsert query that was executed is
>> >
>> > upsert into AGENT_TER_PRO
>> >
>> (AGENT_ID,TERRITORY_ID,PRODUCT_ID,SUM_TOTAL_SALES,SUM_TOTAL_EXPENSES,SUM_UNIT_CNT_SOLD,AVG_PRICE_PER_UNIT)
>> > select /*+ INDEX(TRANSACTIONS  AG_TER_PRO2) */
>> >  AGENT_ID,TERRITORY_ID,PRODUCT_ID, sum(TOTAL_SALES)
>> > SUM_TOTAL_SALES,sum(TOTAL_EXPENSES)
>> SUM_TOTAL_EXPENSES,sum(UNIT_CNT_SOLD)
>> > SUM_UNIT_CNT_SOLD,AVG(PRICE_PER_UNIT)  AVG_PRICE_PER_UNIT  from
>> > TRANSACTIONS   group by AGENT_ID,TERRITORY_ID,PRODUCT_ID;
>> >
>> > Thanks,
>> > Amit.
>> >
>> >
>>
>
>


Re: Leveraging As Much Memory As Possible

2016-03-31 Thread Ted Yu
For #2, I did a quick search in Phoenix code base for bucket cache - I
didn't find match.

Maybe ask on Phoenix mailing list.

On Wed, Mar 30, 2016 at 11:07 PM, Amit Shah  wrote:

> Thanks Ted and Anoop for the replies. A couple of follow up questions
>
> 1. I could see the bucket cache and on-heap memory stats reflected on the hbase
> region server UI <http://i.imgur.com/4uwaWuC.png>; what are ways to
> monitor
> and see if it's getting utilized?
> 2. We are using Phoenix as a SQL layer above HBase. Are there any Phoenix
> configurations that would help me leverage memory?
>
> Thanks,
> Amit.
>
> On Thu, Mar 31, 2016 at 10:44 AM, Anoop John 
> wrote:
>
> > Ya having HBase side cache will be a better choice rather than HDFS
> > cache IMO.   Yes u r correct...  You might not want to give a very
> > large size for the heap. You can make use of the off heap BucketCache.
> >
> > -Anoop-
> >
> > On Thu, Mar 31, 2016 at 4:35 AM, Ted Yu  wrote:
> > > For #1, please see the top two blogs @ https://blogs.apache.org/hbase/
> > >
> > > FYI
> > >
> > > On Wed, Mar 30, 2016 at 7:59 AM, Amit Shah  wrote:
> > >
> > >> Hi,
> > >>
> > >> I am trying to configure my hbase (version 1.0) phoenix (version -
> 4.6)
> > >> cluster to utilize as much memory as possible on the server hardware.
> We
> > >> have an OLAP workload that allows users to perform interactive
> analysis
> > >> over huge sets of data. While reading about hbase configuration I came
> > >> across two configs
> > >>
> > >> 1. Hbase bucket cache
> > >> <
> > >>
> >
> http://blog.asquareb.com/blog/2014/11/24/how-to-leverage-large-physical-memory-to-improve-hbase-read-performance
> > >> >
> > >> (off heap) which looks like a good option to bypass garbage
> collection.
> > >> 2. Hadoop pinned hdfs blocks
> > >> <
> > http://blog.cloudera.com/blog/2014/08/new-in-cdh-5-1-hdfs-read-caching/>
> > >> (max locked memory) concept that loads the hdfs blocks in memory, but
> > given
> > >> that hbase is configured with short circuit reads I assume this config
> > may
> > >> not be of much help. Instead it would be right to increase hbase
> region
> > >> server heap memory. Is my understanding right?
> > >>
> > >> We use HBase with Phoenix.
> > >> Kindly let me know your thoughts or suggestions on any more options
> > that I
> > >> should explore
> > >>
> > >> Thanks,
> > >> Amit.
> > >>
> >
>


Re: find size of each table in the cluster

2016-03-30 Thread Ted Yu
bq. hbase version is 1.1.1.2.3

I don't think there was ever such a release - there should be only 3 dots.

bq. /hbase is the default storage location for tables in hdfs

The root dir is given by the hbase.rootdir config parameter.

Here is sample listing:

http://pastebin.com/ekF4tsYn

Under data, you would see:

drwxr-xr-x   - hbase hdfs  0 2016-03-22 20:26
/apps/hbase/data/data/default
drwxr-xr-x   - hbase hdfs  0 2016-03-14 19:13
/apps/hbase/data/data/hbase

hbase is the system namespace.

Under default (or your own namespace), you would see the table dirs. Here is a
sample:

drwxr-xr-x   - hbase hdfs  0 2016-03-22 20:26
/apps/hbase/data/data/default/elog_pn_split
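
To turn that layout into per-table numbers programmatically, a rough sketch; it assumes the client configuration carries hbase.rootdir and the default data/<namespace>/<table> layout shown above, and the reported bytes are the on-disk (i.e. compressed) sizes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class TableSizes {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // assumes hbase.rootdir is set, e.g. hdfs://nn:8020/apps/hbase/data
        Path nsDir = new Path(conf.get("hbase.rootdir"), "data/default");
        FileSystem fs = nsDir.getFileSystem(conf);
        for (FileStatus tableDir : fs.listStatus(nsDir)) {
          // getContentSummary sums every store file under the table directory
          long bytes = fs.getContentSummary(tableDir.getPath()).getLength();
          System.out.println(tableDir.getPath().getName() + " : " + bytes + " bytes");
        }
      }
    }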

On Wed, Mar 30, 2016 at 7:26 PM, Stephen Durfey  wrote:

> I believe the easiest way would be to run 'hadoop dfs -du -h /hbase'. I
> believe /hbase is the default storage location for tables in hdfs. The size
> will be either compressed or uncompressed, depending upon if compression is
> enabled.
>
> On Wed, Mar 30, 2016 at 6:32 PM -0700, "marjana" 
> wrote:
>
> Hello,
> I am new to hBase, so sorry if I am talking nonsense.
>
> I am trying to figure out a way to find the total size of each table in
> my HBase.
> I have looked into hbase shell commands. There's "status 'detailed'", that
> shows storefileSizeMB. If I were to add all of these grouped by tablename,
> would that be the correct way to show MB used per table?
> Is there any other (easier/cleaner) way?
> hbase version is 1.1.1.2.3, HDFS: 2.7.1
> Thanks
> Marjana
>


Re: Leveraging As Much Memory As Possible

2016-03-30 Thread Ted Yu
For #1, please see the top two blogs @ https://blogs.apache.org/hbase/

FYI

On Wed, Mar 30, 2016 at 7:59 AM, Amit Shah  wrote:

> Hi,
>
> I am trying to configure my hbase (version 1.0) phoenix (version - 4.6)
> cluster to utilize as much memory as possible on the server hardware. We
> have an OLAP workload that allows users to perform interactive analysis
> over huge sets of data. While reading about hbase configuration I came
> across two configs
>
> 1. Hbase bucket cache
> <
> http://blog.asquareb.com/blog/2014/11/24/how-to-leverage-large-physical-memory-to-improve-hbase-read-performance
> >
> (off heap) which looks like a good option to bypass garbage collection.
> 2. Hadoop pinned hdfs blocks
> <http://blog.cloudera.com/blog/2014/08/new-in-cdh-5-1-hdfs-read-caching/>
> (max locked memory) concept that loads the hdfs blocks in memory, but given
> that hbase is configured with short circuit reads I assume this config may
> not be of much help. Instead it would be right to increase hbase region
> server heap memory. Is my understanding right?
>
> We use HBase with Phoenix.
> Kindly let me know your thoughts or suggestions on any more options that I
> should explore
>
> Thanks,
> Amit.
>


Re: hbase version and hadoop version

2016-03-30 Thread Ted Yu
Please refer to http://hbase.apache.org/book.html#maven.release (especially
4. Build the binary tarball.)

Pass the following on command line:

-Dhadoop-two.version=2.7.2

FYI

On Wed, Mar 30, 2016 at 7:41 AM, Micha  wrote:

> Hi,
>
>
> hbase ships with hadoop jars in the libs directory which are older
> (2.5.2) than the actual hadoop version (2.7.2).
> So after upgrading hadoop to 2.7.2, should the jars in the libs
> directory of hbase (1.1.2) be replaced with the ones which come with
> hadoop?
>
>
> Thanks,
>  Michael
>


Re: Back up HBase tables before hadoop upgrade

2016-03-30 Thread Ted Yu
You can also snapshot each of the 647 tables.
In case something unexpected happens, you can restore any of them.
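
For example, a rough sketch of snapshotting every table in a loop, written against the HBase 1.x Admin API for brevity; on 0.94 the equivalent calls live on HBaseAdmin and differ slightly, and the snapshot naming scheme below is arbitrary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class SnapshotAllTables {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          for (TableName tn : admin.listTableNames()) {
            // snapshot names may not contain ':', so flatten namespace:table
            String name = "pre-upgrade-" + tn.getNameAsString().replace(':', '_');
            admin.snapshot(name, tn);
          }
        }
      }
    }

Restoring any of them later is then a matter of admin.restoreSnapshot(name) against a disabled table.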

FYI

On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena 
wrote:

> Hi All,
>
> We have a production system using Hadoop 2.5.1 and HBase 0.94.23. We have
> nearly 200 TB of data in HDFS and we are planning to upgrade to the newer
> Hadoop version 2.7.2. In HBase we have roughly 647 tables. Before the
> upgrade, we want to back up the HBase tables in case of data loss or corruption
> during the upgrade. We are thinking of using the export and import
> functionality to export each table. Is there any other recommended way to back
> up HBase tables ?
>
> Thanks,
> Chathuri
>


Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
bq. calculating new attributes of a trade

Can you put the new attributes in separate columns ?
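
To make the suggestion concrete, a small sketch (the table, family and qualifier names are made up for illustration) that writes each enriched attribute as its own qualifier instead of one serialized Avro blob:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnrichTrade {
      private static final byte[] CF = Bytes.toBytes("d");  // illustrative family

      static void writeEnriched(Connection conn, byte[] rowKey,
                                double notional, String riskBucket) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("trades"))) {
          Put put = new Put(rowKey);
          // one attribute per qualifier: the original columns stay untouched,
          // only the newly calculated attributes are written
          put.addColumn(CF, Bytes.toBytes("notional"), Bytes.toBytes(notional));
          put.addColumn(CF, Bytes.toBytes("risk_bucket"), Bytes.toBytes(riskBucket));
          table.put(put);
        }
      }
    }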

Cheers

On Fri, Mar 25, 2016 at 12:38 PM, Daniel Połaczański  wrote:

> The data is a set of trades and the processing is some kind of enrichment
> (calculating new attributes of a trade). All attributes are needed
> (the original and the new ones)
>
> 2016-03-25 18:41 GMT+01:00 Ted Yu :
>
> > bq. During the processing the size of the data is doubled.
> >
> > This explains the frequent split :-)
> >
> > Is the original data needed after post-processing (maybe for auditing) ?
> >
> > Cheers
> >
> > On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański <
> > dpolaczan...@gmail.com
> > > wrote:
> >
> > > I am testing different solutions (POC).
> > > The region size currenlty is 32MB (I know it should be >= 1GB, but we
> are
> > > testing different solutions with smaller amount of the data ). So
> > > increasing region size is not a solution. Our problems can happen even
> > when
> > > a region will be 1 GB. We want to proces the data with coprocessor and
> > > hadoop map reduce. I can not have one big Region because I want
> sensible
> > > degree of paralerism (with Map Reduce and coprocessors).
> > >
> > > Increasing region size + pre-splitting  is not an option as well
> because
> > I
> > > know nothing about keys(random long).
> > >
> > > During the processing the size of the data is doubled.
> > >
> > > And yes, coprocessor rewrites a lot of the data written into the table.
> > The
> > > whole record is serialized to avro and stored in one column (storing
> > single
> > > attribute in single column we will try in the next POC)
> > >
> > > it is not a typical big data project where we can allow former analysis
> > of
> > > the data:)
> > >
> > > 2016-03-25 17:38 GMT+01:00 Ted Yu :
> > >
> > > > What's the current region size you use ?
> > > >
> > > > bq. During the processing size of the data gets increased
> > > >
> > > > Can you give us some quantitative measure as to how much increase you
> > > > observed (w.r.t. region size) ?
> > > >
> > > > bq. I was looking for some "global lock" in source code
> > > >
> > > > Probably not a good idea using global lock.
> > > >
> > > > I am curious, looks like your coprocesser may rewrite a lot of data
> > > written
> > > > into the table.
> > > > Can client side accommodate such logic so that the rewrite is
> reduced ?
> > > >
> > > > Thanks
> > > >
> > > > On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <
> > > > dpolaczan...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I have some processing in my coprocesserService which modifies the
> > > > existing
> > > > > data in place. It iterates over every row, modifies and puts it
> back
> > to
> > > > > region. The table can be modified by only one client.
> > > > >
> > > > > During the processing size of the data gets increased -> region's
> > size
> > > > get
> > > > > increased -> region's split happens. It makes that the processing
> is
> > > > > stopped by exception NotServingRegionException (because region is
> > > closed
> > > > > and splited to two new regions so it is closed and doesn't exist
> > > > anymore).
> > > > >
> > > > > Is there any clean way to block Region's splitting?
> > > > >
> > > > > I was looking for some "global lock" in source code but I haven't
> > found
> > > > > anything helpfull.
> > > > > Another idea is to create custom RegionSplitPolicy and explicilty
> set
> > > > some
> > > > > Flag which will return false in shouldSplit(), but I'm not sure yet
> > if
> > > it
> > > > > is safe.
> > > > > Could you advise?
> > > > > Regards
> > > > >
> > > >
> > >
> >
>


Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
bq. During the processing the size of the data is doubled.

This explains the frequent split :-)

Is the original data needed after post-processing (maybe for auditing) ?

Cheers

On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański  wrote:

> I am testing different solutions (POC).
> The region size currently is 32MB (I know it should be >= 1GB, but we are
> testing different solutions with a smaller amount of data). So
> increasing the region size is not a solution. Our problems can happen even when
> a region is 1 GB. We want to process the data with a coprocessor and
> Hadoop MapReduce. I cannot have one big region because I want a sensible
> degree of parallelism (with MapReduce and coprocessors).
>
> Increasing the region size + pre-splitting is not an option either because I
> know nothing about the keys (random long).
>
> During the processing the size of the data is doubled.
>
> And yes, the coprocessor rewrites a lot of the data written into the table. The
> whole record is serialized to Avro and stored in one column (storing a single
> attribute per column is what we will try in the next POC)
>
> it is not a typical big data project where we can allow former analysis of
> the data:)
>
> 2016-03-25 17:38 GMT+01:00 Ted Yu :
>
> > What's the current region size you use ?
> >
> > bq. During the processing size of the data gets increased
> >
> > Can you give us some quantitative measure as to how much increase you
> > observed (w.r.t. region size) ?
> >
> > bq. I was looking for some "global lock" in source code
> >
> > Probably not a good idea using global lock.
> >
> > I am curious, looks like your coprocesser may rewrite a lot of data
> written
> > into the table.
> > Can client side accommodate such logic so that the rewrite is reduced ?
> >
> > Thanks
> >
> > On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <
> > dpolaczan...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I have some processing in my coprocesserService which modifies the
> > existing
> > > data in place. It iterates over every row, modifies and puts it back to
> > > region. The table can be modified by only one client.
> > >
> > > During the processing size of the data gets increased -> region's size
> > get
> > > increased -> region's split happens. It makes that the processing is
> > > stopped by exception NotServingRegionException (because region is
> closed
> > > and splited to two new regions so it is closed and doesn't exist
> > anymore).
> > >
> > > Is there any clean way to block Region's splitting?
> > >
> > > I was looking for some "global lock" in source code but I haven't found
> > > anything helpfull.
> > > Another idea is to create custom RegionSplitPolicy and explicilty set
> > some
> > > Flag which will return false in shouldSplit(), but I'm not sure yet if
> it
> > > is safe.
> > > Could you advise?
> > > Regards
> > >
> >
>


Re: processing in coprocessor and region splitting

2016-03-25 Thread Ted Yu
What's the current region size you use ?

bq. During the processing size of the data gets increased

Can you give us some quantitative measure as to how much increase you
observed (w.r.t. region size) ?

bq. I was looking for some "global lock" in source code

Probably not a good idea using global lock.

I am curious; it looks like your coprocessor may rewrite a lot of data written
into the table.
Can client side accommodate such logic so that the rewrite is reduced ?
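
On the custom-RegionSplitPolicy idea from the quoted message, a minimal sketch (my own illustration against the HBase 1.x API, not code from this thread) of a per-table policy that simply refuses to split:

    import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

    /**
     * A split policy that never splits; regions using it only grow.
     * Deploy the class on every region server's classpath, then attach it
     * per table at (re)creation time, e.g.:
     *   htd.setRegionSplitPolicyClassName(NoSplitPolicy.class.getName());
     * so that only this table is affected and others keep the default policy.
     */
    public class NoSplitPolicy extends ConstantSizeRegionSplitPolicy {
      @Override
      protected boolean shouldSplit() {
        return false;  // block splits entirely while the coprocessor job runs
      }
    }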

Thanks

On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański 
wrote:

> Hi,
> I have some processing in my coprocessorService which modifies the existing
> data in place. It iterates over every row, modifies it and puts it back to
> the region. The table can be modified by only one client.
>
> During the processing size of the data gets increased -> the region's size
> gets increased -> a region split happens. This means the processing is
> stopped by a NotServingRegionException (because the region is closed
> and split into two new regions, so it doesn't exist anymore).
>
> Is there any clean way to block region splitting?
>
> I was looking for some "global lock" in source code but I haven't found
> anything helpful.
> Another idea is to create a custom RegionSplitPolicy and explicitly set some
> flag which will return false in shouldSplit(), but I'm not sure yet if it
> is safe.
> Could you advise?
> Regards
>


Re: Inconsistent scan performance

2016-03-25 Thread Ted Yu
James:
Another experiment you can do is to enable region replica - HBASE-10070. 

This would bring down the read variance greatly. 
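
For reference, a rough sketch of what enabling that looks like from the client side (HBase 1.1+ API; the replication factor of 2 and the table/family names are only examples), together with the timeline-consistent scan that goes with it:

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Consistency;
    import org.apache.hadoop.hbase.client.Scan;

    public class RegionReplicaExample {
      // create the table with one extra read replica per region
      static void createWithReplicas(Admin admin) throws Exception {
        HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("t1"));
        htd.addFamily(new HColumnDescriptor("f1"));
        htd.setRegionReplication(2);
        admin.createTable(htd);
      }

      // allow the scan to fall back to secondary replicas (results may be stale)
      static Scan timelineScan() {
        Scan scan = new Scan();
        scan.setConsistency(Consistency.TIMELINE);
        return scan;
      }
    }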

> On Mar 25, 2016, at 2:41 AM, Nicolas Liochon  wrote:
> 
> The read path is much more complex than the write one, so the response time
> has much more variance.
> The gap is so wide here that I would bet on Ted's or Stack's points, but
> here are a few other sources of variance:
> - hbase cache: as Anoop said, may be the data is already in the hbase cache
> (setCacheBlocks(false), means "don't add blocks to the cache", not "don't
> use the cache")
> - OS cache: and if the data is not in HBase cache may be it is in the
> operating system cache (for example if you run the test multiple times)
> - data locality: if you're lucky the data is local to the region server. If
> you're not, the reads need an extra network hoop.
> - number of store: more hfiles/stores per region => slower reads.
> - number of versions and so on: sub case of the previous one: if the rows
> have been updated multiple times and the compaction has not ran yet, you
> will read much more data.
> - (another subcase): the data has not been flushed yet and is available in
> the memstore => fast read.
> 
> None of these points has any importance for the the write path. Basically
> the writes variance says nothing about the variance you will get on the
> reads.
> 
> IIRC, locality and number of stores are visible in HBase UI. Doing a table
> flush and then running a major compaction generally helps to stabilize
> response time when you do a test. But it should not explain the x25 you're
> seeing, there is something else somewhere else. I don't get the
> regionserver boundaries you're mentioning: there is no boundary between
> regionservers. A regionserver can host A->D and M->S while another hosts
> D->M and S->Z for example.
> 
>> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John  wrote:
>> 
>> I see you set cacheBlocks to be false on the Scan.   By any chance on
>> some other RS(s), the data you are looking for is already in cache?
>> (Any previous scan or  by cache on write)  And there are no concurrent
>> writes any way right?   This much difference in time !  One
>> possibility is blocks avail or not avail in cache..
>> 
>> -Anoop-
>> 
>>> On Fri, Mar 25, 2016 at 11:04 AM, Stack  wrote:
>>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
>>> james.johansvi...@gmail.com> wrote:
>>> 
 Hello all,
 
 So, I wrote a Java application for HBase that does a partitioned
>> full-table
 scan according to a set number of partitions. For example, if there are
>> 20
 partitions specified, then 20 separate full scans are launched that
>> cover
 an equal slice of the row identifier range.
 
 The rows are uniformly distributed throughout the RegionServers.
>>> 
>>> 
>>> How many RegionServers? How many Regions? Are Regions evenly distributed
>>> across the servers? If you put all partitions on one machine and then run
>>> your client, do the timings even out?
>>> 
>>> The disparity seems really wide.
>>> 
>>> St.Ack
>>> 
>>> 
>>> 
>>> 
 I
 confirmed this through the hbase shell. I have only one column family,
>> and
 each row has the same number of column qualifiers.
 
 My problem is that the individual scan performance is wildly
>> inconsistent
 even though they fetch approximately a similar number of rows. This
 inconsistency appears to be random with respect to hosts or
>> regionservers
 or partitions or CPU cores. I am the only user of the fleet and not
>> running
 any other concurrent HBase operation.
 
 I started measuring from the beginning of the scan and stopped measuring
 after the scan was completed. I am not doing any logic with the results,
 just scanning them.
 
 For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
 100+ seconds. This seems a little too bouncy for me. Does anyone have
>> any
 insight? By comparison, a similar utility I wrote to upsert to
 regionservers was very consistent in ops/sec and I had no issues with
>> it.
 
 Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
 see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
 output I saved that used 130 partitions.
 
 total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
 ops/sec:36358.38150289017
 total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
 ops/sec:31176.91380349608
 total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
 ops/sec:30772.08014764039
 total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
 ops/sec:7051.235410034865
 total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
 ops/sec:6046.3170939508955
 total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
 ops/sec:4803.316900101075
 total # parti

Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
Crossing region boundaries that happen to fall on different servers may be a factor.

On Thu, Mar 24, 2016 at 5:49 PM, James Johansville <
james.johansvi...@gmail.com> wrote:

> In theory they should be aligned with *regionserver* boundaries. Would
> crossing multiple regions on the same regionserver result in the big
> performance difference being seen here?
>
> I am using Hortonworks HBase 1.1.2
>
> On Thu, Mar 24, 2016 at 5:32 PM, Ted Yu  wrote:
>
> > I assume the partitions' boundaries don't align with region boundaries,
> > right ?
> >
> > Meaning some partitions would cross region boundaries.
> >
> > Which hbase release do you use ?
> >
> > Thanks
> >
> > On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
> > james.johansvi...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > So, I wrote a Java application for HBase that does a partitioned
> > full-table
> > > scan according to a set number of partitions. For example, if there are
> > 20
> > > partitions specified, then 20 separate full scans are launched that
> cover
> > > an equal slice of the row identifier range.
> > >
> > > The rows are uniformly distributed throughout the RegionServers. I
> > > confirmed this through the hbase shell. I have only one column family,
> > and
> > > each row has the same number of column qualifiers.
> > >
> > > My problem is that the individual scan performance is wildly
> inconsistent
> > > even though they fetch approximately a similar number of rows. This
> > > inconsistency appears to be random with respect to hosts or
> regionservers
> > > or partitions or CPU cores. I am the only user of the fleet and not
> > running
> > > any other concurrent HBase operation.
> > >
> > > I started measuring from the beginning of the scan and stopped
> measuring
> > > after the scan was completed. I am not doing any logic with the
> results,
> > > just scanning them.
> > >
> > > For ~230K rows fetched per scan, I am getting anywhere from 4 seconds
> to
> > > 100+ seconds. This seems a little too bouncy for me. Does anyone have
> any
> > > insight? By comparison, a similar utility I wrote to upsert to
> > > regionservers was very consistent in ops/sec and I had no issues with
> it.
> > >
> > > Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap,
> I
> > > see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of
> log
> > > output I saved that used 130 partitions.
> > >
> > > total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> > > ops/sec:36358.38150289017
> > > total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> > > ops/sec:31176.91380349608
> > > total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> > > ops/sec:30772.08014764039
> > > total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> > > ops/sec:7051.235410034865
> > > total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> > > ops/sec:6046.3170939508955
> > > total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> > > ops/sec:4803.316900101075
> > > total # partitions:130; partition id:125; rows:205334
> elapsed_sec:41.911
> > > ops/sec:4899.286583474505
> > > total # partitions:130; partition id:123; rows:206622
> elapsed_sec:42.281
> > > ops/sec:4886.875901705258
> > > total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> > > ops/sec:4743.210480206996
> > >
> > > I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> > > insight into how I can make the read performance more consistent?
> > >
> > > Thanks!
> > >
> >
>


Re: Inconsistent scan performance

2016-03-24 Thread Ted Yu
I assume the partitions' boundaries don't align with region boundaries,
right ?

Meaning some partitions would cross region boundaries.
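
One way to take this variable out of the experiment is to derive the partitions from the region boundaries themselves; a rough sketch (HBase 1.x API) using RegionLocator so that every scan stays inside a single region:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionAlignedScans {
      static List<Scan> perRegionScans(Connection conn, TableName tn) throws Exception {
        List<Scan> scans = new ArrayList<Scan>();
        try (RegionLocator locator = conn.getRegionLocator(tn)) {
          // the two arrays are parallel: one start/end key pair per region
          Pair<byte[][], byte[][]> keys = locator.getStartEndKeys();
          for (int i = 0; i < keys.getFirst().length; i++) {
            Scan scan = new Scan(keys.getFirst()[i], keys.getSecond()[i]);
            scan.setCacheBlocks(false);
            scan.setCaching(5000);
            scans.add(scan);
          }
        }
        return scans;
      }
    }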

Which hbase release do you use ?

Thanks

On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
james.johansvi...@gmail.com> wrote:

> Hello all,
>
> So, I wrote a Java application for HBase that does a partitioned full-table
> scan according to a set number of partitions. For example, if there are 20
> partitions specified, then 20 separate full scans are launched that cover
> an equal slice of the row identifier range.
>
> The rows are uniformly distributed throughout the RegionServers. I
> confirmed this through the hbase shell. I have only one column family, and
> each row has the same number of column qualifiers.
>
> My problem is that the individual scan performance is wildly inconsistent
> even though they fetch approximately a similar number of rows. This
> inconsistency appears to be random with respect to hosts or regionservers
> or partitions or CPU cores. I am the only user of the fleet and not running
> any other concurrent HBase operation.
>
> I started measuring from the beginning of the scan and stopped measuring
> after the scan was completed. I am not doing any logic with the results,
> just scanning them.
>
> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> insight? By comparison, a similar utility I wrote to upsert to
> regionservers was very consistent in ops/sec and I had no issues with it.
>
> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
> output I saved that used 130 partitions.
>
> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> ops/sec:36358.38150289017
> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> ops/sec:31176.91380349608
> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> ops/sec:30772.08014764039
> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> ops/sec:7051.235410034865
> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> ops/sec:6046.3170939508955
> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> ops/sec:4803.316900101075
> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> ops/sec:4899.286583474505
> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> ops/sec:4886.875901705258
> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> ops/sec:4743.210480206996
>
> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> insight into how I can make the read performance more consistent?
>
> Thanks!
>


Re: Unexpected region splits

2016-03-24 Thread Ted Yu
Actually there may be a simpler solution:

http://pastebin.com/3KJ7Vxnc

We can check the ratio between online regions and total number of regions
in IncreasingToUpperBoundRegionSplitPolicy#shouldSplit().

Only when the ratio gets over a certain threshold should splitting start.
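
The pastebin is not reproduced here, but as a simplified stand-in for the same idea, a hedged sketch (HBase 1.x) that defers the split decision for a while after a region is opened (e.g. while the region server is still starting up) rather than computing the online/total ratio:

    import org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy;

    /**
     * Simplified stand-in for the idea above (not the pastebin code): hold off
     * on splitting for a while after a region is opened, so the low
     * online-region count seen during region server startup cannot trigger a
     * premature split. The 10 minute delay is arbitrary.
     */
    public class StartupAwareSplitPolicy extends IncreasingToUpperBoundRegionSplitPolicy {
      private static final long STARTUP_DELAY_MS = 10L * 60 * 1000;
      private final long createdAt = System.currentTimeMillis();  // set when the region opens

      @Override
      protected boolean shouldSplit() {
        if (System.currentTimeMillis() - createdAt < STARTUP_DELAY_MS) {
          return false;  // too soon after open: defer the decision
        }
        return super.shouldSplit();
      }
    }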

FYI

On Thu, Mar 24, 2016 at 12:39 PM, Ted Yu  wrote:

> Currently IncreasingToUpperBoundRegionSplitPolicy doesn't detect when the
> master initialization finishes.
>
> There is also some missing piece where region server notifies the
> completion of cluster initialization (by looking at RegionServerObserver).
>
> Cheers
>
> On Thu, Mar 24, 2016 at 3:50 AM, Bram Desoete  wrote:
>
>>
>>
>>
>> Pedro Gandola  writes:
>>
>> >
>> > Hi Ted,
>> >
>> > Thanks,
>> > I think I got the problem, I'm using
>> *IncreasingToUpperBoundRegionSplitPolicy
>> > (default)* instead *ConstantSizeRegionSplitPolicy* which in my use case
>> is
>> > what I want.
>> >
>> > Cheers
>> > Pedro
>> >
>> > On Mon, Feb 15, 2016 at 5:22 PM, Ted Yu  wrote:
>> >
>> > > Can you pastebin region server log snippet around the time when the
>> split
>> > > happened ?
>> > >
>> > > Was the split on data table or index table ?
>> > >
>> > > Thanks
>> > >
>> > > > On Feb 15, 2016, at 10:22 AM, Pedro Gandola 
>> > > wrote:
>> > > >
>> > > > Hi,
>> > > >
>> > > > I have a cluster using *HBase 1.1.2* where I have a table and a
>> local
>> > > index
>> > > > (using *Apache Phoenix 4.6*) in total both tables have *300 regions*
>> > > > (aprox: *18 regions per server*), my*
>> hbase.hregion.max.filesize=30GB
>> > > *and
>> > > > my region sizes are now *~4.5GB compressed (~7GB uncompressed)*.
>> However
>> > > > each time I restart a RS sometimes a region gets split. This is
>> > > unexpected
>> > > > because my key space is uniform (using MD5) and if the problem was
>> my
>> > > > *region.size
>> > > >> * *hbase.hregion.max.filesize *I would expect to have all the
>> regions or
>> > > > almost all splitting but this only happens when I restart a RS and
>> it
>> > > > happens only for 1 or 2 regions.
>> > > >
>> > > > What are the different scenarios where a region can split?
>> > > >
>> > > > What are the right steps to restart a region server in order to
>> avoid
>> > > these
>> > > > unexpected splits?
>> > > >
>> > > > Thank you,
>> > > > Cheers
>> > > > Pedro
>> > >
>> >
>>
>>
>>
>> Thanks Pedro for giving your solution.
>>
>> i see the same issue during Hbase restarts. unexpected region splits.
>> i believe it is because the *IncreasingToUpperBoundRegionSplitPolicy* is
>> basing
>>  his calculation on the amount of ONLINE regions.
>> but while the RS is starting only a couple of regions are online YET.
>> so the policy things it would be no problem to add another region
>> since 'there are only a few'.
>> (while there are actually already are 330 for that RS for that phoenix
>> table...
>> yes i know i need to merge regions.
>> but this problem got out of hand unnoticed for some time now here)
>>
>> could HBase block split region decision until it is fully up and running?
>>
>> Hbase 1.0.0 logs. (check mainly the last line)
>>
>> Mar 24, 11:06:41.494 AM INFO
>> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher
>> Flushed, sequenceid=69436099, memsize=303.3 K, hasBloomFilter=true, into
>> tmp
>> file
>>
>> hdfs://ns/hbase/data/default/CUSTOMER/60af2857a7980ce4f1ac602dd83e05a6/.tmp/
>> 0fd4988f24f24d5d9887c542182efccc
>> Mar 24, 11:06:41.529 AM INFO
>> org.apache.hadoop.hbase.regionserver.HStore
>> Added hdfs://-ns/hbase/data/default/CUSTOMER/
>> ff4ecd56e6b06f228404f05f171f8282/0/1d05cf9cac4c46008e47e3578e7a18d6,
>> entries=235, sequenceid=22828972, filesize=5.5 K
>> Mar 24, 11:06:41.561 AM INFO
>> org.apache.hadoop.hbase.regionserver.HStore
>> Completed compaction of 3 (all) file(s) in s of CUSTOMER,\x0A0+\xF6\
>> xD8,1457121856469.183f6134683e0213ccb15558a56f7c02.
>> into 730489295b8c42afaec4a3b8bc38c915(size=1.4 M),
>> total size for store is 1.4 M. This selection was in queue for
>> 0sec, and took 0sec to execute.
>> Mar 24, 11:06:41.561 AM INFO
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread
>> Completed compaction: Request = regionName=CUSTOMER,
>> \x0A0+\xF6\xD8,1457121856469.183f6134683e0213ccb15558a56f7c02.,
>> storeName=s, fileCount=3, fileSize=1.7 M, priority=7,
>> time=1456532583179472;
>> duration=0sec
>> Mar 24, 11:06:41.562 AM DEBUG
>>
>> org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
>> ShouldSplit because IB size=3269370636, sizeToCheck=2147483648,
>> regionsWithCommonTable=2
>>
>> i will also revert back to the ConstantSizeRegionSplitPolicy
>>
>> Regards,
>>
>>
>>
>>
>


Re: Unexpected region splits

2016-03-24 Thread Ted Yu
Currently IncreasingToUpperBoundRegionSplitPolicy doesn't detect when the
master initialization finishes.

There is also some missing piece where region server notifies the
completion of cluster initialization (by looking at RegionServerObserver).
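
For anyone who, like the poster below, wants to fall back to ConstantSizeRegionSplitPolicy for one table, a small sketch (HBase 1.x Admin API; the table name is just the one from the quoted log); the cluster-wide default can instead be changed via hbase.regionserver.region.split.policy in hbase-site.xml:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

    public class UseConstantSizePolicy {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          TableName tn = TableName.valueOf("CUSTOMER");  // example table
          HTableDescriptor htd = admin.getTableDescriptor(tn);
          htd.setRegionSplitPolicyClassName(ConstantSizeRegionSplitPolicy.class.getName());
          admin.modifyTable(tn, htd);  // takes effect as regions are reopened
        }
      }
    }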

Cheers

On Thu, Mar 24, 2016 at 3:50 AM, Bram Desoete  wrote:

>
>
>
> Pedro Gandola  writes:
>
> >
> > Hi Ted,
> >
> > Thanks,
> > I think I got the problem, I'm using
> *IncreasingToUpperBoundRegionSplitPolicy
> > (default)* instead *ConstantSizeRegionSplitPolicy* which in my use case
> is
> > what I want.
> >
> > Cheers
> > Pedro
> >
> > On Mon, Feb 15, 2016 at 5:22 PM, Ted Yu  wrote:
> >
> > > Can you pastebin region server log snippet around the time when the
> split
> > > happened ?
> > >
> > > Was the split on data table or index table ?
> > >
> > > Thanks
> > >
> > > > On Feb 15, 2016, at 10:22 AM, Pedro Gandola 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have a cluster using *HBase 1.1.2* where I have a table and a local
> > > index
> > > > (using *Apache Phoenix 4.6*) in total both tables have *300 regions*
> > > > (aprox: *18 regions per server*), my* hbase.hregion.max.filesize=30GB
> > > *and
> > > > my region sizes are now *~4.5GB compressed (~7GB uncompressed)*.
> However
> > > > each time I restart a RS sometimes a region gets split. This is
> > > unexpected
> > > > because my key space is uniform (using MD5) and if the problem was my
> > > > *region.size
> > > >> * *hbase.hregion.max.filesize *I would expect to have all the
> regions or
> > > > almost all splitting but this only happens when I restart a RS and it
> > > > happens only for 1 or 2 regions.
> > > >
> > > > What are the different scenarios where a region can split?
> > > >
> > > > What are the right steps to restart a region server in order to avoid
> > > these
> > > > unexpected splits?
> > > >
> > > > Thank you,
> > > > Cheers
> > > > Pedro
> > >
> >
>
>
>
> Thanks Pedro for giving your solution.
>
> I see the same issue during HBase restarts: unexpected region splits.
> I believe it is because the *IncreasingToUpperBoundRegionSplitPolicy* is
> basing
>  its calculation on the number of ONLINE regions,
> but while the RS is starting only a couple of regions are online YET,
> so the policy thinks it would be no problem to add another region
> since 'there are only a few'
> (while there actually already are 330 for that RS for that Phoenix
> table...
> yes, I know I need to merge regions,
> but this problem got out of hand unnoticed for some time now here).
>
> Could HBase block the region split decision until it is fully up and running?
>
> Hbase 1.0.0 logs. (check mainly the last line)
>
> Mar 24, 11:06:41.494 AM INFO
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher
> Flushed, sequenceid=69436099, memsize=303.3 K, hasBloomFilter=true, into
> tmp
> file
>
> hdfs://ns/hbase/data/default/CUSTOMER/60af2857a7980ce4f1ac602dd83e05a6/.tmp/
> 0fd4988f24f24d5d9887c542182efccc
> Mar 24, 11:06:41.529 AM INFO
> org.apache.hadoop.hbase.regionserver.HStore
> Added hdfs://-ns/hbase/data/default/CUSTOMER/
> ff4ecd56e6b06f228404f05f171f8282/0/1d05cf9cac4c46008e47e3578e7a18d6,
> entries=235, sequenceid=22828972, filesize=5.5 K
> Mar 24, 11:06:41.561 AM INFO
> org.apache.hadoop.hbase.regionserver.HStore
> Completed compaction of 3 (all) file(s) in s of CUSTOMER,\x0A0+\xF6\
> xD8,1457121856469.183f6134683e0213ccb15558a56f7c02.
> into 730489295b8c42afaec4a3b8bc38c915(size=1.4 M),
> total size for store is 1.4 M. This selection was in queue for
> 0sec, and took 0sec to execute.
> Mar 24, 11:06:41.561 AM INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread
> Completed compaction: Request = regionName=CUSTOMER,
> \x0A0+\xF6\xD8,1457121856469.183f6134683e0213ccb15558a56f7c02.,
> storeName=s, fileCount=3, fileSize=1.7 M, priority=7,
> time=1456532583179472;
> duration=0sec
> Mar 24, 11:06:41.562 AM DEBUG
>
> org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy
> ShouldSplit because IB size=3269370636, sizeToCheck=2147483648,
> regionsWithCommonTable=2
>
> i will also revert back to the ConstantSizeRegionSplitPolicy
>
> Regards,
>
>
>
>


Re: HBase & SSD

2016-03-22 Thread Ted Yu
Please take a look at the attachment to HBASE-14457

Cheers

On Tue, Mar 22, 2016 at 1:56 PM, Parsian, Mahmoud 
wrote:

> If anyone is using HBase with SSDs, please let us know the performance
> improvements.
>
> Thank you!
> Best regards,
> Mahmoud
>


Re: sporadic hbase "outages"

2016-03-22 Thread Ted Yu
bq. a small number will take 20 minutes or more

Were these mappers performing selective scan on big regions ?

Can you pastebin the stack trace of region server(s) which served such
regions during slow mapper operation ?

Pastebin of region server log would also give us more clue.

On Tue, Mar 22, 2016 at 10:57 AM, feedly team  wrote:

> Recently we have been experiencing short downtimes (~2-5 minutes) in our
> hbase cluster and are trying to understand why. Many times we have HLog
> write spikes around the down times, but not always. Not sure if this is a
> red herring.
>
> We have looked a bit farther back in time and have noticed many metrics
> deteriorating over the past few months:
>
> The compaction queue size seems to be growing.
>
> The flushQueueSize and flushSizeAvgTime are growing.
>
> Some map reduce tasks run extremely slowly. Maybe 90% will complete within
> a couple minutes, but a small number will take 20 minutes or more. If I
> look at the slow mappers, there is a high value for the
> MILLIS_BETWEEN_NEXTS counter (these mappers didn't run data local).
>
> We have seen application performance worsening, during slowdowns usually
> threads are blocked on hbase connection operations
> (HConnectionManager$HConnectionImplementation.processBatch).
>
>
> This is a bit puzzling as our data nodes' os load values are really low. In
> the past, we had performance issues when load got too high. The region
> server log doesn't have anything interesting, the only messages we get are
> a handful of responseTooSlow messages.
>
> Do these symptoms point to anything or is there something else we should
> look at? We are (still) running 0.94.20. We are going to upgrade soon, but
> we want to diagnose this issue first.
>


Re: java.lang.VerifyError: class com.google.protobuf.HBaseZeroCopyByteString overrides final method equals.(Ljava/lang/Object;)Z

2016-03-21 Thread Ted Yu
In the dependency tree from your first email, I don't see any protobuf 2.6
dependency.

Would protobuf 2.6 give you any advantage ?

Cheers

On Mon, Mar 21, 2016 at 8:37 AM, yeshwanth kumar 
wrote:

> what if i use protobuf version 2.6,
> is it supported?
>
> please let me know
>
> -Yeshwanth
> Can you Imagine what I would do if I could do all I can - Art of War
>
> On Fri, Mar 18, 2016 at 10:31 PM, yeshwanth kumar 
> wrote:
>
> > Thank you Ted,
> > Thank You Sean, for the detailed explanation.
> >
> > i will revert to protbuf version 2.5
> >
> >
> >
> > -Yeshwanth
> > Can you Imagine what I would do if I could do all I can - Art of War
> >
> > On Fri, Mar 18, 2016 at 8:37 PM, Ted Yu  wrote:
> >
> >> Thanks Sean for the explanation.
> >>
> >> Yeshwanth:
> >> Looks like you're using Spark as well.
> >>
> >> org.spark-project.protobuf:protobuf-java:jar:2.4.1-shaded:compile
> >>
> >> Note that more recent Spark releases use protobuf 2.5.0 as well:
> >>
> >> 2.5.0
> >>
> >> FYI
> >>
> >> On Fri, Mar 18, 2016 at 6:11 PM, Sean Busbey 
> wrote:
> >>
> >> > The long version of Ted's answer is that your example brings in
> protobuf
> >> > version 3, which is known to be incompatible with some hbase internal
> >> > optimizations.
> >> >
> >> > The specific error you get is exactly that incompatibility: we extend
> a
> >> > protobuf class so that we can avoid an unnecessary byte array copy.
> >> > Unfortunately the stuff we override is marked final in protobuf v3
> which
> >> > results in the class verification error.
> >> >
> >> > --
> >> > Sean Busbey
> >> > On Mar 18, 2016 19:38, "Ted Yu"  wrote:
> >> >
> >> > > HBase is built with this version of protobuf:
> >> > >
> >> > > 2.5.0
> >> > >
> >> > > On Fri, Mar 18, 2016 at 5:13 PM, yeshwanth kumar <
> >> yeshwant...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > i am using  HBase 1.0.0-cdh5.5.1
> >> > > >
> >> > > > i am hitting this exception when trying to write to Hbase
> >> > > >
> >> > > > following is the stack trace
> >> > > >
> >> > > > Exception in thread "main" java.lang.VerifyError: class
> >> > > > com.google.protobuf.HBaseZeroCopyByteString overrides final method
> >> > > > equals.(Ljava/lang/Object;)Z
> >> > > > at java.lang.ClassLoader.defineClass1(Native Method)
> >> > > > at java.lang.ClassLoader.defineClass(Unknown Source)
> >> > > > at java.security.SecureClassLoader.defineClass(Unknown Source)
> >> > > > at java.net.URLClassLoader.defineClass(Unknown Source)
> >> > > > at java.net.URLClassLoader.access$100(Unknown Source)
> >> > > > at java.net.URLClassLoader$1.run(Unknown Source)
> >> > > > at java.net.URLClassLoader$1.run(Unknown Source)
> >> > > > at java.security.AccessController.doPrivileged(Native Method)
> >> > > > at java.net.URLClassLoader.findClass(Unknown Source)
> >> > > > at java.lang.ClassLoader.loadClass(Unknown Source)
> >> > > > at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
> >> > > > at java.lang.ClassLoader.loadClass(Unknown Source)
> >> > > > at
> >> > >
> >> org.apache.hadoop.hbase.util.ByteStringer.(ByteStringer.java:44)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.protobuf.RequestConverter.buildRegionSpecifier(RequestConverter.java:995)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.protobuf.RequestConverter.buildGetRowOrBeforeRequest(RequestConverter.java:138)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1579)
> >> > > > at
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.client.Connec

Re: Unable to find cached index metadata

2016-03-21 Thread Ted Yu
Have you posted this question on Phoenix mailing list ?

Looks like you may get better answer there since the exception is related
to Phoenix coprocessor.

Thanks

On Mon, Mar 21, 2016 at 3:51 AM, Pedro Gandola 
wrote:

> Hi,
>
> I'm using *Phoenix4.6* and in my use case I have a table that keeps a
> sliding window of 7 days worth of data. I have 3 local indexes on this
> table and in our use case we have approx. 150 producers that are inserting
> data (in batches of 300-1500 events) in real-time.
>
> Some days ago I started to get a lot of errors like the below ones. The
> number of errors was so large that the cluster performance dropped a lot
> and my disks read bandwidth was crazy high but the write bandwidth was
> normal. I can ensure that during that period no readers were running only
> producers.
>
> ERROR [B.defaultRpcServer.handler=25,queue=5,port=16020]
> > parallel.BaseTaskRunner: Found a failed task because:
> > org.apache.hadoop.hbase.DoNotRetryIOException: *ERROR 2008 (INT10): ERROR
> > 2008 (INT10): Unable to find cached index metadata.*
> >  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > *Index update failed*
> > java.util.concurrent.ExecutionException:
> > org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR
> > 2008 (INT10): Unable to find cached index metadata.
> >  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > Index update failed
> > Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008
> > (INT10): ERROR 2008 (INT10): Unable to find cached index metadata.
> >  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > Index update failed
> > Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find
> > cached index metadata.  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > INFO  [B.defaultRpcServer.handler=25,queue=5,port=16020]
> > parallel.TaskBatch: Aborting batch of tasks because Found a failed task
> > because: org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008
> (INT10):
> > ERROR 2008 (INT10): Unable to find cached index metadata.
> >  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > Index update failed
> > ERROR [B.defaultRpcServer.handler=25,queue=5,port=16020]
> *builder.IndexBuildManager:
> > Found a failed index update!*
> > INFO  [B.defaultRpcServer.handler=25,queue=5,port=16020]
> > util.IndexManagementUtil: Rethrowing
> > org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR
> > 2008 (INT10): Unable to find cached index metadata.
> >  key=4276342695061435086
> >
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> > Index update failed
>
>
> I searched for the error and I made the following changes on the server
> side:
>
>- *phoenix.coprocessor.maxServerCacheTimeToLiveMs *from 30s to 2min
>- *phoenix.coprocessor.maxMetaDataCacheSize* from 20MB to 40MB
>
> After I changed these properties I restarted the cluster and the errors
> were gone but disks read bandwidth was still very high and I was getting
> *responseTooSlow* warnings. As a quick solution I created fresh tables and
> then the problems were gone.
>
> Now, after one day running with new tables I started to see the problem
> again, but I think this was during a major compaction. I would like to
> understand better the reasons & consequences of these problems.
>
> - What are the major consequences of these errors? I assume that index data
> is not written within the index table, right? Then, why was the read
> bandwidth of my disks so high even without readers and after changing those
> properties?
>
> - Is there any optimal or recommended value for the above properties or am
> I missing some tuning on other properties for the metadata cache?
>
> Thank you,
> Pedro
>


Re: Cant connect to hbase thrift

2016-03-20 Thread Ted Yu
If you are running Linux / OSX on localhost, you can use the 'ps' command to
search for the thrift server.

You can also check whether any process is listening on port 9090.
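
If you prefer probing from code rather than ps/netstat, a trivial sketch; host and port are just the defaults mentioned in this thread:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class ThriftPortCheck {
      public static void main(String[] args) {
        try (Socket s = new Socket()) {
          // connect() throws if nothing is listening on localhost:9090
          s.connect(new InetSocketAddress("localhost", 9090), 3000);
          System.out.println("something is listening on 9090");
        } catch (Exception e) {
          System.out.println("no listener on 9090: " + e.getMessage());
        }
      }
    }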

FYI

On Sun, Mar 20, 2016 at 5:20 AM, ram kumar  wrote:

> Hi,
>
> I'm not able to connect to HBase Thrift
>
> thrift.transport.TTransport.TTransportException: Could not connect to
> localhost:9090
>
> Is there a way to check thrift status, whether it is running?
>
> Thanks
>


Re: Thrift Server: How to increase "max worker threads" and "max queued requests"

2016-03-20 Thread Ted Yu
See the following from hbase-default.xml:

  <property>
    <name>hbase.thrift.minWorkerThreads</name>
    <value>16</value>
    ...
  </property>
  <property>
    <name>hbase.thrift.maxWorkerThreads</name>
    <value>1000</value>
    ...
  </property>
  <property>
    <name>hbase.thrift.maxQueuedRequests</name>
    <value>1000</value>
    ...
  </property>

On Thu, Mar 17, 2016 at 1:05 AM, Daniel  wrote:

> Hi, I find that the Thrift server will stop responding (the request hangs
> until timeout) when the number of concurrent requests reaches several
> hundred.
>
> I guess the problem is related to "max worker threads" and "max queued
> requests", according to the following console output on Thrift start:
>
> 2016-03-17 12:05:08,514 INFO  [main] thrift.ThriftServerRunner: starting
> TBoundedThreadPoolServer on /0.0.0.0:9090 with readTimeout 6ms;
> min worker threads=16, max worker threads=1000, max queued requests=1000
>
> I'd like to know how to increase "max worker threads" and "max queued
> requests", but cannot find them in the documentation.
>
> Thanks for any hint.
>
> Daniel

