[jira] [Resolved] (HBASE-26088) conn.getBufferedMutator(tableName) leaks thread executors and other problems

2021-07-21 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-26088.

Fix Version/s: 2.3.6
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.3, branch-2.4 and branch-2.
Thanks for the patch [~shahrs87].
Thanks for the great find [~whitney13].

> conn.getBufferedMutator(tableName) leaks thread executors and other problems
> 
>
> Key: HBASE-26088
> URL: https://issues.apache.org/jira/browse/HBASE-26088
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.4.13, 2.4.4
>Reporter: Whitney Jackson
>Assignee: Rushabh Shah
>Priority: Critical
> Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> TL;DR: {{conn.getBufferedMutator(tableName)}} is dangerous in hbase client 
> 2.4.4 and doesn't match documented behavior in 1.4.13.
> To work around the problems until they are fixed, do this:
> {code:java}
> var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
> var params = new BufferedMutatorParams(tableName);
> params.pool(mySingletonPool);
> var myMutator = conn.getBufferedMutator(params);
> {code}
> And avoid code like this:
> {code:java}
> var myMutator = conn.getBufferedMutator(tableName);
> {code}
> The full story:
> My application started leaking threads after upgrading from hbase client 
> 1.4.13 to 2.4.4. So much so that after less than a minute of runtime more 
> than 30k threads are leaked and all available virtual memory on the box (> 50 
> GB) is consumed. Other processes on the box start crashing with memory 
> allocation errors. Even running {{ls}} at the shell fails with OS resource 
> allocation failures.
> A thread dump after just a few seconds of runtime shows thousands of threads 
> like this:
> {code:java}
> "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s 
> tid=0x7efb6d2a1000 nid=0x57d2 waiting on condition [0x7ef8a6c38000]
>  java.lang.Thread.State: TIMED_WAITING (parking)
>  at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
>  - parking to wait for <0x0007e7cd6188> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>  at 
> java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
>  at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
>  at 
> java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
>  at 
> java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
>  at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
> {code}
>  
> Note: All the threads are labeled {{htable-pool-0}}. That suggests we're 
> leaking thread executors, not just threads. The {{htable-pool}} part indicates 
> the problem has to do with {{HTable.getDefaultExecutor(conf)}}, and the only 
> part of my code that interacts with that is a call to 
> {{conn.getBufferedMutator(tableName)}}.
>  
> Looking at the hbase client code shows a few problems:
> 1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for 
> {{conn.getBufferedMutator(tableName)}} which says:
> {quote}This BufferedMutator will use the Connection's ExecutorService.
> {quote}
> That suggests some singleton thread executor is being used, which is not the 
> case.
>  
> 2) Under 1.4.13 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}}. That's probably not what you want, but you likely won't 
> notice. I didn't. It's a code path I hadn't profiled much.
>  
> 3) Under 2.4.4 you get a new {{ThreadPoolExecutor}} for every 
> {{BufferedMutator}} *and* that {{ThreadPoolExecutor}} *is not* cleaned up 
> after the {{Mutator}} is closed. Each abandoned {{ThreadPoolExecutor}} 
> carries with it one idle thread, which hangs around until a keep-alive 
> timeout (defaulting to 60 seconds) expires.
> My application creates one {{BufferedMutator}} for every incoming stream, and 
> there are lots of streams, some of them short lived, so my code leaks 
> threads fast under 2.4.4.
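The leak mechanics can be reproduced with plain JDK executors. The pool construction below only mimics the shape of {{HTable.getDefaultExecutor}} (SynchronousQueue, keep-alive timeout, core threads allowed to time out); it is a sketch, not the actual HBase code:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorLeakDemo {
    // Sketch of an HTable-style pool: workers idle on a SynchronousQueue until
    // the keep-alive timeout expires (60s in the real default configuration).
    static ThreadPoolExecutor newHtableStylePool(long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 8, keepAliveSeconds,
            TimeUnit.SECONDS, new SynchronousQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Create n pools, run one task on each, and never call shutdown():
    // every pool keeps one idle worker thread alive until keep-alive expires.
    static int leakedThreads(int n) throws Exception {
        int before = Thread.activeCount();
        ThreadPoolExecutor[] pools = new ThreadPoolExecutor[n];
        for (int i = 0; i < n; i++) {
            pools[i] = newHtableStylePool(60);
            pools[i].submit(() -> { }).get(); // spawns a worker that then idles
        }
        int leaked = Thread.activeCount() - before;
        for (ThreadPoolExecutor p : pools) p.shutdown(); // cleanup for the demo
        return leaked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("idle worker threads left behind: " + leakedThreads(50));
    }
}
```

With one pool per short-lived object and a 60-second keep-alive, the leak rate is roughly (objects created per second) × 60 threads at steady state, which matches the 30k-threads-in-under-a-minute observation above.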
> Here's the part where a new executor is created for every {{BufferedMutator}} 
> (it's similar for 1.4.13):
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420]
>  
> The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic 
> added here:
> [https://github.com/apache/hbase/blob/branch-2.4/hbase-cli

[jira] [Resolved] (HBASE-25985) ReplicationSourceWALReader#run - Reset sleepMultiplier in loop once out of any IOE

2021-06-28 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25985.

Resolution: Invalid

> ReplicationSourceWALReader#run - Reset sleepMultiplier in loop once out of 
> any IOE
> --
>
> Key: HBASE-25985
> URL: https://issues.apache.org/jira/browse/HBASE-25985
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anoop Sam John
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25985) ReplicationSourceWALReader#run - Reset sleepMultiplier in loop once out of any IOE

2021-06-08 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25985:
--

 Summary: ReplicationSourceWALReader#run - Reset sleepMultiplier in 
loop once out of any IOE
 Key: HBASE-25985
 URL: https://issues.apache.org/jira/browse/HBASE-25985
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John








[jira] [Created] (HBASE-25961) Backport HBASE-25596 to branch-2.3

2021-06-01 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25961:
--

 Summary: Backport HBASE-25596 to branch-2.3
 Key: HBASE-25961
 URL: https://issues.apache.org/jira/browse/HBASE-25961
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John








[jira] [Resolved] (HBASE-25903) ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to hang forever when ZK client create throws non-IOException

2021-05-30 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25903.

Hadoop Flags: Reviewed
  Resolution: Fixed

> ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to 
> hang forever when ZK client create throws non-IOException
> ---
>
> Key: HBASE-25903
> URL: https://issues.apache.org/jira/browse/HBASE-25903
> Project: HBase
>  Issue Type: Bug
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.4
>
>
> This is applicable for ZK client versions that do not have the fix for 
> ZOOKEEPER-2184.
> We are now on ZooKeeper 3.5.7 on active 2.x branches, but it is still better 
> to handle this case in our code.
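The hang and the fix idea can be sketched with a plain CompletableFuture (this is an illustrative reconstruction, not the actual ReadOnlyZKClient code):

```java
import java.util.concurrent.CompletableFuture;

public class ZkFutureSketch {
    // Sketch: complete the future exceptionally on ANY exception from client
    // creation, so callers blocked in get() are always released. If only
    // IOException were handled, a RuntimeException would leave the future
    // forever incomplete and get() would hang.
    static CompletableFuture<String> connect(boolean failWithRuntimeException) {
        CompletableFuture<String> future = new CompletableFuture<>();
        try {
            if (failWithRuntimeException) {
                throw new IllegalStateException("zk client create failed");
            }
            future.complete("connected");
        } catch (Exception e) {
            future.completeExceptionally(e); // release any waiting get() calls
        }
        return future;
    }
}
```

The invariant to enforce is that every code path out of the creation logic either completes or exceptionally completes the future; any path that does neither turns a creation failure into an indefinite hang for the caller.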





[jira] [Resolved] (HBASE-25898) RS getting aborted due to NPE in Replication WALEntryStream

2021-05-26 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25898.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master, branch-2, branch-2.4, branch-2.3, branch-1
Thanks for the reviews.

> RS getting aborted due to NPE in Replication WALEntryStream
> ---
>
> Key: HBASE-25898
> URL: https://issues.apache.org/jira/browse/HBASE-25898
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>
> The below sequence of events happened in a customer cluster:
> An empty WAL file got a roll request.
> The close of the file failed on the HDFS side, but as the file had all its 
> edits synced, we continued.
> A new WAL file was created and the old one rolled.
> This old WAL file got archived to oldWALs:
> {code}
> 2021-05-13 13:38:46.000   Riding over failed WAL close of 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678,
>  cause="Unexpected EOF while trying to read response from server", errors=1; 
> THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> 2021-05-13 13:38:46.000   Rolled WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 
> with entries=0, filesize=90 B; new WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620913126549
> 2021-05-13 13:38:46.000Archiving 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  to hdfs://xxx/oldWALs/xxxt%2C16020%2C1620828102351.1620910673678
> 2021-05-13 13:38:46.000   Log 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  was moved to hdfs://xxx/oldWALs/xxx%2C16020%2C1620828102351.1620910673678
> {code}
> As the file was moved, the WALEntryStream got an IOE and we recreate the 
> stream:
> {code}
> ReplicationSourceWALReader#run
> while (isReaderRunning()) {
>   try {
> entryStream =
>   new WALEntryStream(logQueue, conf, currentPosition, 
> source.getWALFileLengthProvider(),
> source.getServerWALsBelongTo(), source.getSourceMetrics(), 
> walGroupId);
> while (isReaderRunning()) { 
> ...
> ...
> } catch (IOException e) { // stream related
> if (handleEofException(e, batch)) {
>   sleepMultiplier = 1;
> } else {
>   LOG.warn("Failed to read stream of replication entries", e);
>   if (sleepMultiplier < maxRetriesMultiplier) {
> sleepMultiplier++;
>   }
>   Threads.sleep(sleepForRetries * sleepMultiplier);
> }
> }
> {code}
> eofAutoRecovery is turned off anyway, so it goes to the outer while loop and 
> creates a new WALEntryStream object.
> Then we do readWALEntries:
> {code}
> protected WALEntryBatch readWALEntries(WALEntryStream entryStream,
>   WALEntryBatch batch) throws IOException, InterruptedException {
> Path currentPath = entryStream.getCurrentPath();
> if (!entryStream.hasNext()) {
> {code}
> Here currentPath will still be null.
> WALEntryStream#hasNext -> tryAdvanceEntry -> checkReader -> openNextLog
> {code}
> private boolean openNextLog() throws IOException {
> PriorityBlockingQueue queue = logQueue.getQueue(walGroupId);
> Path nextPath = queue.peek();
> if (nextPath != null) {
>   openReader(nextPath);
> 
> private void openReader(Path path) throws IOException {
> try {
>   // Detect if this is a new file, if so get a new reader else
>   // reset the current reader so that we see the new data
>   if (reader == null || !getCurrentPath().equals(path)) {
> closeReader();
> reader = WALFactory.createReader(fs, path, conf);
> seek();
> setCurrentPath(path);
>   } else {
> resetReader();
>   }
> } catch (FileNotFoundException fnfe) {
>   handleFileNotFound(path, fnfe);
> }  catch (RemoteException re) {
>   IOException ioe = re.unwrapRemoteException(FileNotFoundException.class);
>   if (!(ioe instanceof FileNotFoundException)) {
> throw ioe;
>   }
>   handleFileNotFound(path, (FileNotFoundException)ioe);
> } catch (LeaseNotRecoveredException lnre) {
>   // HBASE-15019 the WAL was not closed due to some hiccup.
>   LOG.warn("Try to recover the WAL lease " + currentPath, lnre);
>   recoverLease(conf, currentPath);
>   reader = null;
> } catch (NullPointerException npe) {
>   // Workaround for race condition in HDFS-4380
>   // which throws a NPE if we open a file before any data node has the 
> most recent block
>   // Just sleep a

[jira] [Created] (HBASE-25903) ReadOnlyZKClient APIs - CompletableFuture.get() calls can cause threads to hang forever when ZK client create throws non-IOException

2021-05-23 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25903:
--

 Summary: ReadOnlyZKClient APIs - CompletableFuture.get() calls can 
cause threads to hang forever when ZK client create throws non-IOException
 Key: HBASE-25903
 URL: https://issues.apache.org/jira/browse/HBASE-25903
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John
Assignee: Anoop Sam John


This is applicable for ZK client versions that do not have the fix for 
ZOOKEEPER-2184.
We are now on ZooKeeper 3.5.7 on active 2.x branches, but it is still better to 
handle this case in our code.





[jira] [Created] (HBASE-25898) RS getting aborted due to NPE in Replication WALEntryStream

2021-05-19 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25898:
--

 Summary: RS getting aborted due to NPE in Replication 
WALEntryStream
 Key: HBASE-25898
 URL: https://issues.apache.org/jira/browse/HBASE-25898
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Anoop Sam John
Assignee: Anoop Sam John


The below sequence of events happened in a customer cluster:
An empty WAL file got a roll request.
The close of the file failed on the HDFS side, but as the file had all its 
edits synced, we continued.
A new WAL file was created and the old one rolled.
This old WAL file got archived to oldWALs:
{code}
2021-05-13 13:38:46.000 Riding over failed WAL close of 
hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678,
 cause="Unexpected EOF while trying to read response from server", errors=1; 
THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
2021-05-13 13:38:46.000 Rolled WAL 
/xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 with 
entries=0, filesize=90 B; new WAL 
/xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620913126549
2021-05-13 13:38:46.000  Archiving 
hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
 to hdfs://xxx/oldWALs/xxxt%2C16020%2C1620828102351.1620910673678
2021-05-13 13:38:46.000 Log 
hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
 was moved to hdfs://xxx/oldWALs/xxx%2C16020%2C1620828102351.1620910673678
{code}
As the file was moved, the WALEntryStream got an IOE and we recreate the 
stream:
{code}
ReplicationSourceWALReader#run
while (isReaderRunning()) {
try {
  entryStream =
new WALEntryStream(logQueue, conf, currentPosition, 
source.getWALFileLengthProvider(),
  source.getServerWALsBelongTo(), source.getSourceMetrics(), 
walGroupId);
  while (isReaderRunning()) { 
  ...
  ...
  } catch (IOException e) { // stream related
  if (handleEofException(e, batch)) {
sleepMultiplier = 1;
  } else {
LOG.warn("Failed to read stream of replication entries", e);
if (sleepMultiplier < maxRetriesMultiplier) {
  sleepMultiplier++;
}
Threads.sleep(sleepForRetries * sleepMultiplier);
  }
}
{code}
eofAutoRecovery is turned off anyway, so it goes to the outer while loop and 
creates a new WALEntryStream object.
Then we do readWALEntries:
{code}
protected WALEntryBatch readWALEntries(WALEntryStream entryStream,
  WALEntryBatch batch) throws IOException, InterruptedException {
Path currentPath = entryStream.getCurrentPath();
if (!entryStream.hasNext()) {
{code}
Here currentPath will still be null.
WALEntryStream#hasNext -> tryAdvanceEntry -> checkReader -> openNextLog
{code}
private boolean openNextLog() throws IOException {
PriorityBlockingQueue queue = logQueue.getQueue(walGroupId);
Path nextPath = queue.peek();
if (nextPath != null) {
  openReader(nextPath);
  
private void openReader(Path path) throws IOException {
try {
  // Detect if this is a new file, if so get a new reader else
  // reset the current reader so that we see the new data
  if (reader == null || !getCurrentPath().equals(path)) {
closeReader();
reader = WALFactory.createReader(fs, path, conf);
seek();
setCurrentPath(path);
  } else {
resetReader();
  }
} catch (FileNotFoundException fnfe) {
  handleFileNotFound(path, fnfe);
}  catch (RemoteException re) {
  IOException ioe = re.unwrapRemoteException(FileNotFoundException.class);
  if (!(ioe instanceof FileNotFoundException)) {
throw ioe;
  }
  handleFileNotFound(path, (FileNotFoundException)ioe);
} catch (LeaseNotRecoveredException lnre) {
  // HBASE-15019 the WAL was not closed due to some hiccup.
  LOG.warn("Try to recover the WAL lease " + currentPath, lnre);
  recoverLease(conf, currentPath);
  reader = null;
} catch (NullPointerException npe) {
  // Workaround for race condition in HDFS-4380
  // which throws a NPE if we open a file before any data node has the most 
recent block
  // Just sleep and retry. Will require re-reading compressed WALs for 
compressionContext.
  LOG.warn("Got NPE opening reader, will retry.");
  reader = null;
}
  }
{code}
Here the call to WALFactory.createReader is not able to complete because of an 
issue on the HDFS side. We have a retry mechanism there for 5 minutes, but 
eventually it throws LeaseNotRecoveredException. Yes, we try to handle it.
But the problem is that in the handler we pass the state variable currentPath, 
which is still null here!
This will throw an NPE:
{code}
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.fixRelativ
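The shape of the guard the handler needs can be sketched as follows (field names mirror WALEntryStream, but this is an illustrative sketch, not the actual patch):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LeaseRecoverySketch {
    // Sketch: currentPath is null until a reader has been opened successfully,
    // so the lease-recovery handler must not use it unguarded. Fall back to the
    // path we were actually trying to open.
    static String recoveryTarget(Path currentPath, Path pathBeingOpened) {
        Path target = (currentPath != null) ? currentPath : pathBeingOpened;
        return target.toString();
    }
}
```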

[jira] [Reopened] (HBASE-25772) org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not online on bd133,16020,1618369396289

2021-04-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-25772:


> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618369396289
> --
>
> Key: HBASE-25772
> URL: https://issues.apache.org/jira/browse/HBASE-25772
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
> Environment: hbase(2.0.0-cdh6.0.1)
>Reporter: shiyu
>Priority: Major
>
> when start hbase:
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:49 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318

[jira] [Resolved] (HBASE-25772) org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not online on bd133,16020,1618369396289

2021-04-16 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25772.

Resolution: Invalid

> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618369396289
> --
>
> Key: HBASE-25772
> URL: https://issues.apache.org/jira/browse/HBASE-25772
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
> Environment: hbase(2.0.0-cdh6.0.1)
>Reporter: shiyu
>Priority: Major
>
> when start hbase:
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:48 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on bd133,16020,1618378306318 at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3245)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3222)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
>  at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2429)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at 
> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> Wed Apr 14 13:32:49 CST 2021, 
> RpcRetryingCaller\{globalStartTime=1618378367830, pause=100, maxAttempts=31}, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: 
> hbase:namespace,,1544623399046.0d98a5b9654f3a3783388bc294c25067. is not 
> online on b

[jira] [Resolved] (HBASE-25673) Wrong log regarding current active master at ZKLeaderManager#waitToBecomeLeader

2021-03-18 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25673.

Fix Version/s: 2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Wrong log regarding current active master at 
> ZKLeaderManager#waitToBecomeLeader
> ---
>
> Key: HBASE-25673
> URL: https://issues.apache.org/jira/browse/HBASE-25673
> Project: HBase
>  Issue Type: Bug
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> {code}
> byte[] currentId = ZKUtil.getDataAndWatch(watcher, leaderZNode);
> if (currentId != null && Bytes.equals(currentId, nodeId)) {
>   
> } else {
>   LOG.info("Found existing leader with ID: {}", Bytes.toStringBinary(nodeId));
>   leaderExists.set(true);
> }
> {code}
> The existing id read from ZK is currentId, but by mistake we log 'nodeId', 
> which is this master's own node id.
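A minimal sketch of the corrected branch, using java.util.Arrays in place of HBase's Bytes helper (the null case and method names here are hypothetical, not from the actual patch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LeaderLogSketch {
    // Sketch of the corrected branch: when another leader exists, report the id
    // read from ZK (currentId), not this process's own id (nodeId).
    static String leaderMessage(byte[] currentId, byte[] nodeId) {
        if (currentId != null && Arrays.equals(currentId, nodeId)) {
            return "we are the current leader";
        }
        if (currentId == null) {
            return "no leader znode data yet"; // hypothetical handling for the sketch
        }
        // The bug logged nodeId here; currentId is the existing leader's id.
        return "Found existing leader with ID: " + new String(currentId, StandardCharsets.UTF_8);
    }
}
```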





[jira] [Created] (HBASE-25673) Wrong log regarding current active master at ZKLeaderManager#waitToBecomeLeader

2021-03-17 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25673:
--

 Summary: Wrong log regarding current active master at 
ZKLeaderManager#waitToBecomeLeader
 Key: HBASE-25673
 URL: https://issues.apache.org/jira/browse/HBASE-25673
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John
Assignee: Anoop Sam John


{code}
byte[] currentId = ZKUtil.getDataAndWatch(watcher, leaderZNode);
if (currentId != null && Bytes.equals(currentId, nodeId)) {
  
} else {
  LOG.info("Found existing leader with ID: {}", Bytes.toStringBinary(nodeId));
  leaderExists.set(true);
}
{code}
The existing id read from ZK is currentId, but by mistake we log 'nodeId', 
which is this master's own node id.





[jira] [Resolved] (HBASE-25582) Support setting scan ReadType to be STREAM at cluster level

2021-03-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25582.

Fix Version/s: 2.4.2
   2.3.5
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Support setting scan ReadType to be STREAM at cluster level
> ---
>
> Key: HBASE-25582
> URL: https://issues.apache.org/jira/browse/HBASE-25582
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.5, 2.4.2
>
>
> We have the config 'hbase.storescanner.use.pread' at the cluster level to set 
> the ReadType to PREAD when it is not explicitly specified in the Scan object.
> In the same way, we can have a way to make scans STREAM type at the cluster 
> level (if not specified at the Scan object level).
> We do not need any new configs for this. We have the config 
> 'hbase.storescanner.pread.max.bytes', which specifies when to switch the read 
> type to stream and defaults to 4 * the HFile block size. If one configures 
> this value as <= 0, it means the user wants the switch at the moment the 
> scanner is created. With such handling we can support this,
> so every scan need not set the read type.
> The issue is that on cloud-storage-based systems, using stream reads might be 
> better. We introduced this PREAD-based scan with tests on HDFS-based 
> storage. In my customer's case, Azure storage is in place and the WASB driver 
> is used. We have a read-ahead mechanism there (read an entire block of a blob 
> in one REST call) and buffer it in the WASB driver. This helps a lot with 
> longer scans. Yes, with the config 'hbase.storescanner.pread.max.bytes' we can 
> make the switch happen early, but it is better to go the 1.x way, where the 
> scan starts with stream reads itself.
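The proposed interpretation of 'hbase.storescanner.pread.max.bytes' <= 0 can be sketched as follows (illustrative only; the class and method names are not from the actual patch):

```java
public class ReadTypeSketch {
    enum ReadType { PREAD, STREAM }

    // Sketch: a non-positive hbase.storescanner.pread.max.bytes means "start the
    // scan in STREAM mode immediately", instead of switching only after that many
    // bytes have been read via PREAD.
    static ReadType initialReadType(long preadMaxBytes) {
        return preadMaxBytes <= 0 ? ReadType.STREAM : ReadType.PREAD;
    }

    // Existing behavior for positive values: switch once the scan has read past
    // the configured threshold.
    static boolean shouldSwitchToStream(long bytesReadSoFar, long preadMaxBytes) {
        return preadMaxBytes > 0 && bytesReadSoFar > preadMaxBytes;
    }
}
```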





[jira] [Resolved] (HBASE-25644) Scan#setSmall blindly sets ReadType as PREAD

2021-03-08 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25644.

Fix Version/s: 2.4.2
   2.3.5
   2.5.0
   2.2.7
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Scan#setSmall blindly sets ReadType as PREAD
> 
>
> Key: HBASE-25644
> URL: https://issues.apache.org/jira/browse/HBASE-25644
> Project: HBase
>  Issue Type: Bug
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
>  Labels: phoenix
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2
>
>
> setSmall(boolean small) takes a boolean param and might get called with 
> false as well. But without considering that, we set the read type to PREAD.
> Phoenix clones the Scan object and does 
> newScan.setSmall(scan.isSmall());
> So this makes ALL types of scans from Phoenix PREAD type now (even full 
> table scans).





[jira] [Created] (HBASE-25644) Scan#setSmall blindly sets ReadType as PREAD

2021-03-07 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25644:
--

 Summary: Scan#setSmall blindly sets ReadType as PREAD
 Key: HBASE-25644
 URL: https://issues.apache.org/jira/browse/HBASE-25644
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John
Assignee: Anoop Sam John


setSmall(boolean small) takes a boolean param and might get called with false 
as well. But without considering that, we set the read type to PREAD.
Phoenix clones the Scan object and does 
newScan.setSmall(scan.isSmall());
So this makes ALL types of scans from Phoenix PREAD type now (even full table 
scans).
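The fix idea can be sketched like this. Note this is an illustrative stand-in (class and enum names are made up, not the real org.apache.hadoop.hbase.client.Scan or the actual patch): only pin the read type when small is actually true, so a cloned non-small scan keeps the default read type.

```java
// Hypothetical sketch of the setSmall fix idea; "ScanSketch" and its
// ReadType enum are illustrative, not the real HBase Scan class.
public class ScanSketch {
    public enum ReadType { DEFAULT, PREAD, STREAM }

    private ReadType readType = ReadType.DEFAULT;
    private boolean small = false;

    // Buggy behavior: sets PREAD regardless of the boolean param.
    public void setSmallBuggy(boolean small) {
        this.small = small;
        this.readType = ReadType.PREAD;
    }

    // Fixed behavior: only force PREAD when small is actually true.
    public void setSmall(boolean small) {
        this.small = small;
        if (small) {
            this.readType = ReadType.PREAD;
        }
    }

    public ReadType getReadType() {
        return readType;
    }

    public static void main(String[] args) {
        ScanSketch cloned = new ScanSketch();
        cloned.setSmall(false); // Phoenix-style clone of a non-small scan
        System.out.println(cloned.getReadType()); // DEFAULT
    }
}
```

With the buggy variant, the same clone would come out as PREAD even for a full table scan.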





[jira] [Resolved] (HBASE-25626) Possible Resource Leak in HeterogeneousRegionCountCostFunction

2021-03-07 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25626.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Possible Resource Leak in HeterogeneousRegionCountCostFunction
> --
>
> Key: HBASE-25626
> URL: https://issues.apache.org/jira/browse/HBASE-25626
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.6.0
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.2
>
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hbase/blob/b522d2a33e31fbe70c341ea7068428ada8d51bc0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousRegionCountCostFunction.java#L225].
>  close() is called on {{reader}} at line 228 but if an I/O error occurs at 
> line 225, {{reader}} remains open since the exception isn't caught locally.
> I'll submit a pull request to fix it.





[jira] [Created] (HBASE-25582) Support setting scan ReadType to be STREAM at cluster level

2021-02-16 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-25582:
--

 Summary: Support setting scan ReadType to be STREAM at cluster 
level
 Key: HBASE-25582
 URL: https://issues.apache.org/jira/browse/HBASE-25582
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


We have the config 'hbase.storescanner.use.pread' at cluster level to set 
ReadType to be PRead if not explicitly specified in Scan object.
The same way, we can have a way to make a scan STREAM type at the cluster level 
(if not specified at the Scan object level).
We do not need any new configs. We already have the config 
'hbase.storescanner.pread.max.bytes', which specifies when to switch the read 
type to stream; it defaults to 4 * HFile block size. Configuring this value as 
<= 0 can mean the user wants the switch at the time the scanner is created. 
With such handling we can support it.
So every scan need not set the read type.

The issue is that on cloud-storage-based systems, stream reads might be better. 
We introduced this pread-based scan with tests on HDFS-based storage. In my 
customer's case, Azure storage is in place and the WASB driver is used. It has 
a read-ahead mechanism (read an entire block of a blob in one REST call) and 
buffers that in the WASB driver, which helps a lot with longer scans. Yes, with 
the config 'hbase.storescanner.pread.max.bytes' we can make the switch happen 
early, but it is better to go the 1.x way, where the scan starts with a stream 
read itself.
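The proposed semantics can be sketched as follows (an illustrative sketch with hypothetical names, not the actual StoreScanner logic): when 'hbase.storescanner.pread.max.bytes' is configured <= 0, the scanner uses STREAM from creation instead of switching after that many bytes have been read.

```java
// Illustrative sketch of the proposed read-type selection; names are
// hypothetical and do not match the real HBase StoreScanner code.
public class ReadTypeSketch {
    public enum ReadType { PREAD, STREAM }

    /**
     * @param preadMaxBytes  value of 'hbase.storescanner.pread.max.bytes'
     *                       (defaults to 4 * HFile block size)
     * @param bytesReadSoFar bytes this scanner has already read via pread
     */
    public static ReadType chooseReadType(long preadMaxBytes, long bytesReadSoFar) {
        if (preadMaxBytes <= 0) {
            // Proposed: <= 0 means switch when the scanner is created itself.
            return ReadType.STREAM;
        }
        // Existing behavior: start with pread, switch after the threshold.
        return bytesReadSoFar > preadMaxBytes ? ReadType.STREAM : ReadType.PREAD;
    }

    public static void main(String[] args) {
        long fourBlocks = 4 * 64 * 1024L; // assuming a 64 KB block size
        System.out.println(chooseReadType(fourBlocks, 0));             // PREAD
        System.out.println(chooseReadType(fourBlocks, fourBlocks + 1)); // STREAM
        System.out.println(chooseReadType(-1, 0));                      // STREAM
    }
}
```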





[jira] [Resolved] (HBASE-25026) Create a metric to track full region scans RPCs

2020-11-18 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25026.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Create a metric to track full region scans RPCs
> ---
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> A metric that indicates how many of the scan requests were without a start 
> row and/or stop row. Generally such queries may be wrongly written or may 
> indicate a need for better schema design; some may be sanity-check queries 
> verifying that the application logic has done the necessary updates and that 
> all the expected rows are processed.
> We do have some logs at the RPC layer to see which queries take time, but 
> nothing as a metric.





[jira] [Resolved] (HBASE-24919) A tool to rewrite corrupted HFiles

2020-08-21 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24919.

Resolution: Duplicate

Dup of HBASE-24920

> A tool to rewrite corrupted HFiles
> --
>
> Key: HBASE-24919
> URL: https://issues.apache.org/jira/browse/HBASE-24919
> Project: HBase
>  Issue Type: Brainstorming
>  Components: hbase-operator-tools
>Reporter: Andrey Elenskiy
>Priority: Major
>
> Typically I have been dealing with corrupted HFiles (due to loss of hdfs 
> blocks) by just removing them. However, it always seemed wasteful to throw 
> away the entire HFile (which can be hundreds of gigabytes) just because one 
> hdfs block (128MB) is missing.
> I think there's a possibility for a tool that can rewrite an HFile by 
> skipping corrupted blocks.
> There can be multiple types of issues with hdfs blocks but any of them can be 
> treated as if the block doesn't exist:
> 1. All the replicas can be lost
> 2. The block can be corrupted due to some bug in hdfs (I've recently run into 
> HDFS-15186 by experimenting with EC).
> At the simplest, the tool can be a local mapreduce job (mapper only) with a 
> custom HFile reader input that can seek to the next DATABLK to skip corrupted 
> hdfs blocks.





[jira] [Created] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24850:
--

 Summary: CellComparator perf improvement
 Key: HBASE-24850
 URL: https://issues.apache.org/jira/browse/HBASE-24850
 Project: HBase
  Issue Type: Improvement
  Components: Performance, scan
Affects Versions: 2.4.0
Reporter: Anoop Sam John
 Fix For: 2.0.0








[jira] [Created] (HBASE-24849) Branch-1 Backport : HBASE-24665 MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24849:
--

 Summary: Branch-1 Backport : HBASE-24665 MultiWAL :  Avoid rolling 
of ALL WALs when one of the WAL needs a roll
 Key: HBASE-24849
 URL: https://issues.apache.org/jira/browse/HBASE-24849
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John
Assignee: wenfeiyi666
 Fix For: 1.7.0








[jira] [Resolved] (HBASE-24665) MultiWAL : Avoid rolling of ALL WALs when one of the WAL needs a roll

2020-08-10 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24665.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to trunk, branch-2, branch-2.3, branch-2.2.. Thanks for the patch 
[~wenfeiyi666]

> MultiWAL :  Avoid rolling of ALL WALs when one of the WAL needs a roll
> --
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.7
>
>
> When using MultiWAL, if any one WAL requests a roll, all WALs are rolled 
> together.





[jira] [Resolved] (HBASE-24791) Improve HFileOutputFormat2 to avoid always call getTableRelativePath method

2020-08-03 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24791.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master.

> Improve HFileOutputFormat2 to avoid always call getTableRelativePath method
> ---
>
> Key: HBASE-24791
> URL: https://issues.apache.org/jira/browse/HBASE-24791
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 3.0.0-alpha-1
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Critical
>  Labels: HFileOutputFormat, bulkload
> Fix For: 3.0.0-alpha-1
>
>
> Bulkload uses HFileOutputFormat2 to write HFiles.
> In HFileOutputFormat2.RecordWriter, the write method always calls 
> getTableRelativePath each time, which is unnecessary.





[jira] [Resolved] (HBASE-24695) FSHLog - close the current WAL file in a background thread

2020-08-01 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24695.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to branch-2 and trunk.  Thanks for the reviews Duo and Ram.

> FSHLog - close the current WAL file in a background thread
> --
>
> Key: HBASE-24695
> URL: https://issues.apache.org/jira/browse/HBASE-24695
> Project: HBase
>  Issue Type: Improvement
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> We have this as a TODO in code already
> {code}
> // It is at the safe point. Swap out writer from under the blocked writer 
> thread.
>   // TODO: This is close is inline with critical section. Should happen 
> in background?
>   if (this.writer != null) {
> oldFileLen = this.writer.getLength();
> try {
>   TraceUtil.addTimelineAnnotation("closing writer");
>   this.writer.close();
>   TraceUtil.addTimelineAnnotation("writer closed");
>   this.closeErrorCount.set(0);
> }
> {code}
> This close call is in the critical section and writes are blocked. Let's 
> move the close call into another WALCloser thread.





[jira] [Resolved] (HBASE-24790) Remove unused counter from SplitLogCounters

2020-07-29 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24790.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master. Thanks for the patch [~chenyechao]

> Remove unused counter from SplitLogCounters
> ---
>
> Key: HBASE-24790
> URL: https://issues.apache.org/jira/browse/HBASE-24790
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Yechao Chen
>Assignee: Yechao Chen
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> remove unused counter from SplitLogCounters





[jira] [Created] (HBASE-24695) FSHLog - close the current WAL file in a background thread

2020-07-08 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24695:
--

 Summary: FSHLog - close the current WAL file in a background thread
 Key: HBASE-24695
 URL: https://issues.apache.org/jira/browse/HBASE-24695
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


We have this as a TODO in code already
{code}
// It is at the safe point. Swap out writer from under the blocked writer 
thread.
  // TODO: This is close is inline with critical section. Should happen in 
background?
  if (this.writer != null) {
oldFileLen = this.writer.getLength();
try {
  TraceUtil.addTimelineAnnotation("closing writer");
  this.writer.close();
  TraceUtil.addTimelineAnnotation("writer closed");
  this.closeErrorCount.set(0);
}
{code}
This close call is in the critical section and writes are blocked. Let's move 
the close call into another WALCloser thread.
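The idea can be sketched with a plain single-threaded executor (a hypothetical simplification; the real change lands inside FSHLog's roll logic): the critical section only swaps the writer reference, and the potentially slow close of the old writer happens on a dedicated "WALCloser" thread.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: close the previous WAL writer off the critical path.
public class BackgroundCloseSketch {
    private final ExecutorService walCloser =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "WALCloser");
            t.setDaemon(true);
            return t;
        });

    private Closeable writer;

    /** Swap in the new writer; close the old one in the background. */
    public synchronized void replaceWriter(Closeable newWriter) {
        final Closeable old = this.writer;
        this.writer = newWriter; // critical section ends here
        if (old != null) {
            walCloser.execute(() -> {
                try {
                    old.close(); // slow close no longer blocks writes
                } catch (IOException e) {
                    // in real code: count/log close errors
                }
            });
        }
    }

    /** Drain the closer thread; useful for orderly shutdown. */
    public void shutdown() {
        walCloser.shutdown();
        try {
            walCloser.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```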





[jira] [Created] (HBASE-24679) HBase on Cloud Blob FS : Provide config to skip HFile archival while table deletion

2020-07-04 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24679:
--

 Summary: HBase on Cloud Blob FS : Provide config to skip HFile 
archival while table deletion 
 Key: HBASE-24679
 URL: https://issues.apache.org/jira/browse/HBASE-24679
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 3.0.0-alpha-1, 2.4.0


When we delete a table, as part of the delete from the FS we do the below 
things:
1. Rename the table directory to come under /hbase/.tmp. This is an atomic 
rename op.
2. Go through each of the HFiles under every region:cf and archive them one by 
one (rename the file from the .tmp path to /hbase/archive).
3. Delete the table dir under the .tmp dir.

In the case of HDFS this is not a big deal, as every rename op is just a meta 
op (though the HFile archival is still costly, as there will be many calls to 
the NN based on the table's region# and total storefiles#). But on a cloud blob 
based FS impl, this is a concerning op: every rename is a copy-blob op, and we 
are doing it twice per HFile in this table!

The proposal here is to provide a config option (defaulting to false) to skip 
this archival step.
We could provide another config to even avoid the .tmp rename. The atomicity of 
the table delete can be achieved by the HM-side procedure and proc WAL; in 
table delete, the 1st step is to delete the table from META anyway.






[jira] [Created] (HBASE-24678) Add Bulk load param details into its responseTooSlow log

2020-07-04 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24678:
--

 Summary: Add Bulk load param details into its responseTooSlow log
 Key: HBASE-24678
 URL: https://issues.apache.org/jira/browse/HBASE-24678
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


Right now the log looks like this:
{code}
(responseTooSlow): 
{"call":"BulkLoadHFile(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$BulkLoadHFileRequest)","starttimems":1593820455043,"responsesize":2,"method":"BulkLoadHFile","param":"TODO:
 class 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$BulkLoadHFileRequest",..}
{code}







[jira] [Resolved] (HBASE-24189) WALSplit recreates region dirs for deleted table with recovered edits data

2020-06-13 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24189.

Hadoop Flags: Reviewed
  Resolution: Fixed

> WALSplit recreates region dirs for deleted table with recovered edits data
> --
>
> Key: HBASE-24189
> URL: https://issues.apache.org/jira/browse/HBASE-24189
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, wal
>Affects Versions: 2.2.4
> Environment: * HDFS 3.1.3
>  * HBase 2.1.4
>  * OpenJDK 8
>Reporter: Andrey Elenskiy
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.2.6
>
>
> Under the following scenario region directories in HDFS can be recreated with 
> only recovered.edits in them:
>  # Create table "test"
>  # Put into "test"
>  # Delete table "test"
>  # Create table "test" again
>  # Crash the regionserver to which the put has went to force the WAL replay
>  # Region directory in old table is recreated in new table
>  # hbase hbck returns inconsistency
> This appears to happen due to the fact that WALs are not cleaned up once a 
> table is deleted and they still contain the edits from the old table. I've 
> tried the wal_roll command on the regionserver before crashing it, but it 
> doesn't seem to help, as under some circumstances there are still WAL files 
> around. The only solution that works consistently is to restart the 
> regionserver before creating the table at step 4, because that triggers log 
> cleanup on startup: 
> https://github.com/apache/hbase/blob/f3ee9b8aa37dd30d34ff54cd39fb9b4b6d22e683/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/store/wal/WALProcedureStore.java#L508
>  
> Truncating a table would also be a workaround, but in our case it's a no-go 
> as we create and delete tables in our tests, which run back to back (create a 
> table at the beginning of the test and delete it at the end).
> A nice option in our case would be an hbase shell utility to force cleanup 
> of log files manually, as I realize it's not really viable to clean all of 
> those up every time some table is removed.





[jira] [Resolved] (HBASE-24340) PerformanceEvaluation options should not mandate any specific order

2020-06-08 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24340.

Hadoop Flags: Reviewed
  Resolution: Fixed

> PerformanceEvaluation options should not mandate any specific order
> ---
>
> Key: HBASE-24340
> URL: https://issues.apache.org/jira/browse/HBASE-24340
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Anoop Sam John
>Assignee: Sambit Mohapatra
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> During parsing of options, there are some validations.  One such is checking 
> whether autoFlush = false AND multiPut > 0.  This validation code mandates an 
> order that autoFlush=true should be specified before adding multiPut = x in 
> PE command.
> {code}
> final String multiPut = "--multiPut=";
>   if (cmd.startsWith(multiPut)) {
> opts.multiPut = Integer.parseInt(cmd.substring(multiPut.length()));
> if (!opts.autoFlush && opts.multiPut > 0) {
>   throw new IllegalArgumentException("autoFlush must be true when 
> multiPut is more than 0");
> }
> continue;
>   }
> {code}
> 'autoFlush' defaults to false. If multiPut is specified prior to autoFlush 
> in the PE command, we end up throwing IllegalArgumentException.
> The other validations don't seem to have this issue. Still, it is better to 
> move all the validations together into a private method and call that once 
> the parse is over.





[jira] [Created] (HBASE-24441) CacheConfig details logged at Store open is not really useful

2020-05-26 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24441:
--

 Summary: CacheConfig details logged at Store open is not really 
useful
 Key: HBASE-24441
 URL: https://issues.apache.org/jira/browse/HBASE-24441
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


The CacheConfig constructor logs 'this' object at INFO level. This log comes 
during Store open (as the CacheConfig instance for that store is created). As 
the log happens in CacheConfig only, we don't get to know which region:store 
it is for, so the log is not really useful.
{code}
blockCache=org.apache.hadoop.hbase.io.hfile.CombinedBlockCache@7bc02941, 
cacheDataOnRead=true, cacheDataOnWrite=true, cacheIndexesOnWrite=false, 
cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
prefetchOnOpen=false
{code}
This log also keeps coming during every compaction, because during compaction 
we create a new CacheConfig based on the HStore-level CacheConfig object. We 
can avoid this log on every compaction.





[jira] [Created] (HBASE-24421) Support loading cluster level CPs from Hadoop file system

2020-05-23 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24421:
--

 Summary: Support loading cluster level CPs from Hadoop file system
 Key: HBASE-24421
 URL: https://issues.apache.org/jira/browse/HBASE-24421
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 3.0.0-alpha-1


Right now we allow configuring CPs that need to be loaded from a Hadoop FS at 
the table level (via the Java API or shell):
> alter 't1', METHOD => 'table_att', 
> 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
But for the cluster-level CPs at the Master/RS/WAL level, the only way is to 
configure them in hbase-site.xml, and there we don't allow specifying any jar 
path. This jira suggests adding such a feature.
Note: We already support configuring the priority of a CP at the xml level 
(an FQCN plus an optional '|'-separated priority). The same way the shell 
command works, we can take the jar path also.
If there is no '|' separator at all, consider the value an FQCN on the 
classpath. If there is one '|', that will be FQCN and priority (same as 
today). If there are 2 '|' separators, we consider the 1st part as the path 
to the external jar.

This will help in cloud scenarios, especially with auto scaling. Otherwise the 
customer has to execute special scripts to make the CP jar available within 
the HBase classpath.
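The separator rules above could be parsed roughly as follows. This is an illustrative sketch under the stated rules; class and field names are made up and do not correspond to actual HBase code.

```java
// Illustrative parser for the proposed cluster-level CP config value,
// supporting "FQCN", "FQCN|priority" and "jarPath|FQCN|priority".
public class CpSpecSketch {
    public final String jarPath;   // null => class is on the HBase classpath
    public final String className;
    public final int priority;

    private CpSpecSketch(String jarPath, String className, int priority) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
    }

    public static CpSpecSketch parse(String value, int defaultPriority) {
        String[] parts = value.split("\\|");
        switch (parts.length) {
            case 1: // no '|': FQCN on the classpath
                return new CpSpecSketch(null, parts[0], defaultPriority);
            case 2: // one '|': FQCN and priority (same as today)
                return new CpSpecSketch(null, parts[0], Integer.parseInt(parts[1]));
            case 3: // two '|': jar path, FQCN, priority
                return new CpSpecSketch(parts[0], parts[1], Integer.parseInt(parts[2]));
            default:
                throw new IllegalArgumentException("Bad coprocessor spec: " + value);
        }
    }
}
```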





[jira] [Created] (HBASE-24362) Remove/Retain deprecated configs

2020-05-12 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24362:
--

 Summary: Remove/Retain deprecated configs
 Key: HBASE-24362
 URL: https://issues.apache.org/jira/browse/HBASE-24362
 Project: HBase
  Issue Type: Umbrella
Reporter: Anoop Sam John
 Fix For: 3.0.0-alpha-1


This umbrella issue is to discuss and decide, for each config deprecated in 
the 2.x line (or even before), whether we can completely remove it from the 
code, and if so whether we need some migration tool/step.





[jira] [Created] (HBASE-24352) Skip HDFSBlockDistribution calc when FS is not HDFS

2020-05-11 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24352:
--

 Summary: Skip HDFSBlockDistribution calc when FS is not HDFS
 Key: HBASE-24352
 URL: https://issues.apache.org/jira/browse/HBASE-24352
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 3.0.0-alpha-1, 2.4.0


HDFSBlockDistribution is used in different places like the Balancer, 
DateTieredCompaction, CompactionTool etc. In the Balancer area there is a 
config 'hbase.master.balancer.uselocality' using which we can skip this. But 
irrespective of this config, if we are on a non-HDFS FS, we should skip it. 
The block distribution issues many file-status commands to the underlying FS, 
which won't be that cheap on a cloud FS. This jira aims at correcting all 
these places.
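A hedged sketch of the guard (hypothetical helper; the real check would go through the Hadoop FileSystem object rather than a raw URI): block locality is only meaningful when the underlying filesystem scheme is HDFS.

```java
import java.net.URI;

// Sketch: skip HDFS block-distribution computation on non-HDFS filesystems.
public class FsCheckSketch {
    public static boolean shouldComputeBlockDistribution(URI fsUri) {
        // Only HDFS gives meaningful block locality; cloud FS schemes
        // (wasb, s3a, abfs, ...) make these file-status calls expensive.
        return "hdfs".equalsIgnoreCase(fsUri.getScheme());
    }

    public static void main(String[] args) {
        System.out.println(shouldComputeBlockDistribution(
            URI.create("hdfs://nn:8020/hbase")));   // true
        System.out.println(shouldComputeBlockDistribution(
            URI.create("wasb://c@acct.blob.core.windows.net/hbase"))); // false
    }
}
```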





[jira] [Resolved] (HBASE-24341) The region should be removed from ConfigurationManager as a ConfigurationObserver when it is closed

2020-05-11 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24341.

Hadoop Flags: Reviewed
  Resolution: Fixed

> The region should be removed from ConfigurationManager as a 
> ConfigurationObserver when it is closed
> ---
>
> Key: HBASE-24341
> URL: https://issues.apache.org/jira/browse/HBASE-24341
> Project: HBase
>  Issue Type: Improvement
>Reporter: Junhong Xu
>Assignee: Junhong Xu
>Priority: Minor
> Fix For: 3.0.0-alpha-1
>
>
> When the region is opened, we register the region to the ConfigurationManager 
> as a ConfigurationObserver object. However, when the region is closed, we 
> don't deregister the region as a ConfigurationObserver object from the 
> ConfigurationManager correspondingly. It's not a bug for now because we can 
> update the conf whether the region is open or not. But it is bug-prone, and 
> we should remove the region from the ConfigurationManager object when the 
> region is closed.





[jira] [Created] (HBASE-24340) PerformanceEvaluation options should not mandate any specific order

2020-05-06 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24340:
--

 Summary: PerformanceEvaluation options should not mandate any 
specific order
 Key: HBASE-24340
 URL: https://issues.apache.org/jira/browse/HBASE-24340
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.1.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John


During parsing of options, there are some validations.  One such is checking 
whether autoFlush = false AND multiPut > 0.  This validation code mandates an 
order that autoFlush=true should be specified before adding multiPut = x in PE 
command.
{code}
final String multiPut = "--multiPut=";
  if (cmd.startsWith(multiPut)) {
opts.multiPut = Integer.parseInt(cmd.substring(multiPut.length()));
if (!opts.autoFlush && opts.multiPut > 0) {
  throw new IllegalArgumentException("autoFlush must be true when 
multiPut is more than 0");
}
continue;
  }
{code}
'autoFlush' defaults to false. If multiPut is specified prior to autoFlush in 
the PE command, we end up throwing IllegalArgumentException.
The other validations don't seem to have this issue. Still, it is better to 
move all the validations together into a private method and call that once 
the parse is over.
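The fix direction, sketched below with hypothetical names (not the actual PerformanceEvaluation code): record options during parsing and run all cross-option checks once afterwards, so flag order no longer matters.

```java
// Sketch of order-independent option validation (names are illustrative).
public class PeOptionsSketch {
    boolean autoFlush = false; // default, as in PerformanceEvaluation
    int multiPut = 0;

    static PeOptionsSketch parse(String[] args) {
        PeOptionsSketch opts = new PeOptionsSketch();
        for (String cmd : args) {
            if (cmd.startsWith("--autoFlush=")) {
                opts.autoFlush =
                    Boolean.parseBoolean(cmd.substring("--autoFlush=".length()));
            } else if (cmd.startsWith("--multiPut=")) {
                opts.multiPut =
                    Integer.parseInt(cmd.substring("--multiPut=".length()));
                // NOTE: no validation here -- inline checks made order matter.
            }
        }
        opts.validate(); // all cross-option checks happen after the loop
        return opts;
    }

    private void validate() {
        if (!autoFlush && multiPut > 0) {
            throw new IllegalArgumentException(
                "autoFlush must be true when multiPut is more than 0");
        }
    }
}
```

With this shape, `--multiPut=10 --autoFlush=true` and `--autoFlush=true --multiPut=10` behave identically.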





[jira] [Resolved] (HBASE-24311) Add more details in MultiVersionConcurrencyControl STUCK log message

2020-05-05 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-24311.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to branch-2 and master. Thanks all for the reviews.

> Add more details in MultiVersionConcurrencyControl STUCK log message
> 
>
> Key: HBASE-24311
> URL: https://issues.apache.org/jira/browse/HBASE-24311
> Project: HBase
>  Issue Type: Improvement
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Now the warn logs look like
> {code}
> STUCK: MultiVersionConcurrencyControl{readPoint=30944485, writePoint=30944498}
> {code}
> We don't have any details on which region the writes are stuck. It would be 
> better to include the region name detail also in this log.
> Maybe we can even log how long we have been waiting for the read point to 
> catch up.
> cc [~stack]





[jira] [Created] (HBASE-24311) Add more details in MultiVersionConcurrencyControl STUCK message

2020-05-03 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24311:
--

 Summary: Add more details in MultiVersionConcurrencyControl STUCK 
message
 Key: HBASE-24311
 URL: https://issues.apache.org/jira/browse/HBASE-24311
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


Now the warn logs look like
STUCK: MultiVersionConcurrencyControl{readPoint=30944485, writePoint=30944498}
We don't have any details on which region the writes are stuck. It would be 
better to include the region name detail also in this log.
Maybe we can even log how long we have been waiting for the read point to 
catch up.
cc [~stack]





[jira] [Created] (HBASE-24208) Remove RS entry from zk draining servers node while RS getting stopped

2020-04-17 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24208:
--

 Summary: Remove RS entry from zk draining servers node while RS 
getting stopped
 Key: HBASE-24208
 URL: https://issues.apache.org/jira/browse/HBASE-24208
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John


When a RS is decommissioned, we add an entry into the zk node, and it stays 
there unless the same RS instance is recommissioned.
But if we want to scale down a cluster, the best path is to decommission the 
RSs on the nodes being scaled down. The regions on these RSs get moved to live 
RSs. These decommissioned RSs will get stopped later and will never be 
recommissioned, yet their zk nodes will still be there under the draining 
servers path.
We can remove this zk node when the RS is getting stopped.





[jira] [Created] (HBASE-24102) RegionMover should exclude draining/decommissioning nodes from target RSs

2020-04-01 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-24102:
--

 Summary: RegionMover should exclude draining/decommissioning nodes 
from target RSs
 Key: HBASE-24102
 URL: https://issues.apache.org/jira/browse/HBASE-24102
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
 Fix For: 3.0.0, 2.4.0


When using the RegionMover tool to unload the regions from a given RS, it 
decides the list of destination RSs by 
{code}
List<ServerName> regionServers = new ArrayList<>();
regionServers.addAll(admin.getRegionServers());
// Remove the host Region server from target Region Servers list
ServerName server = stripServer(regionServers, hostname, port);
.
// Remove RS present in the exclude file
stripExcludes(regionServers);
stripMaster(regionServers);
{code}
Yes, it removes the RSs mentioned in the excludes list.
Better: in addition to any exclude list the RegionMover user mentions, we can 
also exclude the draining/decommissioning RSs returned by 
Admin#listDecommissionedRegionServers().
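The suggested extra filtering could look like this sketch. Plain strings stand in for ServerName here; in real code the decommissioned list would come from Admin#listDecommissionedRegionServers(), which the description names.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: drop draining/decommissioning servers from RegionMover's targets.
public class TargetFilterSketch {
    public static List<String> filterTargets(List<String> regionServers,
                                             List<String> decommissioned) {
        List<String> targets = new ArrayList<>(regionServers);
        // In real code: decommissioned = admin.listDecommissionedRegionServers()
        targets.removeAll(decommissioned);
        return targets;
    }
}
```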





[jira] [Created] (HBASE-23832) Old config hbase.hstore.compactionThreshold is ignored

2020-02-12 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-23832:
--

 Summary: Old config hbase.hstore.compactionThreshold is ignored
 Key: HBASE-23832
 URL: https://issues.apache.org/jira/browse/HBASE-23832
 Project: HBase
  Issue Type: Bug
Reporter: Anoop Sam John


In 2.x we added the new name 'hbase.hstore.compaction.min' for this. For 
compatibility we still allow the old config name and honor it in code:
{code}
minFilesToCompact = Math.max(2, conf.getInt(HBASE_HSTORE_COMPACTION_MIN_KEY,
  /*old name*/ conf.getInt("hbase.hstore.compactionThreshold", 3)));
{code}
But if hbase.hstore.compactionThreshold alone is configured by the user, it 
has no effect.
This is because hbase-default.xml ships the new config with a value of 3, so 
conf.getInt(HBASE_HSTORE_COMPACTION_MIN_KEY, ...) always returns 3 even when 
it is not explicitly configured by the customer, who instead used the old 
key.
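The shadowing effect is easy to reproduce with a plain two-level lookup, a deliberate simplification of Hadoop's Configuration where hbase-default.xml supplies the defaults layer:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of Configuration.getInt with an hbase-default.xml layer.
public class ConfigShadowSketch {
    static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        // hbase-default.xml ships the NEW key with value 3
        DEFAULTS.put("hbase.hstore.compaction.min", "3");
    }

    final Map<String, String> userConf = new HashMap<>();

    int getInt(String key, int fallback) {
        String v = userConf.getOrDefault(key, DEFAULTS.get(key));
        return v != null ? Integer.parseInt(v) : fallback;
    }

    int minFilesToCompact() {
        // Mirrors the snippet above: the old key is only consulted as the
        // fallback of the new key -- which always resolves from DEFAULTS.
        return Math.max(2, getInt("hbase.hstore.compaction.min",
            getInt("hbase.hstore.compactionThreshold", 3)));
    }
}
```

Setting only the old key to 10 still yields 3, because the new key's default always wins before the fallback is ever consulted.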





[jira] [Created] (HBASE-23788) ROW_INDEX_V1 encoder should consider the secondary index size with the encoded data size tracking

2020-02-03 Thread Anoop Sam John (Jira)
Anoop Sam John created HBASE-23788:
--

 Summary: ROW_INDEX_V1 encoder should consider the secondary index 
size with the encoded data size tracking
 Key: HBASE-23788
 URL: https://issues.apache.org/jira/browse/HBASE-23788
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23342) Handle NPE while closing compressingStream

2019-12-01 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-23342.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Handle NPE while closing compressingStream
> --
>
> Key: HBASE-23342
> URL: https://issues.apache.org/jira/browse/HBASE-23342
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Trivial
> Fix For: 3.0.0
>
>
> EncodedDataBlock.getCompressedSize() can produce NPE while closing 
> compressingStream if createCompressionStream() throws IOException.
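The guard pattern for such a fix can be sketched as follows. This is a hypothetical, self-contained simulation (a plain stream stands in for the real compression stream), not the actual EncodedDataBlock code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class NullSafeClose {
    // If stream creation throws, the reference stays null, so the
    // close in finally must be guarded to avoid an NPE masking the
    // original IOException.
    static long compressedSize(byte[] data, boolean failCreate) throws IOException {
        OutputStream compressingStream = null;
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            if (failCreate) {
                throw new IOException("createCompressionStream failed");
            }
            compressingStream = sink; // stand-in for the real compression stream
            compressingStream.write(data);
            return sink.size();
        } finally {
            if (compressingStream != null) { // null check prevents the NPE
                compressingStream.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(compressedSize(new byte[]{1, 2, 3}, false)); // prints 3
        try {
            compressedSize(new byte[]{1}, true);
        } catch (IOException e) {
            System.out.println("IOException, not NPE");
        }
    }
}
```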



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-9065) getClosestRowBefore ignores passed column, acts on the family only

2019-10-30 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-9065.
---
Resolution: Won't Fix

This issue is too old and we don't have this API anymore.

> getClosestRowBefore ignores passed column, acts on the family only
> --
>
> Key: HBASE-9065
> URL: https://issues.apache.org/jira/browse/HBASE-9065
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.95.1
>Reporter: Michael Stack
>Priority: Major
>
> If you ask for info:regioninfo, it gives you back all columns. That's 
> unexpected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-20284) Reduce heap overhead : Investigate removal of NavigableSet 'blocksByHFile'

2018-03-25 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-20284:
--

 Summary: Reduce heap overhead : Investigate removal of 
NavigableSet 'blocksByHFile'
 Key: HBASE-20284
 URL: https://issues.apache.org/jira/browse/HBASE-20284
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John


This Set takes 40 bytes per entry (block). As of now the total heap 
requirement per entry is 160 bytes, so avoiding this Set would be a 25% 
reduction. This set is used for removal of the blocks of a specific HFile 
after its invalidation (mostly because of its compaction or a Store close). 
Check other ways to remove the blocks - maybe in an async way after the 
compaction is over, by a dedicated cleaner thread(?). It might be OK not to 
remove the invalidated file's entries immediately; when the cache is out of 
space, the eviction thread can select and remove them. A few things to 
consider/change:
1. When the compaction process reads blocks, they might be delivered from the 
cache. We should not consider this access a real block access for the block; 
that will increase the chances of the eviction thread selecting this block for 
removal. We should be able to clearly distinguish cache reads by the 
compaction process from reads by user read processes.
2. When the compaction process reads a block from the cache, could we somehow 
mark the block (using a one-byte boolean) as having just gone through 
compaction? Later, when the eviction thread selects a block and there is a tie 
because of the same access time/count, we could break the tie in favor of the 
already-compacted block. Need to check its pros and cons.
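The tie-break idea in point 2 could look roughly like this; the types and the one-byte marker are illustrative stand-ins, not the real BucketCache structures:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EvictionTieBreak {
    static class Block {
        final String name;
        final long lastAccess;
        final boolean compactedAway; // the proposed one-byte marker

        Block(String name, long lastAccess, boolean compactedAway) {
            this.name = name;
            this.lastAccess = lastAccess;
            this.compactedAway = compactedAway;
        }
    }

    // Oldest access first; on a tie, the compacted-away block sorts first,
    // i.e. is evicted first (Boolean false orders before true, so we negate).
    static final Comparator<Block> EVICTION_ORDER =
        Comparator.comparingLong((Block b) -> b.lastAccess)
                  .thenComparing(b -> !b.compactedAway);

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("live", 100L, false),
            new Block("compacted", 100L, true));
        Block victim = blocks.stream().min(EVICTION_ORDER).get();
        System.out.println(victim.name); // prints compacted
    }
}
```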




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-17383) Improve log msg when offheap memstore breaches higher water mark

2018-03-22 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-17383.

Resolution: Cannot Reproduce
  Assignee: (was: ramkrishna.s.vasudevan)

This is no longer the case as some other cleanup corrected the log. Closing as 
cannot reproduce.

> Improve log msg when offheap memstore breaches higher water mark
> 
>
> Key: HBASE-17383
> URL: https://issues.apache.org/jira/browse/HBASE-17383
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Priority: Trivial
>
> Currently we get this log
> {code}
> 2016-12-28 21:11:14,349 INFO  
> [RpcServer.deafult.FPBQ.Fifo.handler=39,queue=9,port=16041] 
> regionserver.MemStoreFlusher: Blocking updates on 
> stobdtserver5,16041,1482938527980: the global offheap memstore size 12.6 G + 
> global memstore heap overhead 4.0 G is >= than blocking 12.6 G size
> {code}
> Here the global offheap memstore size is greater than the blocking size. The 
> memstore heap overhead need not be included in this log unless the higher 
> water mark breach is only due to the heap overhead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20060) Add details of off heap memstore into book.

2018-02-23 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-20060:
--

 Summary: Add details of off heap memstore into book.
 Key: HBASE-20060
 URL: https://issues.apache.org/jira/browse/HBASE-20060
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20059) Make sure documentation is updated for the offheap Bucket cache usage

2018-02-23 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-20059:
--

 Summary: Make sure documentation is updated for the offheap Bucket 
cache usage
 Key: HBASE-20059
 URL: https://issues.apache.org/jira/browse/HBASE-20059
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20003) WALLess HBase on Persistent Memory

2018-02-14 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-20003:
--

 Summary: WALLess HBase on Persistent Memory
 Key: HBASE-20003
 URL: https://issues.apache.org/jira/browse/HBASE-20003
 Project: HBase
  Issue Type: New Feature
Reporter: Anoop Sam John
Assignee: Anoop Sam John


This JIRA aims to make use of persistent memory (pmem) technologies in HBase. 
One such usage is to make the Memstore reside on pmem. A persistent memstore 
would remove the need for the WAL and pave the way for a WALLess HBase.

The existing region replica feature could be used here to ensure the data 
written to memstores is synchronously replicated to the replicas, ensuring 
strong consistency of the data (pipeline model).

Advantages :
- Data availability: since the data across replicas is consistent 
(synchronously written), our data is always 100% available.
- Lower MTTR: it becomes easier/faster to switch over to a replica on a 
primary region failure as there is no WAL replay involved. Building the 
memstore map data is also much faster than reading and replaying the WAL.
- Possibility of bigger memstores: pmem devices are designed to offer more 
capacity than DRAM, so they would also enable bigger memstores, which means 
fewer flushes and less compaction IO.
- Removes the dependency on HDFS in the write path

An initial PoC has been designed and developed. Testing is underway and we 
will publish the PoC results along with the design doc soon. The PoC doc will 
cover the design decisions, the libraries considered for working with these 
pmem devices, the pros and cons of those libraries, and the performance 
results.

Note: next-gen memory technologies using 3D XPoint provide a persistent memory 
feature. Such memory DIMMs will soon appear in the market. The PoC is built 
around Intel's Apache Pass (AEP).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-19505) Disable ByteBufferPool by default at HM

2017-12-13 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19505:
--

 Summary: Disable ByteBufferPool by default at HM
 Key: HBASE-19505
 URL: https://issues.apache.org/jira/browse/HBASE-19505
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-beta-1






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19452) Turn ON off heap Bucket Cache by default

2017-12-07 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19452:
--

 Summary: Turn ON off heap Bucket Cache by default
 Key: HBASE-19452
 URL: https://issues.apache.org/jira/browse/HBASE-19452
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John


BC's hbase.bucketcache.ioengine is empty by default now, meaning no BC.
Make the default 'offheap', and provide a default off-heap size for the BC as 
well. This could be 8 GB?
Also we should provide a new option 'none' for hbase.bucketcache.ioengine for 
users who don't need BC at all.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19451) Reduce default Block Cache size percentage

2017-12-07 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19451:
--

 Summary: Reduce default Block Cache size percentage
 Key: HBASE-19451
 URL: https://issues.apache.org/jira/browse/HBASE-19451
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John


This is 40% by default now. Reduce this to be 20%?  Or even 10%?
It only needs to keep index and bloom blocks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-19439) Mark ShortCircuitMasterConnection with InterfaceAudience Private

2017-12-06 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-19439.

  Resolution: Fixed
Hadoop Flags: Reviewed

Thanks for the quick review.

> Mark ShortCircuitMasterConnection  with InterfaceAudience Private
> -
>
> Key: HBASE-19439
> URL: https://issues.apache.org/jira/browse/HBASE-19439
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-alpha-1
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-19439.patch
>
>
> ShortCircuitMasterConnection should be Private - it is wrongly declared 
> Public (HBASE-17745).
> MasterKeepAliveConnection should also be Private; it has no 
> InterfaceAudience annotation now.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19439) Mark ShortCircuitMasterConnection with InterfaceAudience Private

2017-12-06 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19439:
--

 Summary: Mark ShortCircuitMasterConnection  with InterfaceAudience 
Private
 Key: HBASE-19439
 URL: https://issues.apache.org/jira/browse/HBASE-19439
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha-1
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-beta-1


ShortCircuitMasterConnection should be Private - it is wrongly declared Public 
(HBASE-17745).
MasterKeepAliveConnection should also be Private; it has no InterfaceAudience 
annotation now.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19438) Doc cleanup after removal of features across Cache/BucketCache

2017-12-05 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19438:
--

 Summary: Doc cleanup after removal of features across 
Cache/BucketCache
 Key: HBASE-19438
 URL: https://issues.apache.org/jira/browse/HBASE-19438
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 2.0.0-beta-1






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19357) Bucket cache no longer L2 for LRU cache

2017-11-27 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19357:
--

 Summary: Bucket cache no longer L2 for LRU cache
 Key: HBASE-19357
 URL: https://issues.apache.org/jira/browse/HBASE-19357
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-beta-1


When the Bucket cache is used, by default we don't configure it as an L2 cache 
alone. The default setting is combined mode ON, where data blocks go to the 
Bucket cache and index/bloom blocks go to the LRU cache. But there is a way to 
turn this off and make LRU the L1 with the Bucket cache as a victim handler 
for L1 - just an L2.
After the off-heap read path optimization, the Bucket cache is no longer 
slower compared to L1. We have test results on data sizes from 12 GB. The 
Alibaba use case was also with 12 GB and they observed a ~30% QPS improvement 
over the LRU cache.
This issue is to remove the option for combined mode = false. So when the 
Bucket cache is in use, data blocks will go only to it and LRU will get only 
index/meta/bloom blocks. The Bucket cache will no longer be configurable as a 
victim handler for LRU.

Note: only when an external cache is in use does the L1/L2 arrangement remain 
- LRU will be L1 and the external cache acts as its L2. That makes full sense.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations

2017-11-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19301:
--

 Summary: Provide way for CPs to create short circuited connection 
with custom configurations
 Key: HBASE-19301
 URL: https://issues.apache.org/jira/browse/HBASE-19301
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-beta-1


Over in HBASE-18359 we had discussions on this.
Right now HBase provides getConnection() in RegionCPEnv, MasterCPEnv etc. But 
this returns a pre-created connection (per server) that uses the configs from 
hbase-site.xml on that server.
Phoenix needs to create a connection in a CP with some custom configs. Having 
these custom changes in hbase-site.xml is harmful as that will affect all 
connections created at that server.
This issue is for providing an overloaded getConnection(Configuration) API.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19235) CoprocessorEnvironment should be exposed to CPs

2017-11-09 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19235:
--

 Summary: CoprocessorEnvironment should be exposed to CPs
 Key: HBASE-19235
 URL: https://issues.apache.org/jira/browse/HBASE-19235
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0-alpha-4
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Minor
 Fix For: 2.0.0-beta-1


Its sub-interfaces are exposed with 
LimitedPrivate(HBaseInterfaceAudience.COPROC), so effectively all the 
functions in this interface are too. Better to mark CoprocessorEnvironment 
itself as CP-exposed to avoid confusion.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-18708) Configure on-heap bucketCache size using percentage of heap size

2017-11-06 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-18708.

Resolution: Won't Fix
  Assignee: (was: Biju Nair)

HBASE-19187 will remove the on-heap version of BC, so resolving this as 
'Won't Fix'.

> Configure on-heap bucketCache size using percentage of heap size
> 
>
> Key: HBASE-18708
> URL: https://issues.apache.org/jira/browse/HBASE-18708
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Biju Nair
>Priority: Trivial
> Attachments: HBASE-18708-BRANCH-1.PATCH
>
>
> Currently heap allocations for RS memory structures like {{memstore}} and 
> {{lruCache}} are configured as percentage of total RS heap size. Since 
> on-heap bucketCache uses RS heap, configuring it as a percentage of heap size 
> will improve usability. Currently this can be configured either as a 
> percentage of heap or a memory size in MiB and we can remove the latter 
> option which is applicable to external or off-heap bucketCache.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19067) Do not expose getHDFSBlockDistribution in StoreFile

2017-10-22 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19067:
--

 Summary: Do not expose getHDFSBlockDistribution in StoreFile
 Key: HBASE-19067
 URL: https://issues.apache.org/jira/browse/HBASE-19067
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-alpha-4






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-19045) Deprecate RegionObserver#postInstantiateDeleteTracker

2017-10-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-19045.

  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to branch-2 and master. Thanks Stack

> Deprecate RegionObserver#postInstantiateDeleteTracker
> -
>
> Key: HBASE-19045
> URL: https://issues.apache.org/jira/browse/HBASE-19045
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0-alpha-4
>
> Attachments: HBASE-19045.patch
>
>
> Too much of an internal thing to be exposed to CPs.
> This was added as a VC feature since VC is implemented as a CP; there was no 
> other choice then.
> We can deprecate this now with a warning that it should not be used.
> We might change AC/VC etc. to be core services rather than CP impls for 3.0 (?)
> DeleteTracker is even IA.Private



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19048) Cleanup MasterObserver hooks which takes IA private params

2017-10-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19048:
--

 Summary: Cleanup MasterObserver hooks which takes IA private params
 Key: HBASE-19048
 URL: https://issues.apache.org/jira/browse/HBASE-19048
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
 Fix For: 2.0.0-alpha-4


These are the ones in MasterObserver
preAbortProcedure - ProcedureExecutor
postGetProcedures - Procedure
postGetLocks - LockedResource
preRequestLock - LockType
postRequestLock - LockType
preLockHeartbeat - LockProcedure
postLockHeartbeat - LockProcedure



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19047) CP exposed Scanner types should not extend Shipper

2017-10-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19047:
--

 Summary: CP exposed Scanner types should not extend Shipper
 Key: HBASE-19047
 URL: https://issues.apache.org/jira/browse/HBASE-19047
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-alpha-4


Shipper is an IA.Private interface and very much internal.
Right now the CP-exposed RegionScanner extends it and so exposes the shipped() 
method. If this is called by mistake, it can harm the correctness of the cells 
in the Results.

preScannerOpen() allowing a new Scanner to be returned is also problematic 
now. It allows users to create a RegionScanner from Region, then wrap it and 
return it back (the same can be done via postScannerOpen too), and it can 
happen that the wrapper does not implement shipped() properly. Either way, 
exposing shipped() is problematic.

Solution
1. Remove preScannerOpen()? The only use case I can think of is wrapping the 
original scanner, and the original scanner can only be created via 
Region.getScanner. Maybe there is no need to remove the hook - just remove its 
ability to return a RegionScanner instance. Call it with the Scan object so 
the CP can change the Scan object if it wants.
2. Let RegionScanner not extend Shipper; only RegionScannerImpl implements it.
3. Keep a reference to the RegionScanner created by core and let that be used 
by RegionScannerShippedCallBack when the post hook does a wrap.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19046) RegionObserver#postCompactSelection Passing an ImmutableList param of PB type

2017-10-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19046:
--

 Summary: RegionObserver#postCompactSelection  Passing an 
ImmutableList param of PB type
 Key: HBASE-19046
 URL: https://issues.apache.org/jira/browse/HBASE-19046
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-alpha-4


I don't think there is any specific need for passing this PB type. We can just 
pass an unmodifiableList object, and the Javadoc can say it is unmodifiable. 
That's it - no need for the type itself to be Immutable...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19045) Deprecate RegionObserver#postInstantiateDeleteTracker

2017-10-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-19045:
--

 Summary: Deprecate RegionObserver#postInstantiateDeleteTracker
 Key: HBASE-19045
 URL: https://issues.apache.org/jira/browse/HBASE-19045
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0-alpha-4


Too much of an internal thing to be exposed to CPs.
This was added as a VC feature since VC is implemented as a CP; there was no 
other choice then.
We can deprecate this now with a warning that it should not be used.
We might change AC/VC etc. to be core services rather than CP impls for 3.0 (?)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18906) Investigate Phoenix usages of Region#waitForXXX APIs

2017-09-28 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18906:
--

 Summary: Investigate Phoenix usages of Region#waitForXXX APIs
 Key: HBASE-18906
 URL: https://issues.apache.org/jira/browse/HBASE-18906
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John


While reviewing HBASE-18183, Andy pointed out that Phoenix uses 
waitForFlushesAndCompactions and/or waitForFlushes for different reasons. This 
issue is to see why they need them and whether alternate ways are possible. 
These seem to be too much internal stuff, and a normal CP hook calling them 
would be dangerous.
If there are alternate ways for Phoenix to avoid them without running into the 
issues Andy mentioned, we should suggest/fix that for them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18905) Allow CPs to request flush/compaction on Region

2017-09-28 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18905:
--

 Summary: Allow CPs to request flush/compaction on Region
 Key: HBASE-18905
 URL: https://issues.apache.org/jira/browse/HBASE-18905
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John


Follow-up for HBASE-18183.
As per that Jira, we keep only the requestCompaction API in Region. We did not 
have any such API for flush in Region; the only API there was a flush() that 
blocks the caller until the flush is done. This issue has to tackle:
1. Decide whether we need a requestFlush in Region and, if so, add it.
2. Decide whether requestCompaction (and requestFlush too) should return a 
Future. Right now the former returns nothing but allows passing a 
CompactionLifeCycleTracker which gets notified on the start and end of 
compaction.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18898) Provide way for the core flow to know whether CP implemented each of the hooks

2017-09-27 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18898:
--

 Summary: Provide way for the core flow to know whether CP 
implemented each of the hooks
 Key: HBASE-18898
 URL: https://issues.apache.org/jira/browse/HBASE-18898
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John


This came up as a discussion topic at the tail of HBASE-17732.
Can we have a way in the code (before trying to call a hook) to know whether 
the user has implemented that particular hook? E.g., of the write-related 
hooks, prePut() might be the only one the user CP implemented; all the others 
are just dummy impls from the interface. Can the core code know this and avoid 
the calls to the other dummy hooks entirely? Sometimes we do extra processing 
just to call CP hooks (say, making a POJO out of a PB object for the call), 
and if the user CP did not implement the hook, we could avoid that extra work 
fully. The pain will be worse when we later have to deprecate one hook and add 
a new one: the dummy impl of the new hook has to call the old one, which may 
normally do some extra work.
If the CP framework itself has a way to tell this, the core code can make use 
of it. What I am expecting is something like the PB way where we can call 
CPObject.hasPre(), then CPObject.pre(). We should not ask users to implement 
this extra ugly thing; when the CP instance is loaded in the RS/HM, that 
object should already carry this info.
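One hedged sketch of how a framework could detect an implemented hook without asking users to write anything extra: reflection can reveal whether a method is still the interface's default or was overridden. The interface and class names below are made up for illustration, not real HBase types:

```java
import java.lang.reflect.Method;

public class HookDetection {
    interface Observer {
        default void prePut() {}
        default void preDelete() {}
    }

    static class MyObserver implements Observer {
        @Override
        public void prePut() { /* real work */ }
        // preDelete() is left as the interface's dummy default.
    }

    // True when the hook is declared by a class (an override), false when the
    // lookup resolves to the interface's default method.
    static boolean implementsHook(Class<?> cpClass, String hook) throws NoSuchMethodException {
        Method m = cpClass.getMethod(hook);
        return !m.getDeclaringClass().isInterface();
    }

    public static void main(String[] args) throws NoSuchMethodException {
        System.out.println(implementsHook(MyObserver.class, "prePut"));    // prints true
        System.out.println(implementsHook(MyObserver.class, "preDelete")); // prints false
    }
}
```

The framework could compute this once at CP load time in the RS/HM and cache a per-hook flag, so the per-call cost is a plain boolean check rather than reflection.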



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Reopened] (HBASE-18448) EndPoint example for refreshing HFiles for stores

2017-08-24 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-18448:


> EndPoint example  for refreshing HFiles for stores
> --
>
> Key: HBASE-18448
> URL: https://issues.apache.org/jira/browse/HBASE-18448
> Project: HBase
>  Issue Type: Sub-task
>  Components: Coprocessors
>Affects Versions: 2.0.0, 1.3.1
>Reporter: Ajay Jadhav
>Assignee: Ajay Jadhav
>Priority: Minor
> Fix For: 3.0.0, 2.0.0-alpha-3
>
> Attachments: HBASE-18448.branch-1.001.patch, 
> HBASE-18448.branch-1.002.patch, HBASE-18448.branch-1.003.patch, 
> HBASE-18448.branch-1.004.patch, HBASE-18448.branch-1.005.patch, 
> HBASE-18448.branch-1.006.patch, HBASE-18448.branch-1.007.patch, 
> HBASE-18448.master.001.patch, HBASE-18448.master.002.patch
>
>
> In the case where multiple HBase clusters share a common rootDir, flushing 
> the data from one cluster doesn't mean that other clusters (replicas) will 
> automatically pick up the new HFile. Through this patch, we are exposing a 
> refresh-HFiles API which, when issued from a replica, will update the 
> in-memory file handle list with the newly added files.
> This allows replicas to be consistent with the data written through the 
> primary cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18673) Some more unwanted reference to unshaded PB classes

2017-08-24 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18673:
--

 Summary: Some more unwanted reference to unshaded PB classes
 Key: HBASE-18673
 URL: https://issues.apache.org/jira/browse/HBASE-18673
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Priority: Minor


ProtobufLogReader - seems to use the unshaded CIS, which looks like a miss.
HBaseFsck - some public methods throw the PB ServiceException. It's strange; 
no code within them throws this.
Publicly exposed PBType class - I don't know what this type allows users to 
do. Make their own Type? If so, the unshaded one might be OK.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HBASE-18664) netty-transport-native-epoll is not found using 2.0.0-alpha2

2017-08-23 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-18664.

Resolution: Duplicate

Duplicate of HBASE-18663

> netty-transport-native-epoll is not found using 2.0.0-alpha2 
> -
>
> Key: HBASE-18664
> URL: https://issues.apache.org/jira/browse/HBASE-18664
> Project: HBase
>  Issue Type: Bug
>Reporter: Miklos Csanady
>
> I am working on a Flume HBase sink module.
> When I switch the Maven version for the hbase* dependencies from 
> 2.0.0-alpha-1 to 2.0.0-alpha2, the build fails:
> {code}
> Stacktrace
> java.io.IOException: Shutting down
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:232)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:94)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:936)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:930)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:859)
>   at 
> org.apache.flume.sink.hbase2.TestHBase2Sink.setUpOnce(TestHBase2Sink.java:85)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>   at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>   at 
> org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> Caused by: java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMasterno netty-transport-native-epoll in 
> java.library.path
>   at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:217)
>   at 
> org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:152)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
>   ... 29 more
> Caused by: java.lang.UnsatisfiedLinkError: failed to load the required native 
> library
>   at 
> org.apache.hadoop.hbase.shaded.io.netty.channel.epoll.Epoll.ensureAvailability(Epoll.java:78)
>   at 
> org.apache.hadoop.hbase.shaded.io.netty.channel.epoll.EpollEventLoopGroup.(EpollEventLoopGroup.java:38)
>   at 
> org.apache.hadoop.hbase.util.NettyEventLoopGroupConfig.(NettyEventLoopGroupConfig.java:61)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:554)
>   at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:469)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance

[jira] [Created] (HBASE-18298) RegionServerServices Interface cleanup for CP expose

2017-06-29 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18298:
--

 Summary: RegionServerServices Interface cleanup for CP expose
 Key: HBASE-18298
 URL: https://issues.apache.org/jira/browse/HBASE-18298
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-18183) Region interface cleanup for CP expose

2017-06-07 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-18183:
--

 Summary: Region interface cleanup for CP expose
 Key: HBASE-18183
 URL: https://issues.apache.org/jira/browse/HBASE-18183
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HBASE-17757) Unify blocksize after encoding to decrease memory fragment

2017-04-27 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-17757:


Reopening so as to commit to branch-1 also. Can you please provide a patch for the same?

> Unify blocksize after encoding to decrease memory fragment 
> ---
>
> Key: HBASE-17757
> URL: https://issues.apache.org/jira/browse/HBASE-17757
> Project: HBase
>  Issue Type: New Feature
>Reporter: Allan Yang
>Assignee: Allan Yang
> Fix For: 2.0.0
>
> Attachments: HBASE-17757.patch, HBASE-17757v2.patch, 
> HBASE-17757v3.patch, HBASE-17757v4.patch
>
>
> Usually, we store the encoded block (uncompressed) in the blockcache/bucketCache. 
> Though we have set the blocksize, the block size varies after encoding. Varied 
> block sizes cause memory fragmentation, which finally results in more full GCs. 
> In order to relieve the memory fragmentation, this issue adjusts the encoded 
> blocks to a unified size.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-15179) Cell/DBB end-to-end on the write-path

2017-04-05 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-15179.

  Resolution: Fixed
Hadoop Flags: Reviewed

> Cell/DBB end-to-end on the write-path
> -
>
> Key: HBASE-15179
> URL: https://issues.apache.org/jira/browse/HBASE-15179
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
>
> Umbrella jira to make the HBase write path off heap E2E. We have to make sure 
> we have Cells flowing through the entire write path. Starting from the request 
> received in the RPC layer, until the Cells get flushed out as HFiles, we have 
> to keep the Cell data off heap.
> https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17874) Limiting of read request response size based on block size may go wrong when blocks are read from onheap or off heap bucket cache

2017-04-04 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17874:
--

 Summary: Limiting of read request response size based on block 
size may go wrong when blocks are read from onheap or off heap bucket cache
 Key: HBASE-17874
 URL: https://issues.apache.org/jira/browse/HBASE-17874
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 2.0.0


HBASE-14978 added this size limiting so as to make sure the multi read requests 
do not retain too many blocks. This works well when the blocks are obtained 
from anywhere other than a memory-mode BucketCache. In case of an on-heap or 
off-heap BucketCache, the entire cache area is split into N ByteBuffers, each of 
size 4 MB. When we hit a block in this cache, we no longer copy the data into a 
temp array; we use the same shared memory (BB), whose capacity is 4 MB.
The block size accounting logic in RSRpcServices is like below:
{code}
 if (c instanceof ByteBufferCell) {
  ByteBufferCell bbCell = (ByteBufferCell) c;
  ByteBuffer bb = bbCell.getValueByteBuffer();
  if (bb != lastBlock) {
context.incrementResponseBlockSize(bb.capacity());
lastBlock = bb;
  }
} else {
  // We're using the last block being the same as the current block as
  // a proxy for pointing to a new block. This won't be exact.
  // If there are multiple gets that bounce back and forth
  // Then it's possible that this will over count the size of
  // referenced blocks. However it's better to over count and
  // use two rpcs than to OOME the regionserver.
  byte[] valueArray = c.getValueArray();
  if (valueArray != lastBlock) {
context.incrementResponseBlockSize(valueArray.length);
lastBlock = valueArray;
  }
}
{code}
We take the BBCell's value buffer and use its capacity. The cell is backed by 
the same BB that backs the HFileBlock. When the HFileBlock is created from the 
BC, we duplicate the BB and properly position and limit it, as below:
{code}
 ByteBuffer bb = buffers[i].duplicate();
  if (i == startBuffer) {
cnt = bufferSize - startBufferOffset;
if (cnt > len) cnt = len;
bb.limit(startBufferOffset + cnt).position(startBufferOffset);
{code}
Still, this BB's capacity is 4 MB. This makes the size-limit breach happen too 
soon. What we expect is based on the block size, which defaults to 64 KB, and 
so we thereby allow cells from different blocks to appear in the response. We 
have a way to check whether we move from one block to the next:
{code}
if (bb != lastBlock) {
...
lastBlock = bb;
}
{code}
But just by considering the 1st cell, we have already added 4 MB of size!
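The over-count described above can be seen with plain JDK buffers. Below is a small sketch (not HBase code; the class and method names are illustrative) that duplicates and windows a shared 4 MB segment the way the BucketCache does for one block: `capacity()` still reports the whole segment, while `remaining()` reports only the block's bytes.

```java
import java.nio.ByteBuffer;

// Sketch: one 4 MB backing segment shared by many blocks, as in the
// memory-mode BucketCache described above.
public class CapacityVsRemaining {

    // Duplicate and window the shared buffer, as the BC does for one block.
    static ByteBuffer window(ByteBuffer backing, int off, int len) {
        ByteBuffer bb = backing.duplicate();
        bb.limit(off + len).position(off);
        return bb;
    }

    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.allocate(4 * 1024 * 1024); // 4 MB segment
        ByteBuffer block = window(backing, 128 * 1024, 64 * 1024); // one 64 KB block
        // capacity() still reports the whole 4 MB segment...
        System.out.println(block.capacity());  // 4194304
        // ...while remaining() reports only this block's 64 KB.
        System.out.println(block.remaining()); // 65536
    }
}
```

This is why accounting with `bb.capacity()` charges 4 MB for the very first cell, while a `remaining()`/limit-based measure would charge only the block's actual size.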



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17819) Reduce the heap overhead for BucketCache

2017-03-22 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17819:
--

 Summary: Reduce the heap overhead for BucketCache
 Key: HBASE-17819
 URL: https://issues.apache.org/jira/browse/HBASE-17819
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


We keep the bucket entry map in BucketCache. Below is the math for the heap 
size of a key and value in this map.
BlockCacheKey
---
String hfileName  -  Ref  - 4
long offset  - 8
BlockType blockType  - Ref  - 4
boolean isPrimaryReplicaBlock  - 1
Total  =  12 (Object) + 17 = 29

BucketEntry

int offsetBase  -  4
int length  - 4
byte offset1  -  1
byte deserialiserIndex  -  1
long accessCounter  -  8
BlockPriority priority  - Ref  - 4
volatile boolean markedForEvict  -  1
AtomicInteger refCount  -  16 + 4
long cachedTime  -  8
Total = 12 (Object) + 51 = 63

ConcurrentHashMap Map.Entry  -  40
blocksByHFile ConcurrentSkipListSet Entry  -  40

Total = 29 + 63 + 80 = 172

For 10 million blocks we will end up having about 1.6 GB of heap size.
This jira aims to reduce this as much as possible.
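The arithmetic above can be checked mechanically. Here is a small sketch (hypothetical helper, not HBase code) that reproduces the per-block totals, assuming the 12-byte object header and field sizes listed in the issue:

```java
// Reproduces the per-block heap math from the issue text (sizes in bytes,
// assuming a 64-bit JVM with compressed oops, as the listed figures imply).
public class BucketEntryHeapMath {
    static final int OBJECT_HEADER = 12;

    // BlockCacheKey: ref(4) + long(8) + ref(4) + boolean(1) = 17
    static int blockCacheKey() { return OBJECT_HEADER + 4 + 8 + 4 + 1; }

    // BucketEntry: int(4)+int(4)+byte(1)+byte(1)+long(8)+ref(4)
    //              +boolean(1)+AtomicInteger(16+4)+long(8) = 51
    static int bucketEntry() { return OBJECT_HEADER + 4 + 4 + 1 + 1 + 8 + 4 + 1 + 20 + 8; }

    // ConcurrentHashMap Map.Entry (40) + ConcurrentSkipListSet entry (40)
    static int mapEntries() { return 40 + 40; }

    static long totalFor(long blocks) {
        return blocks * (blockCacheKey() + bucketEntry() + mapEntries());
    }

    public static void main(String[] args) {
        System.out.println(blockCacheKey()); // 29
        System.out.println(bucketEntry());   // 63
        System.out.println(totalFor(1));     // 172
        // 10 million blocks: 1,720,000,000 bytes, i.e. ~1.6 GiB
        System.out.println(totalFor(10_000_000L) / (1024.0 * 1024 * 1024));
    }
}
```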



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (HBASE-17781) TestAcidGuarantees is broken

2017-03-14 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-17781.

Resolution: Invalid

HBASE-17723 is not yet committed. So if locally applying that patch makes a 
test fail, the correction has to be done in that old issue itself. Please 
report there at HBASE-17723 and get it all resolved. I am closing this as Invalid.

> TestAcidGuarantees is broken
> 
>
> Key: HBASE-17781
> URL: https://issues.apache.org/jira/browse/HBASE-17781
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Affects Versions: 2.0.0
> Environment: OS: Ubuntu 14.04
> Arch: x86_64
> uname -a = Linux xx-xx 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 
> 19:11:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Anup Halarnkar
> Fix For: 2.0.0
>
>
> Command:
> mvn clean install -X
> Previous History:
> After applying patch HBASE-17723, the test "TestSimpleRpcScheduler" passes 
> but, the below test fails.
> Final Output:
> Tests in error:
>   
> TestAcidGuarantees.testGetAtomicity:418->runTestAtomicity:315->runTestAtomicity:324->runTestAtomicity:388
>  » Runtime
>   TestAcidGuarantees.testMobGetAtomicity:435->runTestAtomicity:388 » Runtime 
> Def...
> Tests run: 1774, Failures: 0, Errors: 2, Skipped: 6



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17647) OffheapKeyValue#heapSize() implementation is wrong

2017-02-14 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17647:
--

 Summary: OffheapKeyValue#heapSize() implementation is wrong
 Key: HBASE-17647
 URL: https://issues.apache.org/jira/browse/HBASE-17647
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


We consider the key and data lengths also, even though the data is actually in 
the off-heap area. We should correct it.
The impact will be at ScannerContext limit tracking, where we use the heapSize 
of cells to account the result size. So my proposal is to consider both the 
cell's data length and its heap size in limit tracking and accounting. We have 
a maxResultSize which defaults to 2 MB. When the sum of all cells' data sizes 
reaches 'maxResultSize' OR the sum of all cells' heap sizes reaches 
'maxResultSize', we need to send back the RPC response.
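The proposed dual accounting could be sketched as below. This is an illustrative stand-in, not actual HBase code: the class name and method are hypothetical, and the per-cell sizes in `main` are made up for the example.

```java
// Sketch of the proposed dual accounting: stop accumulating cells once
// EITHER the summed data size OR the summed heap size reaches maxResultSize.
public class DualLimitTracker {
    final long maxResultSize;
    long dataSize;
    long heapSize;

    DualLimitTracker(long maxResultSize) { this.maxResultSize = maxResultSize; }

    // Adds one cell's sizes; returns true while both sums stay under the limit.
    boolean accumulate(long cellDataSize, long cellHeapSize) {
        dataSize += cellDataSize;
        heapSize += cellHeapSize;
        return dataSize < maxResultSize && heapSize < maxResultSize;
    }

    public static void main(String[] args) {
        DualLimitTracker t = new DualLimitTracker(2L * 1024 * 1024); // 2 MB default
        // Off-heap cells: large data size, tiny on-heap shell. The data sum,
        // not the heap sum, is what trips the limit here.
        boolean fits = true;
        int cells = 0;
        while (fits) {
            fits = t.accumulate(64 * 1024, 48); // 64 KB data, ~48 B heap shell
            cells++;
        }
        System.out.println(cells); // 32 cells of 64 KB fill the 2 MB data budget
    }
}
```

With heapSize alone (as the buggy `OffheapKeyValue#heapSize()` behaves), off-heap cells would barely register, and a response could grow far past `maxResultSize` before the limit triggers.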



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HBASE-17302) The region flush request disappeared from flushQueue

2016-12-19 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-17302:


> The region flush request disappeared from flushQueue
> 
>
> Key: HBASE-17302
> URL: https://issues.apache.org/jira/browse/HBASE-17302
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 0.98.23, 1.2.4
>Reporter: Guangxu Cheng
>Assignee: Guangxu Cheng
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17302-branch-1-addendum-v1.patch, 
> HBASE-17302-branch-1-addendum.patch, HBASE-17302-branch-1.2-v1.patch, 
> HBASE-17302-branch-master-v1.patch, HBASE-17302-master-addendum-v1.patch, 
> HBASE-17302-master-addendum.patch
>
>
> When a region has too many store files, flushing is delayed up to 
> blockingWaitTime ms, and the region flush request is requeued into the 
> flushQueue.
> When the region flush request is requeued into the flushQueue frequently, the 
> request sometimes inexplicably disappears.
> But regionsInQueue still contains the information of the region request, 
> which means a new flush request cannot be inserted into the flushQueue.
> Then, the region will not flush anymore.
> In order to locate the problem, I added a lot of logging in the code.
> {code:title=MemStoreFlusher.java|borderStyle=solid}
> private boolean flushRegion(final HRegion region, final boolean 
> emergencyFlush) {
> long startTime = 0;
> synchronized (this.regionsInQueue) {
>   FlushRegionEntry fqe = this.regionsInQueue.remove(region);
>   // Use the start time of the FlushRegionEntry if available
>   if (fqe != null) {
>   startTime = fqe.createTime;
>   }
>   if (fqe != null && emergencyFlush) {
>   // Need to remove from region from delay queue.  When NOT an
>   // emergencyFlush, then item was removed via a flushQueue.poll.
>   flushQueue.remove(fqe);
>  }
> }
> {code}
> When an emergencyFlush is encountered, the region's flush entry will be 
> removed from the flushQueue.
> By comparing the flushQueue content before and after the remove: where 
> RegionA should have been removed, it is possible that RegionB is removed 
> instead.
> {code:title=MemStoreFlusher.java|borderStyle=solid}
> public boolean equals(Object obj) {
>   if (this == obj) {
>   return true;
>   }
>   if (obj == null || getClass() != obj.getClass()) {
>   return false;
>   }
>   Delayed other = (Delayed) obj;
>   return compareTo(other) == 0;
> }
> {code}
> FlushRegionEntry, in implementing the equals function, only compares the 
> delay time; if different regions have the same delay time, it is possible 
> to mistake A for B.
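The failure mode can be demonstrated in isolation. The sketch below uses a hypothetical `FlushEntry` class that mirrors only the delay-based `equals` shown above (it is not the real FlushRegionEntry): two entries for different regions with the same delay compare equal, so a queue `remove` can take out the wrong one.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal reproduction of the equals bug: equality based only on the
// delay time ignores which region the entry belongs to.
public class DelayEqualsBug {
    static class FlushEntry {
        final String region;
        final long whenToExpire; // stand-in for the delay time

        FlushEntry(String region, long whenToExpire) {
            this.region = region;
            this.whenToExpire = whenToExpire;
        }

        @Override public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false;
            // Only the delay time is compared -- the region is ignored.
            return whenToExpire == ((FlushEntry) obj).whenToExpire;
        }

        @Override public int hashCode() { return Long.hashCode(whenToExpire); }
    }

    public static void main(String[] args) {
        List<FlushEntry> queue = new ArrayList<>();
        queue.add(new FlushEntry("regionB", 1000L));
        queue.add(new FlushEntry("regionA", 1000L));
        // Intend to remove regionA's entry; remove() matches via equals(),
        // so regionB's entry (same delay) is removed instead.
        queue.remove(new FlushEntry("regionA", 1000L));
        System.out.println(queue.get(0).region); // regionA -- B was wrongly removed
    }
}
```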



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB

2016-12-19 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17338:
--

 Summary: Treat Cell data size under global memstore heap size only 
when that Cell can not be copied to MSLAB
 Key: HBASE-17338
 URL: https://issues.apache.org/jira/browse/HBASE-17338
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


We have only data size and heap overhead being tracked globally. An off-heap 
memstore works with an off-heap-backed MSLAB pool. But a cell, when added to 
the memstore, does not always get copied to MSLAB. Append/Increment ops doing 
an upsert don't use MSLAB. Also, based on the Cell size, we sometimes avoid the 
MSLAB copy. But now we track these cells' data size also under the global 
memstore data size, which indicates off-heap size in case of an off-heap 
memstore. For the global flush checks (against lower/upper watermark levels), 
we check this size against the max off-heap memstore size. We do check heap 
overhead against the global heap memstore size (defaults to 40% of -Xmx), but 
for such cells the data size also should be accounted under the heap overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15361) Remove unnecessary or Document constraints on BucketCache possible bucket sizes

2016-12-16 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-15361.

Resolution: Duplicate

> Remove unnecessary or Document  constraints on BucketCache possible bucket 
> sizes
> 
>
> Key: HBASE-15361
> URL: https://issues.apache.org/jira/browse/HBASE-15361
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: deepankar
>Priority: Minor
>
> When we were trying to tune the bucket sizes 
> {{hbase.bucketcache.bucket.sizes}} according to our workload, we encountered 
> an issue due to the way offset is stored in the bucket entry. We divide the 
> offset into integer base and byte value and it assumes that all bucket 
> offsets  will be a multiple of 256 (left shifting by 8). See the code below
> {code}
> long offset() { // Java has no unsigned numbers
>   long o = ((long) offsetBase) & 0xFFFFFFFFL;
>   o += (((long) (offset1)) & 0xFF) << 32;
>   return o << 8;
> }
> private void setOffset(long value) {
>   assert (value & 0xFF) == 0;
>   value >>= 8;
>   offsetBase = (int) value;
>   offset1 = (byte) (value >> 32);
> }
> {code}
> This was there to save 3 bytes per BucketEntry (instead of using a long) when 
> there were no other fields in the BucketEntry, but now there are a lot of 
> fields in the bucket entry. This is not documented, so we could either 
> document the constraint that it should be a strict multiple of 256 bytes or 
> just do away with this constraint.
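A standalone copy of the packing logic makes the constraint easy to see: `setOffset` asserts the low 8 bits are zero, so any bucket size that does not keep offsets 256-byte aligned will fail. (The class below is a sketch around the quoted snippet; the surrounding names are illustrative.)

```java
// Standalone copy of the offset packing shown above: the offset is split
// into a 4-byte base and 1-byte high part after dropping the low 8 bits,
// which is why every bucket offset must be a multiple of 256.
public class PackedOffset {
    private int offsetBase;
    private byte offset1;

    long offset() { // Java has no unsigned numbers
        long o = ((long) offsetBase) & 0xFFFFFFFFL;
        o += (((long) (offset1)) & 0xFF) << 32;
        return o << 8;
    }

    void setOffset(long value) {
        assert (value & 0xFF) == 0; // offsets must be 256-byte aligned
        value >>= 8;
        offsetBase = (int) value;
        offset1 = (byte) (value >> 32);
    }

    public static void main(String[] args) {
        PackedOffset p = new PackedOffset();
        long aligned = 5L * 1024 * 1024 * 1024; // 5 GiB, a multiple of 256
        p.setOffset(aligned);
        System.out.println(p.offset() == aligned); // true: round-trips exactly
        // p.setOffset(aligned + 100); // would trip the assert: not aligned
    }
}
```

Note that 5 bytes of packed offset shifted left by 8 can address up to 2^48 bytes of cache, so the alignment requirement is the only practical cost of the scheme.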



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-17268) Add OffheapMemoryTuner

2016-12-06 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-17268.

Resolution: Duplicate
  Assignee: (was: ramkrishna.s.vasudevan)

Dup of HBASE-17267

> Add OffheapMemoryTuner
> --
>
> Key: HBASE-17268
> URL: https://issues.apache.org/jira/browse/HBASE-17268
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
>
> This JIRA is aimed at tuning the offheap memory. It is not straightforward, 
> as we should not cross the configured available direct memory.
> Should include BC and offheap memstore configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17204) Make L2 off heap cache default ON

2016-11-30 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17204:
--

 Summary: Make L2 off heap cache default ON
 Key: HBASE-17204
 URL: https://issues.apache.org/jira/browse/HBASE-17204
 Project: HBase
  Issue Type: New Feature
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


L2 cache can be used for data blocks. By default it is off now. After the 
HBASE-11425 work, the L2 off-heap cache can perform on par with the L1 on-heap 
cache. Under a heavily loaded workload, it can even outperform the L1 cache. 
Please see the recently published report by Alibaba. This work was also 
backported by Rocketfuel, with a similar perf improvement report from them too.
Let us turn the L2 off-heap cache ON. As it is off heap, we can have a much 
larger sized L2 BC. What should be the default size?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-17086) Add comments to explain why Cell#getTagsLength() returns an int, rather than a short

2016-11-28 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-17086.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0

Thanks for the patch Xiang Li

> Add comments to explain why Cell#getTagsLength() returns an int, rather than 
> a short
> 
>
> Key: HBASE-17086
> URL: https://issues.apache.org/jira/browse/HBASE-17086
> Project: HBase
>  Issue Type: Improvement
>  Components:  Interface
>Affects Versions: 2.0.0
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17086.master.000.patch, 
> HBASE-17086.master.001.patch
>
>
> In the Cell interface, getTagsLength() returns an int.
> But in the KeyValue implementation, the tags length is 2 bytes. Also in 
> ExtendedCell, when explaining the KeyValue format, the tags length is stated 
> to be 2 bytes.
> Any plan to update the Cell interface to make getTagsLength() return a short?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17183) Handle ByteBufferCell while making TagRewriteCell

2016-11-28 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17183:
--

 Summary: Handle ByteBufferCell while making TagRewriteCell
 Key: HBASE-17183
 URL: https://issues.apache.org/jira/browse/HBASE-17183
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0
 Attachments: HBASE-17183.patch

TagRewriteCell is a normal ExtendedCell. When it wraps a ByteBufferCell, we 
need a new TagRewriteCell variant that is itself a ByteBufferCell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-13096) NPE from SecureWALCellCodec$EncryptedKvEncoder#write when using WAL encryption and Phoenix secondary indexes

2016-11-24 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-13096:


> NPE from SecureWALCellCodec$EncryptedKvEncoder#write when using WAL 
> encryption and Phoenix secondary indexes
> 
>
> Key: HBASE-13096
> URL: https://issues.apache.org/jira/browse/HBASE-13096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6
>Reporter: Andrew Purtell
>  Labels: phoenix
>
> On user@phoenix Dhavi Rami reported:
> {quote}
> I tried using phoenix in hBase with Transparent Encryption of Data At Rest 
> enabled ( AES encryption) 
> Works fine for a table with primary key column.
> But it doesn't work if I create a secondary index on those tables. I tried to 
> dig deep into the problem and found WAL file encryption throws an exception 
> when I have a Global Secondary Index created on my mutable table.
> Following is the error I was getting on one of the region server.
> {noformat}
> 2015-02-20 10:44:48,768 ERROR 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:767)
> at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:754)
> at org.apache.hadoop.hbase.KeyValue.getKeyLength(KeyValue.java:1253)
> at 
> org.apache.hadoop.hbase.regionserver.wal.SecureWALCellCodec$EncryptedKvEncoder.write(SecureWALCellCodec.java:194)
> at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:117)
> at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncWriter.run(FSHLog.java:1137)
> at java.lang.Thread.run(Thread.java:745)
> 2015-02-20 10:44:48,776 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: 
> regionserver60020-WAL.AsyncWriter exiting
> {noformat}
> I had to disable WAL encryption, and it started working fine with secondary 
> Index. So Hfile encryption works with secondary index but WAL encryption 
> doesn't work.
> {quote}
> Parking this here for later investigation. For now I'm going to assume this 
> is something in SecureWALCellCodec that needs looking at, but if it turns out 
> to be a Phoenix indexer issue I will move this JIRA there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17169) Remove Cell variants with ShareableMemory

2016-11-23 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17169:
--

 Summary: Remove Cell variants with ShareableMemory
 Key: HBASE-17169
 URL: https://issues.apache.org/jira/browse/HBASE-17169
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


As asked by Stack in review comment of other sub tasks of the parent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17162) Avoid unconditional call to getXXXArray() in write path

2016-11-22 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17162:
--

 Summary: Avoid unconditional call to getXXXArray() in write path
 Key: HBASE-17162
 URL: https://issues.apache.org/jira/browse/HBASE-17162
 Project: HBase
  Issue Type: Sub-task
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


Still some calls left. Patch will address these areas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17161) MOB : Make ref cell creation more efficient

2016-11-22 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17161:
--

 Summary: MOB : Make ref cell creation more efficient
 Key: HBASE-17161
 URL: https://issues.apache.org/jira/browse/HBASE-17161
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


When we flush MOB data, ref cells are created per actual MOB cell. This creates 
lots of garbage.
Refer MobUtils#createMobRefCell
We need to add 2 tags into the ref cell. An ArrayList of default size is 
created while building the ref cell's tag array. The call to CellUtil.getTags 
will create a new ArrayList even if the original cell has no tags.
A new KV is created, which will create a new backing byte[] and do a copy. 
Also, along with each flush/compaction op, a fresh Tag object is created for 
the TableName tag.
Fixes include:
1. The table name tag is not going to change per HStore, and neither are the 
two new tags to be added. Keep a byte[] of these 2 tags at the MobHStore level 
so that all flushes and compactions in this store can use it.
2. Create a new MobRefCell, just like TagRewriteCell, where only the value and 
tags parts differ from the original cell.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17139) Remove sweep tool related configs from hbase-default.xml

2016-11-21 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17139:
--

 Summary: Remove sweep tool related configs from hbase-default.xml
 Key: HBASE-17139
 URL: https://issues.apache.org/jira/browse/HBASE-17139
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


Missed doing this in parent jira



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17097) Documentation for max off heap configuration

2016-11-14 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17097:
--

 Summary: Documentation for max off heap configuration
 Key: HBASE-17097
 URL: https://issues.apache.org/jira/browse/HBASE-17097
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
 Fix For: 2.0.0


{quote}
# Uncomment below if you intend to use off heap cache. For example, to allocate 
8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
{quote}
This is what we have in hbase-env.sh now. We need more notes around this and 
some suggestions; maybe more details in the book also. This depends on whether 
we want the L2 off-heap BC as the default data cache in 2.0. (I would like to, 
and will open a jira soon.) Also, if users use the off-heap MSLAB pool, then 
that also has to be added up.
As of now we use a ByteBufferPool, which pools direct BBs, and this is ON by 
default. The max size this pool can keep is 2 MB * 2 * #handlers.
Will add more details as we turn ON other features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17087) Enable Aliasing for CodedInputStream created by ByteInputByteString#newCodedInput

2016-11-13 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17087:
--

 Summary: Enable Aliasing for CodedInputStream created by 
ByteInputByteString#newCodedInput
 Key: HBASE-17087
 URL: https://issues.apache.org/jira/browse/HBASE-17087
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


Missed setting this while doing HBASE-15789.  We make CIS with 
'bufferIsImmutable' as true but we should do enableAliasing also to avoid copy 
while building PB objects from this new CIS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17073) Increase the max number of buffers in ByteBufferPool

2016-11-10 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17073:
--

 Summary: Increase the max number of buffers in ByteBufferPool
 Key: HBASE-17073
 URL: https://issues.apache.org/jira/browse/HBASE-17073
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


Before the HBASE-15525 issue fix, we had variable-sized buffers in our buffer 
pool. The max size up to which one buffer could grow was 2 MB. Now we have 
changed it to be a fixed-sized BBPool; by default 64 KB is the size of each 
buffer. But the max number of BBs allowed to be in the pool was not changed, 
i.e. twice the number of handlers. Maybe we should be increasing it now? To 
match the old 2 MB-per-buffer capacity, we would need 32 * 2 * #handlers. 
There is no initial #BBs anyway. 2 MB is the default max response size we 
have, and for write requests with a BufferedMutator, 2 MB is also the default 
flush limit. We can make the default max #BBs 32 * #handlers, I believe.
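The sizing argument works out as below. This is a back-of-envelope sketch with hypothetical helper names (not HBase code); the handler count of 30 is the usual `hbase.regionserver.handler.count` default, used here only for illustration.

```java
// Sizing math for the fixed-size ByteBufferPool discussed above:
// matching the old 2 MB-per-buffer capacity with 64 KB buffers needs
// 32 small buffers per former large buffer.
public class BufferPoolSizing {
    static final int OLD_MAX_BUFFER = 2 * 1024 * 1024; // old variable-size max: 2 MB
    static final int NEW_BUFFER = 64 * 1024;           // new fixed size: 64 KB

    // Old cap was 2 buffers per handler; the exact equivalent in 64 KB buffers:
    static int equivalentMaxBuffers(int handlers) {
        return (OLD_MAX_BUFFER / NEW_BUFFER) * 2 * handlers; // 32 * 2 * handlers
    }

    // The proposal settles on 32 * #handlers as the default max.
    static int proposedMaxBuffers(int handlers) {
        return (OLD_MAX_BUFFER / NEW_BUFFER) * handlers; // 32 * handlers
    }

    public static void main(String[] args) {
        int handlers = 30; // illustrative handler count
        System.out.println(equivalentMaxBuffers(handlers)); // 1920
        System.out.println(proposedMaxBuffers(handlers));   // 960
    }
}
```

At 64 KB each, the proposed cap of 32 * #handlers bounds pooled memory at 2 MB per handler, i.e. 60 MB for 30 handlers.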




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17071) Do not initialize MemstoreChunkPool when use mslab option is turned off

2016-11-10 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17071:
--

 Summary: Do not initialize MemstoreChunkPool when use mslab option 
is turned off
 Key: HBASE-17071
 URL: https://issues.apache.org/jira/browse/HBASE-17071
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 2.0.0


This is a 2.0-only issue, induced by HBASE-16407.
We now initialize the MSLAB chunk pool along with RS start itself (to pass it 
as a HeapMemoryTuneObserver).
When MSLAB is turned off (i.e. hbase.hregion.memstore.mslab.enabled is 
configured false) we should not be initializing the MSLAB chunk pool at all. 
By default the initial chunk count to be created will be 0 anyway, but it is 
still better to avoid it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16408) Handle On heap BucketCache size when HeapMemoryManager tunes memory

2016-11-07 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-16408.

   Resolution: Not A Problem
 Assignee: (was: Anoop Sam John)
Fix Version/s: (was: 2.0.0)

We are not considering L2 cache at all (even if it is On heap) while memory 
tuning. So not an issue.

> Handle On heap BucketCache size when HeapMemoryManager tunes memory
> ---
>
> Key: HBASE-16408
> URL: https://issues.apache.org/jira/browse/HBASE-16408
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Anoop Sam John
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-15513) hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default

2016-11-07 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John reopened HBASE-15513:


> hbase.hregion.memstore.chunkpool.maxsize is 0.0 by default
> --
>
> Key: HBASE-15513
> URL: https://issues.apache.org/jira/browse/HBASE-15513
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-15513-addendum.patch, HBASE-15513-v1.patch
>
>
> That results in excessive MemStoreLAB chunk allocations because we cannot 
> reuse them. Not sure why it has been disabled by default. Maybe the code 
> has not been tested well?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17012) Handle Offheap cells in CompressedKvEncoder

2016-11-03 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-17012:
--

 Summary: Handle Offheap cells in CompressedKvEncoder
 Key: HBASE-17012
 URL: https://issues.apache.org/jira/browse/HBASE-17012
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.0
Reporter: Anoop Sam John
 Fix For: 2.0.0


When we deal with off-heap cells, we will end up copying Cell components on heap:
{code}
public void write(Cell cell) throws IOException {
.
  write(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength(), 
compression.rowDict);
  write(cell.getFamilyArray(), cell.getFamilyOffset(), 
cell.getFamilyLength(),
  compression.familyDict);
  write(cell.getQualifierArray(), cell.getQualifierOffset(), 
cell.getQualifierLength(),
  compression.qualifierDict);
..
  out.write(cell.getValueArray(), cell.getValueOffset(), 
cell.getValueLength());
...
{code}
We need to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16747) Track memstore data size and heap overhead separately

2016-11-01 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-16747.

Resolution: Fixed

> Track memstore data size and heap overhead separately 
> --
>
> Key: HBASE-16747
> URL: https://issues.apache.org/jira/browse/HBASE-16747
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-16747.patch, HBASE-16747.patch, 
> HBASE-16747_V2.patch, HBASE-16747_V2.patch, HBASE-16747_V3.patch, 
> HBASE-16747_V3.patch, HBASE-16747_V3.patch, HBASE-16747_V4.patch, 
> HBASE-16747_WIP.patch, HBASE-16747_addendum.patch
>
>
> We track the memstore size in 3 places:
> 1. Globally at the RS level in RegionServerAccounting. This tracks every 
> memstore's size and is used to decide whether forced flushes are needed 
> because of global heap pressure.
> 2. At region level in HRegion. This is the sum of the sizes of all memstores 
> within the region, used to decide whether the region has reached its flush 
> size (128 MB by default).
> 3. At segment level. This drives the in-memory flush/compaction decisions.
> All of these use the Cell's heap size, which includes the data bytes as well 
> as the Cell object's heap overhead. We also include the overhead of adding 
> Cells into the Segment's data structures (like the CSLM).
> Once we have an off-heap memstore, the cell data bytes will live in the 
> off-heap area, so we can no longer track data size and heap overhead as one 
> entity. We need to separate them and track each on its own.
> The proposal here is to track cell data size and heap overhead separately at 
> the global accounting layer. As of now we have only the on-heap memstore, so 
> the global memstore boundary checks will consider both (add them up and 
> check against the global max memstore size).
> Track cell data size alone (which can be on heap or off heap) at region 
> level. Region flushes use cell data size alone for the flush decision. A 
> user configuring 128 MB as the flush size will normally expect a flush of 
> 128 MB of data. But because we were also including the heap overhead, the 
> actual data size flushed ended up well below 128 MB. With this change we 
> behave closer to what the user expects.
> Segment-level in-memory flush/compaction also considers cell data size 
> alone, but we will need to track the heap overhead too. (Once an in-memory 
> flush or a normal flush happens, we have to adjust both the cell data size 
> and the heap overhead.)
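The separate accounting described above can be sketched as a small self-contained class. This is an illustration only: the class and method names (shouldFlush, aboveGlobalLimit) are hypothetical, not the actual HBase MemStoreSizing API.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MemStoreSizingSketch {
  // Tracked separately so cell data bytes (possibly off-heap) and on-heap
  // overhead can feed different decisions.
  private final AtomicLong dataSize = new AtomicLong();     // cell data bytes
  private final AtomicLong heapOverhead = new AtomicLong(); // object + CSLM overhead

  void incMemStoreSize(long dataDelta, long overheadDelta) {
    dataSize.addAndGet(dataDelta);
    heapOverhead.addAndGet(overheadDelta);
  }

  /** Region-level flush decision: data bytes alone, per the proposal above. */
  boolean shouldFlush(long flushSizeBytes) {
    return dataSize.get() >= flushSizeBytes;
  }

  /** Global-pressure decision: data plus overhead against the global limit. */
  boolean aboveGlobalLimit(long globalLimitBytes) {
    return dataSize.get() + heapOverhead.get() >= globalLimitBytes;
  }

  long getDataSize() { return dataSize.get(); }
  long getHeapOverhead() { return heapOverhead.get(); }
}
```

With 100 MB of data and 40 MB of overhead, a 128 MB region flush would not yet trigger (data alone is under the limit), while a 128 MB global limit would (data plus overhead exceeds it) — matching the two behaviors the issue separates.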





[jira] [Resolved] (HBASE-10713) A MemStore implementation with in memory flushes to CellBlocks

2016-10-20 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-10713.

Resolution: Duplicate
  Assignee: (was: Anoop Sam John)

Closing as a duplicate of HBASE-14918. There we are doing conceptually the 
same thing: instead of a CellBlock as a flat byte[], we have Segments, which 
can be a cell array or a cell chunk with an index. HBASE-14918 is almost at 
its closure.

> A MemStore implementation with in memory flushes to CellBlocks
> --
>
> Key: HBASE-10713
> URL: https://issues.apache.org/jira/browse/HBASE-10713
> Project: HBase
>  Issue Type: New Feature
>Reporter: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-10713_WIP.patch
>
>
> After HBASE-10648 we can plug in any implementation for MemStore. This 
> issue aims at an implementation with intermediate in-memory flushes. That 
> will reduce the need to keep lots of KVs in the heap as well as in the 
> CSLM; the CSLM performs poorly as the number of items in it grows. We can 
> create CellBlocks (a contiguous byte[], like an HFile block) out of KVs and 
> keep each as one object rather than many KVs. At some point in time, the 
> MemStore might have N CellBlocks and one CSLM.
> These in-memory CellBlocks can be compacted into one bigger block in 
> between; we can target that in follow-on tasks once the basic code is ready.
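The active-CSLM-plus-flattened-blocks idea can be illustrated with a minimal sketch (String keys stand in for Cells; the threshold and all names here are hypothetical, not HBase API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class CellBlockMemStoreSketch {
  // Active mutable segment: a CSLM, which degrades as item count grows.
  private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
  // Flattened, immutable "cell blocks": one sorted array object each,
  // instead of many individual entries living in the CSLM.
  private final List<String[]> cellBlocks = new ArrayList<>();
  private final int inMemoryFlushThreshold;

  CellBlockMemStoreSketch(int threshold) {
    this.inMemoryFlushThreshold = threshold;
  }

  void put(String key, String value) {
    active.put(key, value);
    if (active.size() >= inMemoryFlushThreshold) {
      inMemoryFlush();
    }
  }

  /** Flatten the CSLM into one sorted array and start a fresh CSLM. */
  private void inMemoryFlush() {
    cellBlocks.add(active.keySet().toArray(new String[0])); // already sorted
    active = new ConcurrentSkipListMap<>();
  }

  int blockCount() { return cellBlocks.size(); }
  int activeSize() { return active.size(); }
}
```

Compacting several of these blocks into one bigger sorted block (the follow-on task mentioned above) would then be a simple merge of already-sorted arrays.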




