[jira] [Created] (HBASE-18380) Implement async RSGroup admin based on the async admin

2017-07-13 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-18380:
--

 Summary: Implement async RSGroup admin based on the async admin
 Key: HBASE-18380
 URL: https://issues.apache.org/jira/browse/HBASE-18380
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


Currently the RSGroup admin client gets a blocking stub based on the blocking 
admin's coprocessor service. Now that we have added coprocessor service support 
to the async admin, we can implement a new async RSGroup admin client based on 
the new async admin.
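The shape of the change can be sketched as wrapping the blocking coprocessor-service call so it completes a future off the caller's thread. All class and method names below are simplified stand-ins, not the real HBase API:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: exposing a blocking RSGroup lookup as an async call.
// RSGroupInfo and the blocking stub are stand-ins, not the real HBase classes.
public class AsyncRSGroupSketch {
    static class RSGroupInfo {
        final String name;
        RSGroupInfo(String name) { this.name = name; }
    }

    // Stand-in for the blocking coprocessor-service stub call.
    static RSGroupInfo blockingGetRSGroupInfo(String groupName) {
        return new RSGroupInfo(groupName);
    }

    // Async wrapper: the future completes on a pool thread, so the caller
    // never blocks on the RPC.
    static CompletableFuture<RSGroupInfo> getRSGroupInfo(String groupName) {
        return CompletableFuture.supplyAsync(() -> blockingGetRSGroupInfo(groupName));
    }

    public static void main(String[] args) {
        RSGroupInfo info = getRSGroupInfo("default").join();
        System.out.println(info.name); // prints "default"
    }
}
```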



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: Exception in HBase get

2017-07-13 Thread mukund murrali
Hi

A few more pointers on the issue. Say there are two cells, with column
qualifiers A and B, for the same row, and the two cells live in two different
store files. Cell B was deleted recently. While scanning, StoreFileScanner
reads B first and then A. Although lexicographical ordering is preserved
within each individual file, why does the scanner read the deleted cell B
first? This is what causes the problem in reads. Because of it, we have
commented out the else branch that throws IllegalStateException. Will
commenting it out cause any issues (we have seen none so far)? Could someone
please shed more light on this? We are using the 1.2.5 stable release.
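For context, the expected merge order across store files can be sketched: cells sort by row, then qualifier ascending, then timestamp descending, with delete markers sorting before puts at identical coordinates. A self-contained illustration (the Cell class here is a simplified stand-in, not org.apache.hadoop.hbase.Cell):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of HBase cell ordering: row, then qualifier ascending,
// then timestamp descending, then delete markers before puts.
public class CellOrderSketch {
    enum Type { DELETE, PUT }   // DELETE (ordinal 0) sorts first at equal coordinates
    static class Cell {
        String row, qual; long ts; Type type;
        Cell(String r, String q, long ts, Type t) { row = r; qual = q; this.ts = ts; type = t; }
    }

    static final Comparator<Cell> ORDER = Comparator
        .comparing((Cell c) -> c.row)
        .thenComparing(c -> c.qual)
        .thenComparing(c -> -c.ts)     // newer timestamps first
        .thenComparing(c -> c.type);   // DELETE before PUT

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>(Arrays.asList(
            new Cell("r1", "B", 2L, Type.PUT),
            new Cell("r1", "A", 1L, Type.PUT),
            new Cell("r1", "B", 3L, Type.DELETE)));
        cells.sort(ORDER);
        // A's put comes first; B's delete precedes B's older put.
        for (Cell c : cells) System.out.println(c.qual + ":" + c.type);
    }
}
```

Under this ordering, a scanner merging the two files should surface A before B; a cell for B arriving first suggests the merged stream violated the expected sort, which is exactly what the IllegalStateException guards against.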

On Mon, 10 Jul 2017 at 7:29 PM, mukund murrali 
wrote:

> +dev
>
> Regards,
>
> Mukund Murrali
>
> On Mon, Jul 10, 2017 at 1:17 PM, mukund murrali 
> wrote:
>
>> I know the row key. In what way will it be helpful in analyzing this
>> issue?
>>
>> Regards,
>>
>> Mukund Murrali
>>
>> On Mon, Jul 10, 2017 at 5:32 AM, Ted Yu  wrote:
>>
>>> Can you find the hfile where this exception happens when doing get /
>>> scan ?
>>>
>>> Unfortunately the log didn't contain row key.
>>> Here is small change which would log row key:
>>>
>>> https://pastebin.com/XU5hCLXq
>>>
>>> Cheers
>>>
>>> On Thu, Jul 6, 2017 at 10:17 PM, Graceline Abigail Prem Kumar <
>>> pgabigai...@gmail.com> wrote:
>>>
>>> > Hi
>>> >
>>> > We are currently using HBase 1.2.5. This exception occurs frequently,
>>> > both on the server and on the client side.
>>> >
>>> > Regards,
>>> > Graceline Abigail P
>>> >
>>> > On Thu, Jul 6, 2017 at 11:08 AM, Graceline Abigail Prem Kumar <
>>> > pgabigai...@gmail.com> wrote:
>>> >
>>> > > Hi
>>> > >
>>> > > We got an IllegalStateException during an HBase get operation, and
>>> > > we're not sure of the cause. Here is the exception trace. What could
>>> > > the problem be?
>>> > >
>>> > > Caused by: java.lang.IllegalStateException: isDelete failed:
>>> > > deleteBuffer=22, qualifier=21, timestamp=1487055525513, comparison
>>> > result:
>>> > > 1
>>> > > at
>>> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(
>>> > > ScanDeleteTracker.java:147)
>>> > > at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.
>>> > > match(ScanQueryMatcher.java:395)
>>> > > at org.apache.hadoop.hbase.regionserver.StoreScanner.
>>> > > next(StoreScanner.java:529)
>>> > > at org.apache.hadoop.hbase.regionserver.KeyValueHeap.
>>> > > next(KeyValueHeap.java:150)
>>> > > at
>>> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.
>>> > > populateResult(HRegion.java:5731)
>>> > > at
>>> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.
>>> > > nextInternal(HRegion.java:5894)
>>> > > at org.apache.hadoop.hbase.regionserver.HRegion$
>>> > > RegionScannerImpl.nextRaw(HRegion.java:5668)
>>> > > at org.apache.hadoop.hbase.regionserver.HRegion$
>>> > > RegionScannerImpl.next(HRegion.java:5645)
>>> > > at org.apache.hadoop.hbase.regionserver.HRegion$
>>> > > RegionScannerImpl.next(HRegion.java:5631)
>>> > > at org.apache.hadoop.hbase.regionserver.HRegion.get(
>>> > > HRegion.java:6829)
>>> > > at org.apache.hadoop.hbase.regionserver.HRegion.get(
>>> > > HRegion.java:6807)
>>> > > at org.apache.hadoop.hbase.regionserver.RSRpcServices.
>>> > > get(RSRpcServices.java:2049)
>>> > > at org.apache.hadoop.hbase.protobuf.generated.
>>> > >
>>> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
>>> > > at
>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2188)
>>> > > ... 4 more
>>> > >
>>> > > Mon Jul 03 04:34:07 PDT 2017, RpcRetryingCaller{
>>> > globalStartTime=1499081647183,
>>> > > pause=100, retries=35}, java.io.IOException: java.io.IOException:
>>> > isDelete
>>> > > failed: deleteBuffer=22, qualifier=21, timestamp=1487055525513,
>>> > comparison
>>> > > result: 1
>>> > > at
>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2239)
>>> > > at org.apache.hadoop.hbase.ipc.Ca
>>> llRunner.run(CallRunner.java:112)
>>> > > at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
>>> > > RpcExecutor.java:133)
>>> > > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
>>> > > java:108)
>>> > > at java.lang.Thread.run(Thread.java:745)
>>> > >
>>> > > Regards,
>>> > > Graceline Abigail P
>>> > >
>>> >
>>>
>>
>>
> --
Regards,

Mukund Murrali


[jira] [Created] (HBASE-18379) SnapshotManager#checkSnapshotSupport() should better handle malfunctioning hdfs snapshot

2017-07-13 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18379:
--

 Summary: SnapshotManager#checkSnapshotSupport() should better 
handle malfunctioning hdfs snapshot
 Key: HBASE-18379
 URL: https://issues.apache.org/jira/browse/HBASE-18379
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


The following was observed by a customer; it prevented the master from coming up:
{code}
2017-07-13 13:25:07,898 FATAL [xyz:16000.activeMasterManager] master.HMaster: 
Failed to become active master
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path 
in absolute URI: Daily_Snapshot_Apps_2017-xx
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.(Path.java:171)
at org.apache.hadoop.fs.Path.(Path.java:93)
at 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:230)
at 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:263)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:911)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:966)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:962)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:962)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1534)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1574)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.getCompletedSnapshots(SnapshotManager.java:206)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.checkSnapshotSupport(SnapshotManager.java:1011)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.initialize(SnapshotManager.java:1070)
at 
org.apache.hadoop.hbase.procedure.MasterProcedureManagerHost.initialize(MasterProcedureManagerHost.java:50)
at 
org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:667)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:732)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:213)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1863)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
Daily_Snapshot_Apps_2017-xx
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
{code}
It turns out the exception can be reproduced from the hdfs command line when 
accessing the .snapshot directory.

SnapshotManager#checkSnapshotSupport() should handle a malfunctioning hdfs 
snapshot more gracefully so that the master can still start up.
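A hedged sketch of what "handle more gracefully" could look like (names are hypothetical stand-ins, not the real SnapshotManager code): wrap the snapshot-directory listing so a malformed snapshot name surfaces as an empty list plus a log line instead of aborting master startup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// SnapshotLister is a stand-in for the FileSystem.listStatus call on the
// snapshot directory that threw IllegalArgumentException above.
public class SnapshotListingSketch {
    interface SnapshotLister {
        List<String> listSnapshots();
    }

    static List<String> getCompletedSnapshotsSafely(SnapshotLister lister) {
        try {
            return lister.listSnapshots();
        } catch (IllegalArgumentException e) {
            // e.g. "Relative path in absolute URI": log and keep starting up
            // rather than failing to become active master.
            System.err.println("Unreadable snapshot directory, ignoring: " + e.getMessage());
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        List<String> ok = getCompletedSnapshotsSafely(() -> Arrays.asList("snap1"));
        List<String> bad = getCompletedSnapshotsSafely(() -> {
            throw new IllegalArgumentException("Relative path in absolute URI");
        });
        System.out.println(ok + " " + bad); // prints "[snap1] []"
    }
}
```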





[jira] [Created] (HBASE-18378) Cloning configuration contained in CoprocessorEnvironment doesn't work

2017-07-13 Thread Samarth Jain (JIRA)
Samarth Jain created HBASE-18378:


 Summary: Cloning configuration contained in CoprocessorEnvironment 
doesn't work
 Key: HBASE-18378
 URL: https://issues.apache.org/jira/browse/HBASE-18378
 Project: HBase
  Issue Type: Bug
Reporter: Samarth Jain


In our Phoenix coprocessors, we need to clone the configuration passed in the 
CoprocessorEnvironment. However, using the copy constructor declared in its 
parent class, Configuration, doesn't copy anything over.

For example:
{code}
CoprocessorEnvironment e; // obtained from the coprocessor hook
Configuration original = e.getConfiguration();
Configuration clone = new Configuration(original);
clone.get(HConstants.ZK_SESSION_TIMEOUT);                // returns null
e.getConfiguration().get(HConstants.ZK_SESSION_TIMEOUT); // returns
                                       // HConstants.DEFAULT_ZK_SESSION_TIMEOUT
{code}
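One plausible explanation for this behavior can be sketched without Hadoop at all: the environment may hand out a *subclass* whose get() consults extra state the base-class copy constructor never sees. BaseConf and CompoundConf below are simplified stand-ins, not Hadoop's Configuration classes, and the "safe" copy that iterates entries through the subclass view is an illustration, not the confirmed fix.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCloneSketch {
    static class BaseConf {
        final Map<String, String> props = new HashMap<>();
        BaseConf() {}
        BaseConf(BaseConf other) { props.putAll(other.props); } // copies base state only
        String get(String key) { return props.get(key); }
        void set(String key, String value) { props.put(key, value); }
    }

    // The "compound" view layers overrides that live outside the base map,
    // so the base-class copy constructor never sees them.
    static class CompoundConf extends BaseConf {
        final Map<String, String> overlay = new HashMap<>();
        @Override String get(String key) {
            String v = overlay.get(key);
            return v != null ? v : super.get(key);
        }
    }

    public static void main(String[] args) {
        CompoundConf original = new CompoundConf();
        original.overlay.put("zookeeper.session.timeout", "90000");

        BaseConf broken = new BaseConf(original);                    // overlay is lost
        System.out.println(broken.get("zookeeper.session.timeout")); // prints "null"

        // Safe copy: read each key through the subclass's get().
        BaseConf safe = new BaseConf();
        for (String key : original.overlay.keySet()) safe.set(key, original.get(key));
        System.out.println(safe.get("zookeeper.session.timeout"));   // prints "90000"
    }
}
```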






[jira] [Created] (HBASE-18377) Error handling for FileNotFoundException should consider RemoteException in ReplicationSource#openReader()

2017-07-13 Thread Ted Yu (JIRA)
Ted Yu created HBASE-18377:
--

 Summary: Error handling for FileNotFoundException should consider 
RemoteException in ReplicationSource#openReader()
 Key: HBASE-18377
 URL: https://issues.apache.org/jira/browse/HBASE-18377
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


In the region server log, I observed the following:
{code}
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does 
not exist: /apps/hbase/data/WALs/lx.p.com,16020,1497300923131/497300923131. 
default.1497302873178
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1860)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
  at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
...
  at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:326)
  at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162)
  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:782)
  at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293)
  at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
  at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:255)
  at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:414)
  at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:69)
  at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:605)
  at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:364)
{code}
We have code in ReplicationSource#openReader() which is supposed to handle 
FileNotFoundException, but the case of a RemoteException wrapping a 
FileNotFoundException was missed.
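The missed case can be sketched with a stand-in wrapper class (RemoteExceptionStandIn below mimics org.apache.hadoop.ipc.RemoteException and its unwrapRemoteException(); it is not the real class): unwrap before the instanceof check so both the direct and the remotely wrapped FileNotFoundException take the recovery path.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class UnwrapSketch {
    // Stand-in for org.apache.hadoop.ipc.RemoteException.
    static class RemoteExceptionStandIn extends IOException {
        final IOException wrapped;
        RemoteExceptionStandIn(IOException wrapped) {
            super(wrapped.getMessage());
            this.wrapped = wrapped;
        }
        IOException unwrapRemoteException() { return wrapped; }
    }

    static boolean isFileNotFound(IOException ioe) {
        // Unwrap first, so a remotely thrown FileNotFoundException is
        // recognized just like a local one.
        if (ioe instanceof RemoteExceptionStandIn) {
            ioe = ((RemoteExceptionStandIn) ioe).unwrapRemoteException();
        }
        return ioe instanceof FileNotFoundException;
    }

    public static void main(String[] args) {
        IOException wrapped =
            new RemoteExceptionStandIn(new FileNotFoundException("missing WAL"));
        System.out.println(isFileNotFound(wrapped));               // prints "true"
        System.out.println(isFileNotFound(new IOException("x"))); // prints "false"
    }
}
```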





[jira] [Resolved] (HBASE-11249) Missing null check in finally block of HRegion#processRowsWithLocks() may lead to partially rolled back state in memstore.

2017-07-13 Thread Mike Drob (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved HBASE-11249.
---
Resolution: Not A Problem

This code was rewritten in HBASE-15158; it doesn't look like a problem anymore.

> Missing null check in finally block of HRegion#processRowsWithLocks() may 
> lead to partially rolled back state in memstore.
> --
>
> Key: HBASE-11249
> URL: https://issues.apache.org/jira/browse/HBASE-11249
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> At line 4883:
> {code}
> Store store = getStore(kv);
> if (store == null) {
>   checkFamily(CellUtil.cloneFamily(kv));
>   // unreachable
> }
> {code}
> Exception would be thrown from checkFamily() if store is null.
> In the finally block:
> {code}
>   } finally {
> if (!mutations.isEmpty() && !walSyncSuccessful) {
>   LOG.warn("Wal sync failed. Roll back " + mutations.size() +
>   " memstore keyvalues for row(s):" + StringUtils.byteToHexString(
>   processor.getRowsToLock().iterator().next()) + "...");
>   for (KeyValue kv : mutations) {
> getStore(kv).rollback(kv);
>   }
> {code}
> There is no corresponding null check for return value of getStore() above, 
> potentially leading to partially rolled back state in memstore.





[jira] [Created] (HBASE-18376) Flaky exclusion doesn't appear to work in precommit

2017-07-13 Thread Sean Busbey (JIRA)
Sean Busbey created HBASE-18376:
---

 Summary: Flaky exclusion doesn't appear to work in precommit
 Key: HBASE-18376
 URL: https://issues.apache.org/jira/browse/HBASE-18376
 Project: HBase
  Issue Type: Bug
  Components: community, test
Reporter: Sean Busbey
Priority: Critical


Yesterday we started defaulting the precommit parameter for the flaky test list 
to point to the job on builds.a.o. It looks like the personality is ignoring it.

example build that's marked to keep:

https://builds.apache.org/job/PreCommit-HBASE-Build/7646/

(search for 'Running unit tests' to skip to the right part of the console)

We should add some more debug output in there too.





[jira] [Created] (HBASE-18375) The pool chunks from ChunkCreator are deallocated while in pool because there is no reference to them

2017-07-13 Thread Anastasia Braginsky (JIRA)
Anastasia Braginsky created HBASE-18375:
---

 Summary: The pool chunks from ChunkCreator are deallocated while 
in pool because there is no reference to them
 Key: HBASE-18375
 URL: https://issues.apache.org/jira/browse/HBASE-18375
 Project: HBase
  Issue Type: Sub-task
Reporter: Anastasia Braginsky


Because the MSLAB list of chunks was changed to a list of chunk IDs, chunks 
returned to the pool can be deallocated by the JVM, since there is no longer 
any reference to them. The solution is to protect pooled chunks from GC via 
the strong map in ChunkCreator introduced by HBASE-18010. Will prepare the 
patch today.
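The idea can be sketched in a few lines (a simplified stand-in, not the real ChunkCreator): once the pool tracks only chunk IDs, the chunk objects must stay strongly reachable somewhere, and a strong id-to-chunk map in the creator provides that anchor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkCreatorSketch {
    static class Chunk {
        final int id;
        final byte[] data = new byte[4096]; // stand-in for the MSLAB buffer
        Chunk(int id) { this.id = id; }
    }

    // Strong references: as long as an ID is in this map, the chunk cannot
    // be garbage collected, even if the pool itself only remembers the ID.
    private final Map<Integer, Chunk> chunkIdMap = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    Chunk createChunk() {
        Chunk c = new Chunk(nextId.getAndIncrement());
        chunkIdMap.put(c.id, c);
        return c;
    }

    Chunk getChunk(int id) { return chunkIdMap.get(id); }

    // Only an explicit removal makes the chunk eligible for GC again.
    void removeChunk(int id) { chunkIdMap.remove(id); }
}
```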


