[jira] [Created] (HBASE-21391) RefreshPeerProcedure should also wait for master initialization before executing

2018-10-25 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21391:
-

 Summary: RefreshPeerProcedure should also wait for master initialization 
before executing
 Key: HBASE-21391
 URL: https://issues.apache.org/jira/browse/HBASE-21391
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Duo Zhang
 Fix For: 3.0.0, 2.2.0, 2.1.2


Missed this one when introducing the waitInitialized method in Procedure; 
found it while implementing HBASE-21389.
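
Presumably the fix follows the same pattern the other master procedures use. A 
minimal sketch (not the committed patch), assuming MasterProcedureEnv#waitInitialized 
is the hook the earlier change added:

{code}
// Sketch: let the scheduler hold RefreshPeerProcedure back until the
// master has finished initialization, as other master procedures do.
@Override
protected boolean waitInitialized(MasterProcedureEnv env) {
  return env.waitInitialized(this);
}
{code}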



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21390) [test] TestRpcAccessChecks is buggy

2018-10-25 Thread Reid Chan (JIRA)
Reid Chan created HBASE-21390:
-

 Summary: [test] TestRpcAccessChecks is buggy
 Key: HBASE-21390
 URL: https://issues.apache.org/jira/browse/HBASE-21390
 Project: HBase
  Issue Type: Improvement
Reporter: Reid Chan
Assignee: Reid Chan


TestRpcAccessChecks is buggy.
From setup() we can see that USER_ADMIN is only granted the ADMIN action, but 
testTableFlush() and testTableFlushAndSnapshot() require the CREATE action, 
which USER_ADMIN doesn't have.
Both tests should fail.
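
If the intent is for USER_ADMIN to exercise those code paths, the setup would 
presumably also need to grant CREATE. A minimal sketch, assuming the 
SecureTestUtil helper used by the other security tests:

{code}
// Sketch only: grant CREATE in addition to ADMIN so that testTableFlush()
// and testTableFlushAndSnapshot() have the action they require.
SecureTestUtil.grantGlobal(TEST_UTIL, USER_ADMIN.getShortName(),
  Permission.Action.ADMIN, Permission.Action.CREATE);
{code}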



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Dealing with an unassigned hbase:namespace

2018-10-25 Thread Stack
On Thu, Oct 25, 2018 at 4:50 PM Josh Elser  wrote:

> Stack just hit me up in Slack. He suggested wiping the
> data/hbase/namespace and the relevant rows in hbase:meta.
>
> Fine "fix" for me given I didn't have anything in there besides the
> default ns.
>
>
2.0.3 should allow us to do better. It has support for the fixit tool hbck2.
It can add in assigns if, for whatever reason, vital ones are missing,
even on startup (our Josh is digging into why namespace lost its assign).
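
(For reference, the invocation will be roughly: hbase hbck -j
<path-to-hbase-hbck2-jar> assigns <encoded region name of hbase:namespace>;
the jar path here is illustrative.)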

Thanks,
S


> On 10/25/18 5:47 PM, Josh Elser wrote:
> > Hey Stephen,
> >
> > Thanks for the reply.
> >
> > I'm not concerned about the cause at this point as I haven't done
> > due-diligence yet. I am looking for any tips on how to fix this by hand
> > (prior to hbck2).
> >
> > I thought I could trick the master into assigning it by messing with the
> > row in meta, but I've not been successful yet. I fear that the master
> > won't do anything unless there's an SCP which I can't easily create.
> >
> > It sucks because we're sinking all Master operations when it's just NS
> > operations that are blocked. I thought we had fixed this way back when,
> > but it seems like we're no better off than we were in early 1.x releases.
> >
> > On 10/25/18 1:45 PM, Tak-Lon (Stephen) Wu wrote:
> >> Not sure if this is related to
> >> https://issues.apache.org/jira/browse/HBASE-20671
> >>
> >> If so, and if you have HBASE-20702 on your branch: when a new cluster's
> >> master starts on the same root directory (in my case the root directory
> >> is on S3), the hbase:namespace table cannot be assigned.
> >> (Sorry, I don't have time to work on a solution; I just locally reverted
> >> the HBASE-20702 patch.)
> >>
> >> -Stephen
> >>
> >> On Thu, Oct 25, 2018 at 8:12 AM Josh Elser  wrote:
> >>
> >>> I have a cluster on branch-2.0 from a week or two ago where
> >>> hbase:namespace is not assigned (haven't figured out why yet), I can't
> >>> use any of the normal assign/move shell commands because the Master is
> >>> useless as it's not initialized (because hbase:namespace is
> >>> unreachable), but I can't figure out how I can get the Master to
> realize
> >>> it needs to assign it.
> >>>
> >>> Do we have this written down somewhere already?
> >>>
> >>> Thanks in advance.
> >>>
> >>> - Josh
> >>>
> >>
> >>
>


[jira] [Created] (HBASE-21389) Revisit the procedure lock for sync replication

2018-10-25 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-21389:
-

 Summary: Revisit the procedure lock for sync replication
 Key: HBASE-21389
 URL: https://issues.apache.org/jira/browse/HBASE-21389
 Project: HBase
  Issue Type: Sub-task
  Components: proc-v2, Replication
Reporter: Duo Zhang
 Fix For: 3.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21388) No need to instantiate MemStore for a master which does not carry tables

2018-10-25 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-21388:
--

 Summary: No need to instantiate MemStore for a master which does not 
carry tables
 Key: HBASE-21388
 URL: https://issues.apache.org/jira/browse/HBASE-21388
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


We found these log lines on our master:

2018-10-26,10:00:00,449 INFO 
[master/c4-hadoop-tst-ct16:42900:becomeActiveMaster] 
org.apache.hadoop.hbase.regionserver.ChunkCreator: Allocating data 
MemStoreChunkPool with chunk size 2 MB, max count 737, initial count 0
2018-10-26,10:00:00,452 INFO 
[master/c4-hadoop-tst-ct16:42900:becomeActiveMaster] 
org.apache.hadoop.hbase.regionserver.ChunkCreator: Allocating index 
MemStoreChunkPool with chunk size 204.80 KB, max count 819, initial count 0

 

As with HBASE-21290, we don't need to instantiate a MemStore for a master 
which does not carry tables.
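
A rough sketch of the kind of guard this implies in HRegionServer; names here 
are assumptions, not the committed change:

{code}
// Sketch: only allocate the MemStore chunk pools when this process can
// actually host regions. masterNotCarryingTables is a hypothetical flag,
// e.g. derived from being a master with tables-on-master disabled.
if (masterNotCarryingTables) {
  LOG.info("Master does not carry tables; skip MemStoreChunkPool allocation");
} else {
  initializeMemStoreChunkCreator();
}
{code}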

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Dealing with an unassigned hbase:namespace

2018-10-25 Thread Josh Elser
Stack just hit me up in Slack. He suggested wiping the 
data/hbase/namespace and the relevant rows in hbase:meta.


Fine "fix" for me given I didn't have anything in there besides the 
default ns.


On 10/25/18 5:47 PM, Josh Elser wrote:

Hey Stephen,

Thanks for the reply.

I'm not concerned about the cause at this point as I haven't done 
due-diligence yet. I am looking for any tips on how to fix this by hand 
(prior to hbck2).


I thought I could trick the master into assigning it by messing with the 
row in meta, but I've not been successful yet. I fear that the master 
won't do anything unless there's an SCP which I can't easily create.


It sucks because we're sinking all Master operations when it's just NS 
operations that are blocked. I thought we had fixed this way back when, 
but it seems like we're no better off than we were in early 1.x releases.


On 10/25/18 1:45 PM, Tak-Lon (Stephen) Wu wrote:

Not sure if this is related to
https://issues.apache.org/jira/browse/HBASE-20671

If so, and if you have HBASE-20702 on your branch: when a new cluster's 
master starts on the same root directory (in my case the root directory is 
on S3), the hbase:namespace table cannot be assigned.
(Sorry, I don't have time to work on a solution; I just locally reverted the
HBASE-20702 patch.)

-Stephen

On Thu, Oct 25, 2018 at 8:12 AM Josh Elser  wrote:


I have a cluster on branch-2.0 from a week or two ago where
hbase:namespace is not assigned (haven't figured out why yet), I can't
use any of the normal assign/move shell commands because the Master is
useless as it's not initialized (because hbase:namespace is
unreachable), but I can't figure out how I can get the Master to realize
it needs to assign it.

Do we have this written down somewhere already?

Thanks in advance.

- Josh






[jira] [Resolved] (HBASE-9559) getRowKeyAtOrBefore may be incorrect for some cases

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-9559.
--
Resolution: Won't Fix

We don't use getRowAtOrBefore anymore. Removed in hbase-2.

> getRowKeyAtOrBefore may be incorrect for some cases
> ---
>
> Key: HBASE-9559
> URL: https://issues.apache.org/jira/browse/HBASE-9559
> Project: HBase
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Minor
>
> See also HBASE-9503. Unless I'm missing something, getRowKeyAtOrBefore does 
> not handle cross-file deletes correctly. It also doesn't handle timestamps 
> between two candidates of the same row if they are in different files (the 
> latest by ts is going to be returned).
> It is only used for meta, so it might be working due to the low update rate, 
> lack of anomalies, and the fact that row values in meta are reasonably 
> persistent; new ones are only added on split.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-7044) verifyRegionLocation in CatalogTracker.java didn't check if regionserver is in the cluster

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-7044.
--
Resolution: Won't Fix

Resolving old issue w/ no progress.

> verifyRegionLocation in CatalogTracker.java didn't check if  regionserver is 
> in the cluster
> ---
>
> Key: HBASE-7044
> URL: https://issues.apache.org/jira/browse/HBASE-7044
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0
>Reporter: wonderyl
>Priority: Major
>
> At the beginning there was one whole HBase cluster; then I decided to split it 
> into 2 clusters, one for offline mining and one for online service. The online 
> one was split off from stripped-out machines, while the offline one kept the 
> original master.
> Unfortunately, the META of the original cluster was assigned to one of the 
> stripped-out machines, and as there is a cache policy for META, the offline 
> cluster still accessed the META on the stripped-out machine.
> After inspecting the code, I found that although verifyRegionLocation in 
> CatalogTracker.java checks whether the region server still contains the 
> region, it doesn't check whether the region server is still in the cluster, 
> which is very easy - just check whether it is registered in zk.
> All in all, I had to shut down the online cluster and restart the offline one; 
> then the META was re-assigned and everything was back to normal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-7959) HBCK skips regions that have been recently modified, which then leads to it to report holes in region chain

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-7959.
--
Resolution: Won't Fix

Resolving old issue that has had no work done.

> HBCK skips regions that have been recently modified, which then leads to it 
> to report holes in region chain
> ---
>
> Key: HBASE-7959
> URL: https://issues.apache.org/jira/browse/HBASE-7959
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: Enis Soztutar
>Priority: Major
>
> While lots of region splits are going on, HBCK incorrectly reports 
> inconsistencies: it skips recently modified regions but does not take those 
> into account when computing the region chain. 
> {code}
> 13/02/28 03:33:16 WARN util.HBaseFsck: Region { meta => 
> cluster_test,,1362021481742.69639761fdf693ab1e2bf33f523cd1ae., hdfs => 
> NN:8020/apps/hbase-trunk/data/cluster_test/69639761fdf693ab1e2bf33f523cd1ae, 
> deployed =>  } was recently modified -- skipping
> 13/02/28 03:33:16 DEBUG util.HBaseFsck: There are 23 region info entries
> ERROR: (region 
> cluster_test,0ccc,1362021481742.ec3ba583b4ea01393591572bf1f31e07.) First 
> region should start with an empty key.  You need to  create a new region and 
> regioninfo in HDFS to plug the hole.
> ERROR: Found inconsistency in table cluster_test
> Summary:
>   -ROOT- is okay.
> Number of regions: 1
> Deployed on:  RSs
>   .META. is okay.
> Number of regions: 1
> Deployed on:  RSs
> Table cluster_test is inconsistent.
> Number of regions: 19
> Deployed on:  RSs
> 1 inconsistencies detected.
> Status: INCONSISTENT
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-13535) Regions go unassigned when meta-carrying RS is killed

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-13535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-13535.
---
Resolution: Invalid

No DLR anymore. Assignment is different now.

> Regions go unassigned when meta-carrying RS is killed
> -
>
> Key: HBASE-13535
> URL: https://issues.apache.org/jira/browse/HBASE-13535
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> hbase-1.1 will be the first release with DLR on by default. I've been running 
>  ITBLLs on a cluster trying to find issues with DLR. My first few runs ran 
> nicely... but the current run failed complaining regions are not online and 
> indeed recovery is stuck making no progress.
> Upon examination, it looks to be an assignment rather than a DLR issue. A 
> server carrying meta has its meta log replayed first, but we are seemingly 
> failing to assign regions after meta is back online.
> Meantime, my regionserver logs are filling with complaints that regions are 
> not online (we should dampen our logging of regions not being online...) and 
> the split log workers are stuck:
> {code}
> Thread 13206 (RS_LOG_REPLAY_OPS-c2021:16020-1-Writer-2):
>   State: TIMED_WAITING
>   Blocked count: 45
>   Waited count: 59
>   Stack:
> java.lang.Thread.sleep(Native Method)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.waitUntilRegionOnline(WALSplitter.java:1959)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.locateRegionAndRefreshLastFlushedSequenceId(WALSplitter.java:1857)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.groupEditsByServer(WALSplitter.java:1761)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.append(WALSplitter.java:1674)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1104)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1096)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1066)
> Thread 13205 (RS_LOG_REPLAY_OPS-c2021:16020-1-Writer-1):
>   State: TIMED_WAITING
>   Blocked count: 45
>   Waited count: 59
>   Stack:
> java.lang.Thread.sleep(Native Method)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.waitUntilRegionOnline(WALSplitter.java:1959)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.locateRegionAndRefreshLastFlushedSequenceId(WALSplitter.java:1857)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.groupEditsByServer(WALSplitter.java:1761)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.append(WALSplitter.java:1674)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1104)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1096)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1066)
> Thread 13204 (RS_LOG_REPLAY_OPS-c2021:16020-1-Writer-0):
>   State: TIMED_WAITING
>   Blocked count: 50
>   Waited count: 63
>   Stack:
> java.lang.Thread.sleep(Native Method)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.waitUntilRegionOnline(WALSplitter.java:1959)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.locateRegionAndRefreshLastFlushedSequenceId(WALSplitter.java:1857)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.groupEditsByServer(WALSplitter.java:1761)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$LogReplayOutputSink.append(WALSplitter.java:1674)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1104)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1096)
> 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1066)
> {code}
> ...complaining that:
> 2015-04-22 21:28:02,746 DEBUG [RS_LOG_REPLAY_OPS-c2021:16020-1] 
> wal.WALSplitter: Used 134248328 bytes of buffered edits, waiting for IO 
> threads...
> The accounting seems off around here in SSH, where it is moving regions that 
> were on the dead server to OFFLINE but is reporting no regions to assign:
> {code}
> 143320 2015-04-21 17:05:07,571 INFO  [MASTER_SERVER_OPERATIONS-c2020:16000-0] 
> handler.ServerShutdownHandler: Mark regions in recovery for crashed server 
> c2024.halxg.cloudera.com,16020,1429660802192 before assignment; regions=[]
> 143321 2015-04-21 17:05:07,572 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16000-0] 
> master.RegionStates: Adding to processed servers 
> c2024.halxg.cloudera.com,16020,1429660802192
> 143322 2015-04-21 17:05:07,575 INFO  [MASTER_SERVER_OPERATIONS-c2020:16000-0] 
> master.RegionStates: Transition {8d63312bc39a39727afea627bb20fee4 

[jira] [Resolved] (HBASE-17801) Assigning dead region causing FAILED_OPEN permanent RIT that needs manual resolve

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-17801.
---
Resolution: Cannot Reproduce

Resolving as 'cannot reproduce'.

> Assigning dead region causing FAILED_OPEN permanent RIT that needs manual 
> resolve 
> --
>
> Key: HBASE-17801
> URL: https://issues.apache.org/jira/browse/HBASE-17801
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.1.2
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
>Priority: Critical
>
> In Apache 1.x, there is an Assignment Manager bug when SSH and drop table 
> happen at the same time. Here is the sequence:
> (1). The Region Server hosting the target region is dead; SSH (server 
> shutdown handler) offlined all regions hosted by the RS: 
> {noformat}
> 2017-02-20 20:39:25,022 ERROR 
> org.apache.hadoop.hbase.master.MasterRpcServices: Region server 
> rs01.foo.com,60020,1486760911253 reported a fatal error:
> ABORTING region server rs01.foo.com,60020,1486760911253: 
> regionserver:60020-0x55a076071923f5f, 
> quorum=zk01.foo.com:2181,zk02.foo.com:2181,zk3.foo.com:2181, baseZNode=/hbase 
> regionserver:60020-0x1234567890abcdf received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:613)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:524)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> 2017-02-20 20:42:43,775 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
> for rs01.foo.com,60020,1486760911253 before assignment; region count=999
> 2017-02-20 20:43:31,784 INFO org.apache.hadoop.hbase.master.RegionStates: 
> Transition {783a4814b862a6e23a3265a874c3048b state=OPEN, ts=1487568368296, 
> server=rs01.foo.com,60020,1486760911253} to {783a4814b862a6e23a3265a874c3048b 
> state=OFFLINE, ts=1487648611784, server=rs01.foo.com,60020,1486760911253}
> {noformat}
> (2). Now SSH goes through each region and checks whether it should be 
> re-assigned (at this time, SSH does check whether a table is disabled/deleted). 
> If a region needs to be re-assigned, it is put into a list. Since at this 
> point the troubled region still belongs to an enabled table, it will be in 
> the list.
> {noformat}
> 2017-02-20 20:43:31,795 INFO 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 999 
> region(s) that rs01.foo.com,60020,1486760911253 was carrying (and 0 
> regions(s) that were opening on this server)
> {noformat}
> (3). Now, disable table and delete table come in and also try to offline the 
> region; since the region is already offline, the delete just removes the 
> region from meta and from in-memory state.
> {noformat}
> 2017-02-20 20:43:32,429 INFO org.apache.hadoop.hbase.master.HMaster: 
> Client=b_kylin/null disable t1
> 2017-02-20 20:43:34,275 INFO 
> org.apache.hadoop.hbase.zookeeper.ZKTableStateManager: Moving table t1 state 
> from DISABLING to DISABLED
> 2017-02-20 20:43:34,276 INFO 
> org.apache.hadoop.hbase.master.procedure.DisableTableProcedure: Disabled 
> table, t1, is completed.
> 2017-02-20 20:43:35,624 INFO org.apache.hadoop.hbase.master.HMaster: 
> Client=b_kylin/null delete t1
> 2017-02-20 20:43:36,011 INFO org.apache.hadoop.hbase.MetaTableAccessor: 
> Deleted [{ENCODED => fbf9fda1381636aa5b3cd6e3fe0f6c1e, NAME => 
> 't1,,1487568367030.fbf9fda1381636aa5b3cd6e3fe0f6c1e.', STARTKEY => '', ENDKEY 
> => '\x00\x01'}, {ENCODED => 783a4814b862a6e23a3265a874c3048b, NAME => 
> 't1,\x00\x01,1487568367030.783a4814b862a6e23a3265a874c3048b.', STARTKEY => 
> '\x00\x01', ENDKEY => ''}]
> {noformat}
> (4). However, SSH calls Assignment Manager to reassign the dead region (note 
> that the dead region is in the re-assign list SSH collected and we don't 
> re-check it again).
> {noformat}
> 2017-02-20 20:43:52,725 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning but not in region 
> states: {ENCODED => 783a4814b862a6e23a3265a874c3048b, NAME => 
> 't1,\x00\x01,1487568367030.783a4814b862a6e23a3265a874c3048b.', STARTKEY => 
> '\x00\x01', ENDKEY => ''}
> {noformat}
> (5). On the region server where the dead region tries to land, because the 
> table is dropped, we cannot open the region, and the dead region ends up in 
> FAILED_OPEN, which is a permanent RIT state. 
> {noformat}
> 2017-02-20 20:43:52,861 INFO 
> org.apache.hadoop.hbase.regionserver.RSRpcServices: Open 
> t1,\x00\x01,1487568367030.783a4814b86

[jira] [Resolved] (HBASE-6184) HRegionInfo was null or empty in Meta

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6184.
--
Resolution: Cannot Reproduce

Resolving old issue that we don't see anymore as 'cannot reproduce'.

> HRegionInfo was null or empty in Meta 
> --
>
> Key: HBASE-6184
> URL: https://issues.apache.org/jira/browse/HBASE-6184
> Project: HBase
>  Issue Type: Bug
>  Components: Client, io
>Affects Versions: 0.94.0
>Reporter: jiafeng.zhang
>Priority: Major
> Attachments: HBASE-6184.patch
>
>
> insert data
> hadoop-0.23.2 + hbase-0.94.0
> 2012-06-07 13:09:38,573 WARN  
> [org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation] 
> Encountered problems when prefetch META table: 
> java.io.IOException: HRegionInfo was null or empty in Meta for hbase_one_col, 
> row=hbase_one_col,09115303780247449149,99
> at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160)
> at 
> org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:48)
> at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:126)
> at 
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:123)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:359)
> at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:123)
> at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:99)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:894)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:948)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:836)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1482)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1367)
> at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:945)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:801)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:776)
> at 
> org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
> at com.dinglicom.hbase.HbaseImport.insertData(HbaseImport.java:177)
> at com.dinglicom.hbase.HbaseImport.run(HbaseImport.java:210)
> at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-15355) region.jsp can not be found on info server of master

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-15355.
---
Resolution: Won't Fix

This issue became invalid after we ruled out master hosting Regions (See above 
for agreement by Jianwei).

> region.jsp can not be found on info server of master
> 
>
> Key: HBASE-15355
> URL: https://issues.apache.org/jira/browse/HBASE-15355
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.0.0
>Reporter: Jianwei Cui
>Priority: Minor
>
> After [HBASE-10569|https://issues.apache.org/jira/browse/HBASE-10569], the 
> master is also a regionserver and will serve the regions of system tables. The 
> meta region info can be viewed on the master at an address such as 
> http://localhost:16010/region.jsp?name=1588230740. The real path of 
> region.jsp for the request will be hbase-webapps/master/region.jsp on the 
> master; however, region.jsp is under the directory hbase-webapps/regionserver, 
> so it cannot be found on the master.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Dealing with an unassigned hbase:namespace

2018-10-25 Thread Josh Elser

Hey Stephen,

Thanks for the reply.

I'm not concerned about the cause at this point as I haven't done 
due-diligence yet. I am looking for any tips on how to fix this by hand 
(prior to hbck2).


I thought I could trick the master into assigning it by messing with the 
row in meta, but I've not been successful yet. I fear that the master 
won't do anything unless there's an SCP which I can't easily create.


It sucks because we're sinking all Master operations when it's just NS 
operations that are blocked. I thought we had fixed this way back when, 
but it seems like we're no better off than we were in early 1.x releases.


On 10/25/18 1:45 PM, Tak-Lon (Stephen) Wu wrote:

Not sure if this is related to
https://issues.apache.org/jira/browse/HBASE-20671

If so, and if you have HBASE-20702 on your branch: when a new cluster's master
starts on the same root directory (in my case the root directory is on S3),
the hbase:namespace table cannot be assigned.
(Sorry, I don't have time to work on a solution; I just locally reverted the
HBASE-20702 patch.)

-Stephen

On Thu, Oct 25, 2018 at 8:12 AM Josh Elser  wrote:


I have a cluster on branch-2.0 from a week or two ago where
hbase:namespace is not assigned (haven't figured out why yet), I can't
use any of the normal assign/move shell commands because the Master is
useless as it's not initialized (because hbase:namespace is
unreachable), but I can't figure out how I can get the Master to realize
it needs to assign it.

Do we have this written down somewhere already?

Thanks in advance.

- Josh






[jira] [Created] (HBASE-21387) Race condition in snapshot cache refreshing leads to loss of snapshot files

2018-10-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-21387:
--

 Summary: Race condition in snapshot cache refreshing leads to loss 
of snapshot files
 Key: HBASE-21387
 URL: https://issues.apache.org/jira/browse/HBASE-21387
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu


From a recent customer report where ExportSnapshot failed:
{code}
2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] 
snapshot.SnapshotReferenceUtil: Can't find hfile: 
44f6c3c646e84de6a63fe30da4fcb3aa in the real 
(hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 or archive 
(hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa)
 directory for the primary table. 
{code}
We found the following in the log:
{code}
2018-10-09 18:54:23,675 DEBUG 
[00:16000.activeMasterManager-HFileCleaner.large-1539035367427] 
cleaner.HFileCleaner: Removing: 
hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa 
from archive
{code}
The root cause is a race condition around SnapshotFileCache#refreshCache().
There are two callers of refreshCache(): one from RefreshCacheTask#run and the 
other from SnapshotHFileCleaner.
Let's look at the code of refreshCache():
{code}
// if the snapshot directory wasn't modified since we last check, we are 
done
if (dirStatus.getModificationTime() <= this.lastModifiedTime) return;

// 1. update the modified time
this.lastModifiedTime = dirStatus.getModificationTime();

// 2.clear the cache
this.cache.clear();
{code}
Suppose the RefreshCacheTask runs past the if check and sets 
this.lastModifiedTime.
The cleaner then executes refreshCache() and returns immediately, since 
this.lastModifiedTime matches the modification time of the directory.
Now RefreshCacheTask clears the cache. By the time the cleaner performs its 
cache lookup, the cache is empty.
Therefore the cleaner puts the file into unReferencedFiles - leading to data loss.
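
One way to close the window (a sketch, not necessarily the committed fix) is to 
make the check, the clear, and the re-population atomic with respect to lookups, 
e.g. by synchronizing both paths on the same lock:

{code}
// Sketch: refresh and lookup share one lock, so a lookup can never observe
// the gap between cache.clear() and the re-population of the cache.
private final Object cacheLock = new Object();

private void refreshCache() throws IOException {
  synchronized (cacheLock) {
    FileStatus dirStatus = fs.getFileStatus(snapshotDir);
    if (dirStatus.getModificationTime() <= this.lastModifiedTime) {
      return; // nothing changed since the last refresh
    }
    this.lastModifiedTime = dirStatus.getModificationTime();
    this.cache.clear();
    // ... re-scan the snapshot directory and re-populate this.cache ...
  }
}

public boolean contains(String fileName) throws IOException {
  synchronized (cacheLock) {
    refreshCache(); // may be a no-op if the directory is unchanged
    return cache.contains(fileName);
  }
}
{code}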



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Dealing with an unassigned hbase:namespace

2018-10-25 Thread Tak-Lon (Stephen) Wu
Not sure if this is related to
https://issues.apache.org/jira/browse/HBASE-20671

If so, and if you have HBASE-20702 on your branch: when a new cluster's master
starts on the same root directory (in my case the root directory is on S3),
the hbase:namespace table cannot be assigned.
(Sorry, I don't have time to work on a solution; I just locally reverted the
HBASE-20702 patch.)

-Stephen

On Thu, Oct 25, 2018 at 8:12 AM Josh Elser  wrote:

> I have a cluster on branch-2.0 from a week or two ago where
> hbase:namespace is not assigned (haven't figured out why yet), I can't
> use any of the normal assign/move shell commands because the Master is
> useless as it's not initialized (because hbase:namespace is
> unreachable), but I can't figure out how I can get the Master to realize
> it needs to assign it.
>
> Do we have this written down somewhere already?
>
> Thanks in advance.
>
> - Josh
>


-- 
Stephen Wu | Indiana University, Bloomington


[jira] [Resolved] (HBASE-18152) [AMv2] Corrupt Procedure WAL file; procedure data stored out of order

2018-10-25 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-18152.
---
Resolution: Cannot Reproduce

Closing as 'Cannot Reproduce'. Haven't seen this in months. It looks like it's 
fixed to me.

> [AMv2] Corrupt Procedure WAL file; procedure data stored out of order
> -
>
> Key: HBASE-18152
> URL: https://issues.apache.org/jira/browse/HBASE-18152
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: 
> 0001-TestWALProcedureExecutore-order-checking-test-that-d.patch, 
> HBASE-18152.master.001.patch, 
> hbase-hbase-master-ctr-e138-1518143905142-221855-01-02.hwx.site.log.gz, 
> pv2-0036.log, pv2-0047.log, 
> reading_bad_wal.patch
>
>
> I've seen corruption from time to time while testing. It's rare enough. Often 
> we can get over it but sometimes we can't. It took me a while to capture an 
> instance of corruption. It turns out we write to the WAL out of order, which 
> undoes a basic tenet: that WAL content is ordered in line w/ execution.
> Below I'll post a corrupt WAL.
> Looking at the write side, there is a lot going on. I'm not clear on how we 
> could write out of order. Will try to get more insight. Meantime, parking 
> this issue here to fill data into.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Changing hadoop check versions in our hbase-personality?

2018-10-25 Thread Josh Elser
I hear what you're saying, but I'd also be pissed if we waste our own 
time dealing with Hadoop issues that have already been addressed. Just 
my feeling, but I think I'm distracting from the original question at 
this point.


On 10/24/18 5:01 PM, Sean Busbey wrote:

Yes, because they deployed when 2.6.5 wasn't the latest and they don't want
to deal with the headaches of Hadoop upgrades.

If we do only one, it should be the oldest IMHO.

But if we're just talking about changing things in new minor releases of
HBase this is moot. We make 2.7.7 the minimum instead of 2.7.1, call out
the need to check later release lines against that CVE, and move on.


On Wed, Oct 24, 2018, 15:00 Josh Elser  wrote:


IMO -- for the 2.6 line, let's just use 2.6.latest (2.6.5).

Hadoop seems to have moved beyond 2.6; it doesn't seem likely that we're
creating a lot of value for our users. Would someone deploying a Hadoop
2.6 release seriously try a release other than the latest?

On 10/22/18 9:32 PM, 张铎(Duo Zhang) wrote:

See here:

https://access.redhat.com/security/cve/cve-2018-8009

All 2.7.x releases before 2.7.7 have the problem. And for 2.6.x, the hadoop
team seems to have dropped support, as there has been no release for about two
years; so either we keep the original supported versions, or we just drop
support for the 2.6.x release line.

Zach York  于2018年10月23日周二 上午8:51写道:


What is the main reason for the change? Build time speedup?

Any reason for testing all of the 2.6.x line, but not the 2.7.x line? We
don't check at all for 2.8.x?

Can we be more consistent with how we test compatibility? (Do we only care
about the latest patch release in a line?)

Sorry if I'm missing some of the reasoning, but at a surface level it seems
fairly arbitrary which releases we are cutting.

On Mon, Oct 22, 2018 at 5:44 PM Sean Busbey  wrote:


Please leave me time to review before it is committed.

On Mon, Oct 22, 2018, 13:58 Stack  wrote:


Duo has a patch up on HBASE-20970 that changes the Hadoop versions we check
at build time. Any objections to committing to branch-2.1+?

It makes the following changes:

2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4

becomes

2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.7

And...

3.0.0

goes to

3.0.3

Shout if you are against the change else will commit tomorrow.

Thanks,
S













Dealing with an unassigned hbase:namespace

2018-10-25 Thread Josh Elser
I have a cluster on branch-2.0 from a week or two ago where 
hbase:namespace is not assigned (haven't figured out why yet), I can't 
use any of the normal assign/move shell commands because the Master is 
useless as it's not initialized (because hbase:namespace is 
unreachable), but I can't figure out how I can get the Master to realize 
it needs to assign it.


Do we have this written down somewhere already?

Thanks in advance.

- Josh


[jira] [Created] (HBASE-21386) Disable TestRSGroups#testRSGroupsWithHBaseQuota; causes TestRSGroups to fail in branch-2.1.

2018-10-25 Thread stack (JIRA)
stack created HBASE-21386:
-

 Summary: Disable TestRSGroups#testRSGroupsWithHBaseQuota; causes 
TestRSGroups to fail in branch-2.1.
 Key: HBASE-21386
 URL: https://issues.apache.org/jira/browse/HBASE-21386
 Project: HBase
  Issue Type: Sub-task
  Components: test
Reporter: stack
Assignee: stack
 Fix For: 2.1.1


Disable testRSGroupsWithHBaseQuota in TestRSGroups. It is a test added after 
the original set in TestRSGroups and is not like the others. After the balancer 
fix in HBASE-21266, the way this test is constructed causes a bunch of failures. 
The parent issue is about making a real fix. Pushing this to branch-2.1 in the 
meantime so we can cut an RC.
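
The disable itself is presumably just an @Ignore on the test method, something 
like:

{code}
// Sketch: mark the test ignored until the parent issue lands a real fix.
@Ignore // HBASE-21386
@Test
public void testRSGroupsWithHBaseQuota() throws Exception {
  // ...
}
{code}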



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)