[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline
[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842122#comment-13842122 ] Lars Hofhansl commented on HBASE-9047: -- Looks good (this is actually very clever I think). The only concern is the potential cost of this: {code} + // In the case of disaster/recovery, HMaster may be shutdown/crashed before flush data + // from .logs to .oldlogs. Loop into .logs folders and check whether a match exists + FileStatus[] rss = fs.listStatus(manager.getLogDir()); + for (FileStatus rs : rss) { +Path p = rs.getPath(); +FileStatus[] logs = fs.listStatus(p); ... {code} Any way we can restrict this to the case when we run this tool? Or maybe it's not a problem since we only get here when we did not find a log to begin with...? > Tool to handle finishing replication when the cluster is offline > > > Key: HBASE-9047 > URL: https://issues.apache.org/jira/browse/HBASE-9047 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.96.0 >Reporter: Jean-Daniel Cryans >Assignee: Demai Ni > Fix For: 0.98.0 > > Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, > HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, > HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch, > HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v5.patch > > > We're having a discussion on the mailing list about replicating the data on a > cluster that was shut down in an offline fashion. The motivation could be > that you don't want to bring HBase back up but still need that data on the > slave. > So I have this idea of a tool that would be running on the master cluster > while it is down, although it could also run at any time. Basically it would > be able to read the replication state of each master region server, finish > replicating what's missing to all the slaves, and then clear that state in > zookeeper. 
> The code that handles replication does most of that already, see > ReplicationSourceManager and ReplicationSource. Basically when > ReplicationSourceManager.init() is called, it will check all the queues in ZK > and try to grab those that aren't attached to a region server. If the whole > cluster is down, it will grab all of them. > The beautiful thing here is that you could start that tool on all your > machines and the load will be spread out, but that might not be a big concern > if replication wasn't lagging since it would take a few seconds to finish > replicating the missing data for each region server. > I'm guessing when starting ReplicationSourceManager you'd give it a fake > region server ID, and you'd tell it not to start its own source. > FWIW the main difference in how replication is handled between Apache's HBase > and Facebook's is that the latter is always done separately of HBase itself. > This jira isn't about doing that. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-10093) Unregister ReplicationSource metric bean when the replication source thread is terminated
[ https://issues.apache.org/jira/browse/HBASE-10093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10093. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to 0.94. Thanks for the patch. > Unregister ReplicationSource metric bean when the replication source thread > is terminated > -- > > Key: HBASE-10093 > URL: https://issues.apache.org/jira/browse/HBASE-10093 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.94.14 >Reporter: cuijianwei >Assignee: cuijianwei > Fix For: 0.94.15 > > Attachments: HBASE-10093-0.94-v1.patch > > > Each replication source thread registers a metric bean to show its > statistics. The source threads are terminated when the region server exits, and > the metric beans are removed. However, a replication source thread may also > be terminated when the user removes the peer explicitly, or when it takes over a > recovered queue and finishes replicating the queued HLogs. In these situations, > the metric bean won't be unregistered, and the user may be confused by always seeing > the statistics from terminated replication source threads. Maybe it would be > clearer to remove the metric bean after the replication source thread has terminated? > Then the statistics will come only from active replication sources.
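Editorial sketch of the idea described in HBASE-10093, not the committed patch: a worker thread registers a JMX bean for its statistics and unregisters it in a `finally` block, so the bean disappears however the thread ends. The class names, the `sketch.replication` object-name domain, and the `peer` key are all assumptions made for illustration; only the `javax.management` calls are real JDK API.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class SourceMetricsSketch {
    /** Hypothetical management interface exposing one statistic. */
    public interface SourceStatsMBean {
        long getShippedOps();
    }

    /** Hypothetical statistics holder for one replication source. */
    public static class SourceStats implements SourceStatsMBean {
        private volatile long shippedOps;
        @Override public long getShippedOps() { return shippedOps; }
        void recordShipped() { shippedOps++; }
    }

    /** Runs a short-lived "source" thread; returns whether its bean is still registered afterwards. */
    public static boolean runAndCheckRegistered(String peerId) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // Assumed naming scheme, purely illustrative.
            ObjectName name = new ObjectName("sketch.replication:type=Source,peer=" + peerId);
            SourceStats stats = new SourceStats();
            server.registerMBean(new StandardMBean(stats, SourceStatsMBean.class), name);
            Thread source = new Thread(() -> {
                try {
                    stats.recordShipped(); // stand-in for shipping queued edits
                } finally {
                    // Unregister on every exit path: peer removed, queue drained, or failure.
                    unregisterQuietly(server, name);
                }
            });
            source.start();
            source.join();
            return server.isRegistered(name);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static void unregisterQuietly(MBeanServer server, ObjectName name) {
        try {
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            // Bean already gone or its deregistration hook failed; nothing more to do here.
        }
    }
}
```

Tying the unregister call to the thread's own exit path, instead of to region server shutdown, is what keeps statistics from terminated sources out of the JMX view.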
[jira] [Commented] (HBASE-10097) Remove a region name string creation in HRegion#nextInternal
[ https://issues.apache.org/jira/browse/HBASE-10097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842115#comment-13842115 ] Hudson commented on HBASE-10097: FAILURE: Integrated in hbase-0.96 #217 (See [https://builds.apache.org/job/hbase-0.96/217/]) HBASE-10097 Remove a region name string creation in HRegion#nextInternal (nkeywal: rev 1548712) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcCallContext.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java > Remove a region name string creation in HRegion#nextInternal > > > Key: HBASE-10097 > URL: https://issues.apache.org/jira/browse/HBASE-10097 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.98.0, 0.96.1, 0.99.0 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon >Priority: Critical > Fix For: 0.96.1, 0.98.1, 0.99.0 > > Attachments: 10097.v1.patch > > > We're creating a String in each "nextInternal". Before HBASE-9983 this was > cached, but it's not the case anymore... -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10094) Add batching to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842116#comment-13842116 ] Hudson commented on HBASE-10094: FAILURE: Integrated in hbase-0.96 #217 (See [https://builds.apache.org/job/hbase-0.96/217/]) HBASE-10094 Add batching to HLogPerformanceEvaluation (stack: rev 1548754) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java > Add batching to HLogPerformanceEvaluation > - > > Key: HBASE-10094 > URL: https://issues.apache.org/jira/browse/HBASE-10094 > Project: HBase > Issue Type: Sub-task > Components: Performance, wal >Reporter: stack >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: 10094v2.txt > > > As Himanshu points out in the parent issue, HLogPE is using an unorthodox > API for appending edits to the WAL; it is using an API that is meant for tests > only and that does an append immediately followed by a sync call. > In a normal deploy, WAL appends are done as a bunch of appends followed by a > sync on the tail of the transaction -- not a sync per append. > This issue is about changing HLogPE to use append and then sync. It also > adds an argument so you can specify batching of a set of appends before > the sync is called. The latter lets HLogPE mimic multi puts that use the > minibatch... which appends, appends, appends.. and then syncs. > Assigning to Himanshu for review.
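Editorial sketch of the append-then-sync pattern discussed in HBASE-10094. `FakeWal` is a hypothetical in-memory stand-in, not the HLog or FSHLog API, and `batchSize` is an assumed name for the new batching argument; the point is only that a batch size of 1 reproduces sync-per-append while larger batches amortize the sync.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedAppendSketch {
    /** Hypothetical in-memory WAL stand-in that counts sync calls. */
    public static class FakeWal {
        private final List<String> buffered = new ArrayList<>();
        private int syncCalls;
        private int durableEdits;

        public void append(String edit) { buffered.add(edit); }

        /** Makes everything appended so far "durable" in one call. */
        public void sync() {
            durableEdits += buffered.size();
            buffered.clear();
            syncCalls++;
        }

        public int getSyncCalls() { return syncCalls; }
        public int getDurableEdits() { return durableEdits; }
    }

    /** Appends totalEdits edits, syncing once per batchSize appends; returns the number of syncs. */
    public static int run(FakeWal wal, int totalEdits, int batchSize) {
        int pending = 0;
        for (int i = 0; i < totalEdits; i++) {
            wal.append("edit-" + i);
            pending++;
            if (pending == batchSize) {
                wal.sync();
                pending = 0;
            }
        }
        if (pending > 0) {
            wal.sync(); // flush the final partial batch
        }
        return wal.getSyncCalls();
    }
}
```

With 10 edits, `run(wal, 10, 1)` issues 10 syncs while `run(wal, 10, 4)` issues 3 (two full batches plus the remainder), and all 10 edits end up durable in both cases.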
[jira] [Commented] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842118#comment-13842118 ] Hudson commented on HBASE-10061: FAILURE: Integrated in hbase-0.96 #217 (See [https://builds.apache.org/job/hbase-0.96/217/]) HBASE-10061 TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE (Amit Sela) (ndimiduk: rev 1548749) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15, 0.99.0 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
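Editorial sketch of the null check suggested in HBASE-10061. This is not the TableMapReduceUtil code or the attached patch: `getJar`, `updateMap`, and `findOrCreateJar` below are simplified, hypothetical stand-ins with different signatures, kept only to show a lookup that may return null being guarded before it is dereferenced.

```java
import java.util.HashMap;
import java.util.Map;

public class NullJarGuardSketch {
    /** Stand-in lookup: returns null when the class is not known to come from a jar. */
    public static String getJar(Map<String, String> classToJar, String className) {
        return classToJar.get(className);
    }

    /** Stand-in for updateMap: dereferences jar, so a null argument throws NPE. */
    public static void updateMap(String jar, Map<String, String> packagedClasses) {
        packagedClasses.put(jar.trim(), "scanned");
    }

    /** Guards the call so a missing jar yields null instead of an exception. */
    public static String findOrCreateJar(String className,
                                         Map<String, String> classToJar,
                                         Map<String, String> packagedClasses) {
        String jar = getJar(classToJar, className);
        if (jar != null) {
            updateMap(jar, packagedClasses);
        }
        return jar;
    }

    public static void main(String[] args) {
        Map<String, String> classToJar = new HashMap<>();
        classToJar.put("com.example.Known", "known.jar");
        Map<String, String> packaged = new HashMap<>();
        System.out.println(findOrCreateJar("com.example.Known", classToJar, packaged));
        System.out.println(findOrCreateJar("com.example.Missing", classToJar, packaged));
    }
}
```

Whether the caller should treat a null result as "nothing to ship" or create a jar at that point is a separate decision that the sketch leaves open.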
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a cluster restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842117#comment-13842117 ] Hudson commented on HBASE-10085: FAILURE: Integrated in hbase-0.96 #217 (See [https://builds.apache.org/job/hbase-0.96/217/]) HBASE-10085: Some regions aren't re-assigned after a master restarts (jeffreyz: rev 1548728) * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java > Some regions aren't re-assigned after a cluster restarts > > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We saw this issue happen during a cluster restart: > 1) When shutting down a cluster, some regions are in offline state because no > region servers are available (stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skips re-assigning them in function AM.processServerShutdown, as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... 
> 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... > 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842112#comment-13842112 ] Lars Hofhansl commented on HBASE-10089: --- I would still question how much memory we are actually saving. The number of live tables will be relatively small (certainly not more than 1000) and the number of metrics per table per CF is not too large either. > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > where hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. The workaround is to periodically restart > the region servers.
[jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842104#comment-13842104 ] Ted Yu commented on HBASE-10000: bq. the expected gain in case num_log_files > available split log workers. Right - this is what the change targets. > Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-recover-ts-with-pb-4.txt, 1-v1.txt, > 1-v4.txt, 1-v5.txt, 1-v6.txt > > > At the beginning of recovery, the master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery, whose discussion gave rise to this > idea.
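Editorial sketch of the approach described in the issue above: lease recovery requests for all outstanding WAL files are submitted to a fixed-size thread pool up front, and the per-file result records whether the file is closed. `LeaseRecoverer` is a hypothetical callback standing in for the real recovery call, and `poolSize` is an assumed parameter; none of this is taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LeaseRecoverySketch {
    /** Hypothetical callback: returns true once the file's lease is recovered and the file is closed. */
    public interface LeaseRecoverer {
        boolean recoverLease(String walPath);
    }

    /** Submits one recovery task per WAL file and collects the outcome for each path. */
    public static Map<String, Boolean> recoverAll(List<String> walPaths, int poolSize,
                                                  LeaseRecoverer recoverer) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
            for (String path : walPaths) {
                Callable<Boolean> task = () -> recoverer.recoverLease(path);
                futures.put(path, pool.submit(task));
            }
            Map<String, Boolean> closed = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> entry : futures.entrySet()) {
                try {
                    closed.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    closed.put(entry.getKey(), false); // recovery attempt failed
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    closed.put(entry.getKey(), false);
                }
            }
            return closed;
        } finally {
            pool.shutdown();
        }
    }
}
```

A split worker in this picture would consult the result for its file (or re-check the file itself) before reading it, which matches the "first check whether the WAL file is closed" step in the description.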
[jira] [Commented] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842084#comment-13842084 ] stack commented on HBASE-10048: --- [~lhofhansl] I added it to trunk. Will add to 0.94 after I shoehorn it into 0.96 (morrow) > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842081#comment-13842081 ] Anoop Sam John commented on HBASE-10061: Oh I am late. Still +1... Thanks Amit and Nick! > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15, 0.99.0 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10090) Master could hang in assigning meta
[ https://issues.apache.org/jira/browse/HBASE-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842080#comment-13842080 ] Hadoop QA commented on HBASE-10090: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617518/trunk-10090_v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8084//console This message is automatically generated. > Master could hang in assigning meta > --- > > Key: HBASE-10090 > URL: https://issues.apache.org/jira/browse/HBASE-10090 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.0 >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Attachments: trunk-10090.patch, trunk-10090_v2.patch > > > Under very rare scenario, master could hang waiting for meta to be assigned > while the meta server is dead. 
[jira] [Updated] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
[ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10101: Attachment: test.log Here is the right log. > testOfflineRegionReAssginedAfterMasterRestart times out sometimes. > -- > > Key: HBASE-10101 > URL: https://issues.apache.org/jira/browse/HBASE-10101 > Project: HBase > Issue Type: Test >Reporter: Jimmy Xiang >Priority: Minor > Attachments: test.log > > > Sometimes this test times out. The log is attached. It could be > because the new cluster takes a while to process the dead server, or to assign > meta.
[jira] [Commented] (HBASE-10059) TestSplitLogWorker#testMultipleTasks fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842072#comment-13842072 ] Jimmy Xiang commented on HBASE-10059: - +1 > TestSplitLogWorker#testMultipleTasks fails occasionally > --- > > Key: HBASE-10059 > URL: https://issues.apache.org/jira/browse/HBASE-10059 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Jeffrey Zhong > Attachments: hbase-10059.patch > > > From > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/857/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitLogWorker/testMultipleTasks/ > : > {code} > 2013-11-30 01:13:23,022 INFO [pool-1-thread-1] hbase.ResourceChecker(147): > before: regionserver.TestSplitLogWorker#testMultipleTasks Thread=16, > OpenFileDescriptor=157, MaxFileDescriptor=4, SystemLoadAverage=338, > ProcessCount=144, AvailableMemoryMB=1474, ConnectionCount=0 > 2013-11-30 01:13:23,026 INFO [pool-1-thread-1] > zookeeper.MiniZooKeeperCluster(200): Started MiniZK Cluster and connect 1 ZK > server on client port: 53800 > 2013-11-30 01:13:23,029 INFO [pool-1-thread-1] > zookeeper.RecoverableZooKeeper(120): Process > identifier=split-log-worker-tests connecting to ZooKeeper > ensemble=localhost:53800 > 2013-11-30 01:13:23,249 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, type=None, > state=SyncConnected, path=null > 2013-11-30 01:13:23,251 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(387): split-log-worker-tests-0x142a6913350 > connected > 2013-11-30 01:13:23,261 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(105): /hbase created > 2013-11-30 01:13:23,270 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(108): /hbase/splitWAL created > 2013-11-30 01:13:23,278 DEBUG [pool-1-thread-1] executor.ExecutorService(99): > Starting executor service 
name=RS_LOG_REPLAY_OPS-TestSplitLogWorker, > corePoolSize=10, maxPoolSize=10 > 2013-11-30 01:13:23,278 INFO [pool-1-thread-1] > regionserver.TestSplitLogWorker(246): testMultipleTasks > 2013-11-30 01:13:23,280 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(175): SplitLogWorker tmt_svr,1,1 starting > 2013-11-30 01:13:23,380 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,394 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,402 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(363): worker tmt_svr,1,1 acquired task > /hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,418 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,419 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > 
regionserver.SplitLogWorker(522): task /hbase/splitWAL/tmt_task preempted > from tmt_svr,1,1, current task state and owner=OWNED another-worker,1,1 > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(608): Sending interrupt to stop the worker thread > 2013-11-30 01:13:23,420 WARN [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(374): Interrupted while yielding for other region > servers > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:372) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:251) > at > org.apache.hadoop.hbase.regionserve
[jira] [Commented] (HBASE-10103) TestNodeHealthCheckChore#testRSHealthChore: Stoppable must have been stopped
[ https://issues.apache.org/jira/browse/HBASE-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842059#comment-13842059 ] Andrew Purtell commented on HBASE-10103: Test output, will come back later to look at cause, maybe RS initialization doesn't happen fast enough for this test now that we are using Hadoop 2 as default: {noformat} 2013-12-07 02:22:11,094 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: TestNodeHealthCheckChore#testRSHealthChore Thread=114, OpenFileDescriptor=796, MaxFileDescriptor=65536, SystemLoadAverage=119, ProcessCount=73, AvailableMemoryMB=1744, ConnectionCount=2 2013-12-07 02:22:11,096 INFO [pool-1-thread-1] hbase.TestNodeHealthCheckChore(129): Created /data/src/hbase/hbase-server/target/test-data/bc5c27a4-678b-4600-9279-e471aebe046f/HealthScript9595afc4-7f2a-4054-be8a-6af3a2993645.sh, executable=true 2013-12-07 02:22:11,097 INFO [pool-1-thread-1] hbase.HealthCheckChore(42): Health Check Chore runs every 0sec 2013-12-07 02:22:11,097 INFO [pool-1-thread-1] hbase.HealthChecker(68): HealthChecker initialized with script at /data/src/hbase/hbase-server/target/test-data/bc5c27a4-678b-4600-9279-e471aebe046f/HealthScript9595afc4-7f2a-4054-be8a-6af3a2993645.sh, timeout=2000 2013-12-07 02:22:11,446 INFO [pool-1-thread-1] hbase.HealthCheckChore(65): Health status at 385106hrs, 22mins, 11sec : ERROR Server not healthy 2013-12-07 02:22:11,793 INFO [pool-1-thread-1] hbase.HealthCheckChore(65): Health status at 385106hrs, 22mins, 11sec : ERROR Server not healthy 2013-12-07 02:22:12,141 INFO [pool-1-thread-1] hbase.HealthCheckChore(65): Health status at 385106hrs, 22mins, 12sec : ERROR Server not healthy 2013-12-07 02:22:12,142 DEBUG [pool-1-thread-1] fs.HFileSystem(214): The file system is not a DistributedFileSystem. 
Skipping on block location reordering 2013-12-07 02:22:12,497 INFO [pool-1-thread-1] hbase.ResourceChecker(171): after: TestNodeHealthCheckChore#testRSHealthChore Thread=114 (was 114), OpenFileDescriptor=798 (was 796) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=65536 (was 65536), SystemLoadAverage=119 (was 119), ProcessCount=73 (was 73), AvailableMemoryMB=1744 (was 1744), ConnectionCount=2 (was 2) {noformat} > TestNodeHealthCheckChore#testRSHealthChore: Stoppable must have been stopped > > > Key: HBASE-10103 > URL: https://issues.apache.org/jira/browse/HBASE-10103 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Andrew Purtell > Fix For: 0.98.0, 0.99.0 > > > {noformat} > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 623.639 sec > <<< FAILURE! > testRSHealthChore(org.apache.hadoop.hbase.TestNodeHealthCheckChore) Time > elapsed: 0.001 sec <<< FAILURE! > java.lang.AssertionError: Stoppable must have been stopped. > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hbase.TestNodeHealthCheckChore.testRSHealthChore(TestNodeHealthCheckChore.java:108) > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10103) TestNodeHealthCheckChore#testRSHealthChore: Stoppable must have been stopped
Andrew Purtell created HBASE-10103: -- Summary: TestNodeHealthCheckChore#testRSHealthChore: Stoppable must have been stopped Key: HBASE-10103 URL: https://issues.apache.org/jira/browse/HBASE-10103 Project: HBase Issue Type: Bug Affects Versions: 0.98.0 Reporter: Andrew Purtell Fix For: 0.98.0, 0.99.0 {noformat} Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 623.639 sec <<< FAILURE! testRSHealthChore(org.apache.hadoop.hbase.TestNodeHealthCheckChore) Time elapsed: 0.001 sec <<< FAILURE! java.lang.AssertionError: Stoppable must have been stopped. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.TestNodeHealthCheckChore.testRSHealthChore(TestNodeHealthCheckChore.java:108) {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10059) TestSplitLogWorker#testMultipleTasks fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842049#comment-13842049 ] Hadoop QA commented on HBASE-10059: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617522/hbase-10059.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8083//console This message is automatically generated. 
> TestSplitLogWorker#testMultipleTasks fails occasionally > --- > > Key: HBASE-10059 > URL: https://issues.apache.org/jira/browse/HBASE-10059 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Jeffrey Zhong > Attachments: hbase-10059.patch > > > From > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/857/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitLogWorker/testMultipleTasks/ > : > {code} > 2013-11-30 01:13:23,022 INFO [pool-1-thread-1] hbase.ResourceChecker(147): > before: regionserver.TestSplitLogWorker#testMultipleTasks Thread=16, > OpenFileDescriptor=157, MaxFileDescriptor=4, SystemLoadAverage=338, > ProcessCount=144, AvailableMemoryMB=1474, ConnectionCount=0 > 2013-11-30 01:13:23,026 INFO [pool-1-thread-1] > zookeeper.MiniZooKeeperCluster(200): Started MiniZK Cluster and connect 1 ZK > server on client port: 53800 > 2013-11-30 01:13:23,029 INFO [pool-1-thread-1] > zookeeper.RecoverableZooKeeper(120): Process > identifier=split-log-worker-tests connecting to ZooKeeper > ensemble=localhost:53800 > 2013-11-30 01:13:23,249 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, type=None, > state=SyncConnected, path=null > 2013-11-30 01:13:23,251 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(387): split-log-worker-tests-0x142a6913350 > connected > 2013-11-30 01:13:23,261 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(105): /hbase created > 2013-11-30 01:13:23,270 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(108): /hbase/splitWAL created > 2013-11-30 01:13:23,278 DEBUG [pool-1-thread-1] executor.ExecutorService(99): > Starting executor se
[jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842031#comment-13842031 ] Enis Soztutar commented on HBASE-10000: --- Skimmed the patch. It looks ok to me. There is some expected slowdown, because submitting the recoverLease() RPCs to the NN with the default parallelism of 6 might be a bit slower than doing this from the RSs in parallel when 6 < num_log_files < num_region_servers. I guess we can live with it, because of the expected gain when num_log_files > available split log workers. > Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-recover-ts-with-pb-4.txt, 1-v1.txt, > 1-v4.txt, 1-v5.txt, 1-v6.txt > > > At the beginning of recovery, master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery, discussions with whom gave rise to this > idea. -- This message was sent by Atlassian JIRA (v6.1#6144)
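The idea in the description above (the master submitting lease recovery requests for all outstanding WAL files through a bounded thread pool) can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class name, the `recoverAll` helper, and the `recoverLease` predicate are hypothetical stand-ins for the real NameNode call, and the pool size of 6 simply mirrors the default parallelism mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the HBase code base.
class LeaseRecoverySketch {

    /**
     * Submits one lease-recovery call per WAL path to a fixed-size pool and
     * returns the paths whose recovery did not succeed, in input order.
     * The recoverLease predicate stands in for the real RPC to the NameNode.
     */
    public static List<String> recoverAll(List<String> walPaths, int parallelism,
                                          Predicate<String> recoverLease) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String path : walPaths) {
                Callable<Boolean> task = () -> recoverLease.test(path);
                results.add(pool.submit(task));
            }
            List<String> notRecovered = new ArrayList<>();
            for (int i = 0; i < walPaths.size(); i++) {
                try {
                    if (!results.get(i).get()) {
                        notRecovered.add(walPaths.get(i));
                    }
                } catch (ExecutionException e) {
                    // The stand-in call threw; treat the file as not recovered.
                    notRecovered.add(walPaths.get(i));
                } catch (InterruptedException e) {
                    // Preserve the interrupt and report the file as not recovered.
                    Thread.currentThread().interrupt();
                    notRecovered.add(walPaths.get(i));
                }
            }
            return notRecovered;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> wals = Arrays.asList("wal-a", "wal-b", "wal-c");
        // Pretend recovery of "wal-b" fails; the other two succeed.
        List<String> failed = recoverAll(wals, 6, p -> !p.equals("wal-b"));
        System.out.println(failed); // [wal-b]
    }
}
```

With a pool of 6, at most six requests are in flight at once, which is the source of the slowdown discussed in the comment when there are more than six log files but fewer log files than region servers.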
[jira] [Updated] (HBASE-10102) CF.VERSIONS is not enforced with timerange scans
[ https://issues.apache.org/jira/browse/HBASE-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-10102: -- Attachment: 10102-0.94-POC.txt POC patch. Just need to park it somewhere. Not tested. > CF.VERSIONS is not enforced with timerange scans > > > Key: HBASE-10102 > URL: https://issues.apache.org/jira/browse/HBASE-10102 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > Attachments: 10102-0.94-POC.txt > > > Example brought up by Niels Basjes on the user list: > If I run the following commands in the hbase shell > {code} > create 't1', {NAME => 'c1', VERSIONS => 1} > put 't1', 'r1', 'c1', 'One', 1000 > put 't1', 'r1', 'c1', 'Two', 2000 > put 't1', 'r1', 'c1', 'Three', 3000 > get 't1', 'r1' > get 't1', 'r1' , {TIMERANGE => [0,1500]} > the result is this: > get 't1', 'r1' > COLUMN CELL > c1: timestamp=3000, value=Three > 1 row(s) in 0.0780 seconds > get 't1', 'r1' , {TIMERANGE => [0,1500]} > COLUMN CELL > c1: timestamp=1000, value=One > 1 row(s) in 0.1390 seconds > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10102) CF.VERSIONS is not enforced with timerange scans
[ https://issues.apache.org/jira/browse/HBASE-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842019#comment-13842019 ] Lars Hofhansl commented on HBASE-10102: --- Currently the workflow in ScanQueryMatcher is something like this: # = min(, ) # filter by timerange # filter out columns (i.e. columns not specified in the scan) # apply custom filters # filter by Every KV is passed through this filtering process. What we should do is this: # filter by # filter by timerange # filter out columns (i.e. columns not specified in the scan) # apply custom filters # filter by I have a POC patch that does this. It does not slow scanning in a measurable way. > CF.VERSIONS is not enforced with timerange scans > > > Key: HBASE-10102 > URL: https://issues.apache.org/jira/browse/HBASE-10102 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > > Example brought up by Niels Basjes on the user list: > If I run the following commands in the hbase shell > {code} > create 't1', {NAME => 'c1', VERSIONS => 1} > put 't1', 'r1', 'c1', 'One', 1000 > put 't1', 'r1', 'c1', 'Two', 2000 > put 't1', 'r1', 'c1', 'Three', 3000 > get 't1', 'r1' > get 't1', 'r1' , {TIMERANGE => [0,1500]} > the result is this: > get 't1', 'r1' > COLUMN CELL > c1: timestamp=3000, value=Three > 1 row(s) in 0.0780 seconds > get 't1', 'r1' , {TIMERANGE => [0,1500]} > COLUMN CELL > c1: timestamp=1000, value=One > 1 row(s) in 0.1390 seconds > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
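To make the ordering issue in the comment above concrete, here is a small standalone model of the two filter orders applied to the shell example (timestamps 3000, 2000, 1000 with VERSIONS => 1 and TIMERANGE [0,1500]). It is a sketch under assumptions, not ScanQueryMatcher code: the class and method names are invented, the column and custom-filter steps are left out, and it assumes that the placeholders missing from the numbered lists refer to the column family's version limit and the scan's version limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two orderings; not taken from the HBase code base.
class VersionOrderSketch {

    /** Timerange first, then a single version limit: roughly the current order. */
    public static List<Long> timerangeThenVersions(List<Long> newestFirst,
                                                   long minTs, long maxTs, int maxVersions) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            if (ts < minTs || ts >= maxTs) {
                continue; // outside the half-open range [minTs, maxTs)
            }
            if (out.size() < maxVersions) {
                out.add(ts); // the limit only counts cells that survived the timerange
            }
        }
        return out;
    }

    /** CF version limit first, then timerange, then the scan's own limit: the proposed order. */
    public static List<Long> cfVersionsThenTimerange(List<Long> newestFirst, int cfVersions,
                                                     long minTs, long maxTs, int scanVersions) {
        List<Long> out = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst) {
            if (seen++ >= cfVersions) {
                break; // older versions are beyond the CF limit and never visible
            }
            if (ts < minTs || ts >= maxTs) {
                continue;
            }
            if (out.size() < scanVersions) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> cells = Arrays.asList(3000L, 2000L, 1000L); // newest first
        System.out.println(timerangeThenVersions(cells, 0, 1500, 1));      // [1000]
        System.out.println(cfVersionsThenTimerange(cells, 1, 0, 1500, 1)); // []
    }
}
```

The first method reproduces the behaviour reported in the issue: the cell with timestamp 1000 comes back even though the column family keeps only one version. With the second ordering the same query returns nothing, because the only retained version (3000) lies outside the requested range.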
[jira] [Created] (HBASE-10102) CF.VERSIONS is not enforced with timerange scans
Lars Hofhansl created HBASE-10102: - Summary: CF.VERSIONS is not enforced with timerange scans Key: HBASE-10102 URL: https://issues.apache.org/jira/browse/HBASE-10102 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Example brought up by Niels Basjes on the user list: If I run the following commands in the hbase shell {code} create 't1', {NAME => 'c1', VERSIONS => 1} put 't1', 'r1', 'c1', 'One', 1000 put 't1', 'r1', 'c1', 'Two', 2000 put 't1', 'r1', 'c1', 'Three', 3000 get 't1', 'r1' get 't1', 'r1' , {TIMERANGE => [0,1500]} the result is this: get 't1', 'r1' COLUMN CELL c1: timestamp=3000, value=Three 1 row(s) in 0.0780 seconds get 't1', 'r1' , {TIMERANGE => [0,1500]} COLUMN CELL c1: timestamp=1000, value=One 1 row(s) in 0.1390 seconds {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842008#comment-13842008 ] Mikhail Bautin commented on HBASE-10089: [~xieliang007]: yes, you are right. In that case, this problem is specific to JDK 6 with low permgen size. > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > that hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. Workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-7572) move metadata settings that duplicate xml config settings to CF/table config in a backward-compatible manner
[ https://issues.apache.org/jira/browse/HBASE-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842009#comment-13842009 ] Hadoop QA commented on HBASE-7572: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617506/HBASE-7572-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 27 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestShell Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8081//console This message is automatically generated. 
> move metadata settings that duplicate xml config settings to CF/table config > in a backward-compatible manner > > > Key: HBASE-7572 > URL: https://issues.apache.org/jira/browse/HBASE-7572 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HBASE-7572-v0.patch, HBASE-7572-v1.patch, > HBASE-7572-v2.patch, HBASE-7572-v3.patch, HBASE-7572-v4.patch > > > 2nd part of splitting HBASE-7236 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841993#comment-13841993 ] Liang Xie commented on HBASE-10089: --- [~mikhail], IMHO, that link is not correct, the interned strings could be garbage collected absolutely... > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > that hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. Workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841971#comment-13841971 ] Hadoop QA commented on HBASE-9892: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617512/HBASE-9892-trunk-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMetricsRegionServerSourceImpl Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8082//console This message is automatically generated. 
> Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, > HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to the HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So the hmaster and clients can get the info port from the servername. > To test this feature, I change the rs num from 1 to
[jira] [Updated] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
[ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10101: Attachment: (was: test.log) > testOfflineRegionReAssginedAfterMasterRestart times out sometimes. > -- > > Key: HBASE-10101 > URL: https://issues.apache.org/jira/browse/HBASE-10101 > Project: HBase > Issue Type: Test >Reporter: Jimmy Xiang >Priority: Minor > > Sometimes this test times out for me. The log is attached. It could be > because the new cluster takes a while to process the dead server, or assign > meta. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
[ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841968#comment-13841968 ] Jimmy Xiang commented on HBASE-10101: - Attached the wrong log. Let me get the right one. > testOfflineRegionReAssginedAfterMasterRestart times out sometimes. > -- > > Key: HBASE-10101 > URL: https://issues.apache.org/jira/browse/HBASE-10101 > Project: HBase > Issue Type: Test >Reporter: Jimmy Xiang >Priority: Minor > > Sometimes this test times out for me. The log is attached. It could be > because the new cluster takes a while to process the dead server, or assign > meta. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a cluster restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841969#comment-13841969 ] Jeffrey Zhong commented on HBASE-10085: --- ok. Let me check it. Thanks. > Some regions aren't re-assigned after a cluster restarts > > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... > 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a cluster restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841967#comment-13841967 ] Jimmy Xiang commented on HBASE-10085: - [~jeffreyz], I filed an issue on the test: HBASE-10101. Do you want to take a look? > Some regions aren't re-assigned after a cluster restarts > > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10059) TestSplitLogWorker#testMultipleTasks fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10059: -- Status: Patch Available (was: Open) > TestSplitLogWorker#testMultipleTasks fails occasionally > --- > > Key: HBASE-10059 > URL: https://issues.apache.org/jira/browse/HBASE-10059 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Jeffrey Zhong > Attachments: hbase-10059.patch > > > From > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/857/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitLogWorker/testMultipleTasks/ > : > {code} > 2013-11-30 01:13:23,022 INFO [pool-1-thread-1] hbase.ResourceChecker(147): > before: regionserver.TestSplitLogWorker#testMultipleTasks Thread=16, > OpenFileDescriptor=157, MaxFileDescriptor=4, SystemLoadAverage=338, > ProcessCount=144, AvailableMemoryMB=1474, ConnectionCount=0 > 2013-11-30 01:13:23,026 INFO [pool-1-thread-1] > zookeeper.MiniZooKeeperCluster(200): Started MiniZK Cluster and connect 1 ZK > server on client port: 53800 > 2013-11-30 01:13:23,029 INFO [pool-1-thread-1] > zookeeper.RecoverableZooKeeper(120): Process > identifier=split-log-worker-tests connecting to ZooKeeper > ensemble=localhost:53800 > 2013-11-30 01:13:23,249 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, type=None, > state=SyncConnected, path=null > 2013-11-30 01:13:23,251 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(387): split-log-worker-tests-0x142a6913350 > connected > 2013-11-30 01:13:23,261 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(105): /hbase created > 2013-11-30 01:13:23,270 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(108): /hbase/splitWAL created > 2013-11-30 01:13:23,278 DEBUG [pool-1-thread-1] executor.ExecutorService(99): > Starting executor service name=RS_LOG_REPLAY_OPS-TestSplitLogWorker, 
> corePoolSize=10, maxPoolSize=10 > 2013-11-30 01:13:23,278 INFO [pool-1-thread-1] > regionserver.TestSplitLogWorker(246): testMultipleTasks > 2013-11-30 01:13:23,280 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(175): SplitLogWorker tmt_svr,1,1 starting > 2013-11-30 01:13:23,380 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,394 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,402 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(363): worker tmt_svr,1,1 acquired task > /hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,418 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,419 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(522): task /hbase/splitWAL/tmt_task 
preempted > from tmt_svr,1,1, current task state and owner=OWNED another-worker,1,1 > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(608): Sending interrupt to stop the worker thread > 2013-11-30 01:13:23,420 WARN [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(374): Interrupted while yielding for other region > servers > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:372) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:251) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorke
[jira] [Commented] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841963#comment-13841963 ] Lars Hofhansl commented on HBASE-10048: --- That is nice. And something we would monitor. +1 for 0.94. Needs to be in trunk (0.99 now) as well, right? > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
[ https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10101: Attachment: test.log > testOfflineRegionReAssginedAfterMasterRestart times out sometimes. > -- > > Key: HBASE-10101 > URL: https://issues.apache.org/jira/browse/HBASE-10101 > Project: HBase > Issue Type: Test >Reporter: Jimmy Xiang >Priority: Minor > Attachments: test.log > > > Sometimes this test times out for me. The log is attached. It could be > because the new cluster takes a while to process the dead server, or assign > meta. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10059) TestSplitLogWorker#testMultipleTasks fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-10059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10059: -- Attachment: hbase-10059.patch Increased the timeout from 1.5 secs to 5 secs. > TestSplitLogWorker#testMultipleTasks fails occasionally > --- > > Key: HBASE-10059 > URL: https://issues.apache.org/jira/browse/HBASE-10059 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Jeffrey Zhong > Attachments: hbase-10059.patch > > > From > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/857/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitLogWorker/testMultipleTasks/ > : > {code} > 2013-11-30 01:13:23,022 INFO [pool-1-thread-1] hbase.ResourceChecker(147): > before: regionserver.TestSplitLogWorker#testMultipleTasks Thread=16, > OpenFileDescriptor=157, MaxFileDescriptor=4, SystemLoadAverage=338, > ProcessCount=144, AvailableMemoryMB=1474, ConnectionCount=0 > 2013-11-30 01:13:23,026 INFO [pool-1-thread-1] > zookeeper.MiniZooKeeperCluster(200): Started MiniZK Cluster and connect 1 ZK > server on client port: 53800 > 2013-11-30 01:13:23,029 INFO [pool-1-thread-1] > zookeeper.RecoverableZooKeeper(120): Process > identifier=split-log-worker-tests connecting to ZooKeeper > ensemble=localhost:53800 > 2013-11-30 01:13:23,249 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, type=None, > state=SyncConnected, path=null > 2013-11-30 01:13:23,251 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(387): split-log-worker-tests-0x142a6913350 > connected > 2013-11-30 01:13:23,261 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(105): /hbase created > 2013-11-30 01:13:23,270 DEBUG [pool-1-thread-1] > regionserver.TestSplitLogWorker(108): /hbase/splitWAL created > 2013-11-30 01:13:23,278 DEBUG [pool-1-thread-1] executor.ExecutorService(99): > Starting executor service 
name=RS_LOG_REPLAY_OPS-TestSplitLogWorker, > corePoolSize=10, maxPoolSize=10 > 2013-11-30 01:13:23,278 INFO [pool-1-thread-1] > regionserver.TestSplitLogWorker(246): testMultipleTasks > 2013-11-30 01:13:23,280 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(175): SplitLogWorker tmt_svr,1,1 starting > 2013-11-30 01:13:23,380 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,394 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,394 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,402 INFO [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(363): worker tmt_svr,1,1 acquired task > /hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeChildrenChanged, state=SyncConnected, path=/hbase/splitWAL > 2013-11-30 01:13:23,410 DEBUG [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(595): tasks arrived or departed > 2013-11-30 01:13:23,418 DEBUG [pool-1-thread-1-EventThread] > zookeeper.ZooKeeperWatcher(310): split-log-worker-tests-0x142a6913350, > quorum=localhost:53800, baseZNode=/hbase Received ZooKeeper Event, > type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/tmt_task > 2013-11-30 01:13:23,419 INFO [pool-1-thread-1] hbase.Waiter(174): Waiting up > to [1,500] milli-secs(wait.for.ratio=[1]) > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > 
regionserver.SplitLogWorker(522): task /hbase/splitWAL/tmt_task preempted > from tmt_svr,1,1, current task state and owner=OWNED another-worker,1,1 > 2013-11-30 01:13:23,420 INFO [pool-1-thread-1-EventThread] > regionserver.SplitLogWorker(608): Sending interrupt to stop the worker thread > 2013-11-30 01:13:23,420 WARN [SplitLogWorker-tmt_svr,1,1] > regionserver.SplitLogWorker(374): Interrupted while yielding for other region > servers > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:372) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:251) > at > org.apach
[jira] [Created] (HBASE-10101) testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
Jimmy Xiang created HBASE-10101: --- Summary: testOfflineRegionReAssginedAfterMasterRestart times out sometimes. Key: HBASE-10101 URL: https://issues.apache.org/jira/browse/HBASE-10101 Project: HBase Issue Type: Test Reporter: Jimmy Xiang Priority: Minor Attachments: test.log Sometimes this test times out for me. The log is attached. It could be because the new cluster takes a while to process the dead server or to assign meta. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10090) Master could hang in assigning meta
[ https://issues.apache.org/jira/browse/HBASE-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10090: Status: Patch Available (was: Open) Incorporated the latest patch for HBASE-10085. > Master could hang in assigning meta > --- > > Key: HBASE-10090 > URL: https://issues.apache.org/jira/browse/HBASE-10090 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.0 >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Attachments: trunk-10090.patch, trunk-10090_v2.patch > > > In a very rare scenario, the master could hang waiting for meta to be assigned > while the meta server is dead. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10090) Master could hang in assigning meta
[ https://issues.apache.org/jira/browse/HBASE-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10090: Attachment: trunk-10090_v2.patch > Master could hang in assigning meta > --- > > Key: HBASE-10090 > URL: https://issues.apache.org/jira/browse/HBASE-10090 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.0 >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Attachments: trunk-10090.patch, trunk-10090_v2.patch > > > In a very rare scenario, the master could hang waiting for meta to be assigned > while the meta server is dead. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10100) Hbase replication cluster can have varying peers under certain conditions
churro morales created HBASE-10100: -- Summary: Hbase replication cluster can have varying peers under certain conditions Key: HBASE-10100 URL: https://issues.apache.org/jira/browse/HBASE-10100 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.95.0, 0.94.5 Reporter: churro morales We were recently trying to replicate hbase data over to a new datacenter. After we turned on replication and ran our copy tables, we noticed that verify replication reported discrepancies. We ran list_peers and it returned both peers: the original datacenter we were replicating to and the new datacenter (this was correct). When grepping through the logs of a few regionservers, we noticed that some of them had the following entry: 2013-09-26 10:55:46,907 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Error while adding a new peer java.net.UnknownHostException: xxx.xxx.flurry.com (this was due to a transient dns issue) Thus a very small subset of our regionservers were not replicating to the new cluster while most were. We probably don't want to abort when this type of issue comes up: it could be fatal if someone does an "add_peer" operation with a typo, which could potentially shut down the cluster. One solution I can think of is keeping a boolean flag in ReplicationSourceManager that tracks whether there was an errorAddingPeer. Then in logPositionAndCleanOldLogs we can do something like: {code} if (errorAddingPeer) { LOG.error("There was an error adding a peer, logs will not be marked for deletion"); return; } {code} so that we do not delete these logs from the queue. You will notice the replication queue growing on the affected machines, and you can still replay the logs, thus avoiding a lengthy copy table. I have a patch (with unit test) for the above proposal, if everyone thinks that is an okay solution. 
An additional idea would be to add some retry logic inside the PeersWatcher class for the nodeChildrenChanged method. That way, if some transient issue occurs, we could sort it out without having to bounce that particular regionserver. Would love to hear everyone's thoughts. -- This message was sent by Atlassian JIRA (v6.1#6144)
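The flag-based guard proposed in HBASE-10100 can be illustrated with a small, self-contained sketch. This is not the patch mentioned in the report and does not use HBase classes: `PeerErrorGuardSketch`, `PeerConnector`, `enqueueLog` and `cleanOldLogs` are hypothetical names, and the log queue is a plain in-memory deque rather than state kept in ZooKeeper. It only shows the idea that once adding a peer has failed, old logs stop being removed from the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the proposed guard; not HBase code. */
public class PeerErrorGuardSketch {
    /** Hypothetical hook that may fail, e.g. with UnknownHostException. */
    interface PeerConnector {
        void connect(String peerId) throws Exception;
    }

    // Set once any peer failed to be added; never cleared in this sketch.
    private volatile boolean errorAddingPeer = false;
    // Oldest log first. Single-threaded use is assumed here.
    private final Deque<String> logQueue = new ArrayDeque<>();

    void enqueueLog(String name) {
        logQueue.addLast(name);
    }

    int queuedLogCount() {
        return logQueue.size();
    }

    void addPeer(String peerId, PeerConnector connector) {
        try {
            connector.connect(peerId);
        } catch (Exception e) {
            // Remember the failure instead of aborting the server.
            errorAddingPeer = true;
        }
    }

    /** Drops logs older than currentLog; returns how many were dropped. */
    int cleanOldLogs(String currentLog) {
        if (errorAddingPeer) {
            // Keep every log so it can still be replayed to the missing peer.
            return 0;
        }
        int removed = 0;
        while (!logQueue.isEmpty() && !logQueue.peekFirst().equals(currentLog)) {
            logQueue.removeFirst();
            removed++;
        }
        return removed;
    }
}
```

With this shape, a server that hit the DNS error keeps its whole queue, which matches the "queue rising on certain machines" symptom described in the report.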
[jira] [Commented] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841948#comment-13841948 ] stack commented on HBASE-10048: --- Applied to trunk and to 0.98. 0.96 not going in. Will fix it later. [~lhofhansl] You want this? It's nice. > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841941#comment-13841941 ] Hadoop QA commented on HBASE-10000: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617497/1-recover-ts-with-pb-4.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8079//console This message is automatically generated. 
> Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-recover-ts-with-pb-4.txt, 1-v1.txt, > 1-v4.txt, 1-v5.txt, 1-v6.txt > > > At the beginning of recovery, the master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery, discussions with whom gave rise to this > idea. -- This message was sent by Atlassian JIRA (v6.1#6144)
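The description above (concurrent lease recovery requests issued from a thread pool) can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: `LeaseRecoverer` is a hypothetical callback standing in for whatever lease-recovery call the filesystem offers, and the WAL paths are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LeaseRecoverySketch {
    /** Hypothetical callback; returns true when the file is closed. */
    interface LeaseRecoverer {
        boolean recover(String walPath) throws Exception;
    }

    /** Submits one recovery request per WAL file and collects the outcomes. */
    static Map<String, Boolean> recoverAll(List<String> walPaths,
                                           LeaseRecoverer recoverer,
                                           int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the requests run concurrently.
            Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
            for (String path : walPaths) {
                Callable<Boolean> task = () -> recoverer.recover(path);
                pending.put(path, pool.submit(task));
            }
            // Then wait for each result; a failed request is recorded as false.
            Map<String, Boolean> closed = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> e : pending.entrySet()) {
                try {
                    closed.put(e.getKey(), e.getValue().get());
                } catch (ExecutionException ex) {
                    closed.put(e.getKey(), false);
                }
            }
            return closed;
        } finally {
            pool.shutdown();
        }
    }
}
```

A split worker could then consult the resulting map (or re-check the file itself) before processing a WAL, which corresponds to the "first check whether the WAL file is closed" step in the description.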
[jira] [Commented] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841937#comment-13841937 ] Elliott Clark commented on HBASE-10048: --- +1 thanks > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841933#comment-13841933 ] Lars Hofhansl commented on HBASE-10089: --- Need to perf test this as well. Things like {{cfName.equals(other.cfName)}} will be sped up if both strings are interned, no? In that case equals can just compare the references. > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > where hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. The workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)
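The point raised in the comment above, that interned strings can be compared by reference, can be seen in a minimal standalone sketch. The class and method names are made up for illustration, and this says nothing about how much time the shortcut saves in HBase; that would still need the perf test the comment asks for.

```java
public class InternSketch {
    /** True only when both variables point at the very same String object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // Two equal strings built separately are distinct objects.
        String a = new String("cf1");
        String b = new String("cf1");
        System.out.println(sameReference(a, b));

        // After interning, both resolve to the single pooled instance,
        // so an equality check can stop at the reference comparison.
        System.out.println(sameReference(a.intern(), b.intern()));
    }
}
```

The trade-off described in the issue is the other side of this: every distinct name that is interned stays in the pool, so regularly creating tables with new names keeps adding entries.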
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9892: - Attachment: HBASE-9892-trunk-v1.patch > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, > HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So HMaster and clients can get the info port from the servername. > To test this feature, I changed the rs num from 1 to 3 in standalone mode, so > we can test it in standalone mode. > I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know > how Hoya handles this problem? > PS: There are different formats for servername in the zk node and the meta table; I > think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
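To make the motivation in HBASE-9892 concrete, here is a rough sketch of a server identity that carries its own info port. It is not the attached patch: the four-field text form `host,rpcPort,startCode,infoPort`, the class name and the port numbers are all assumptions for illustration (HBase's existing ServerName text form is `host,port,startcode`).

```java
public class ServerNameSketch {
    final String host;
    final int rpcPort;
    final long startCode;
    final int infoPort; // web UI port reported by the region server itself

    ServerNameSketch(String host, int rpcPort, long startCode, int infoPort) {
        this.host = host;
        this.rpcPort = rpcPort;
        this.startCode = startCode;
        this.infoPort = infoPort;
    }

    /** Parses the hypothetical "host,rpcPort,startCode,infoPort" form. */
    static ServerNameSketch parse(String text) {
        String[] parts = text.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + text);
        }
        return new ServerNameSketch(parts[0], Integer.parseInt(parts[1]),
                Long.parseLong(parts[2]), Integer.parseInt(parts[3]));
    }

    /** Link the master web UI could render for this particular instance. */
    String infoUrl() {
        return "http://" + host + ":" + infoPort + "/";
    }

    @Override
    public String toString() {
        return host + "," + rpcPort + "," + startCode + "," + infoPort;
    }
}
```

With a single info port read from the master's configuration, two instances on the same host would get the same UI link; carrying the port per server lets each instance be addressed separately.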
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841919#comment-13841919 ] Hadoop QA commented on HBASE-9892: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617503/HBASE-9892-trunk-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestMetricsRegionServerSourceImpl Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8080//console This message is automatically generated. 
> Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So HMaster and clients can get the info port from the servername. > To test this feature, I changed the rs num from 1 to 3 in standalone mode, so > w
[jira] [Commented] (HBASE-10071) support data type for get/scan in hbase shell
[ https://issues.apache.org/jira/browse/HBASE-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841914#comment-13841914 ] stack commented on HBASE-10071: --- [~cuijianwei] Nice. What the lads said about it coming in via trunk but, one question: this is different from the formatting that is already present? See the scan help in the shell: {code} Besides the default 'toStringBinary' format, 'scan' supports custom formatting by column. A user can define a FORMATTER by adding it to the column name in the scan specification. The FORMATTER can be stipulated: 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString) 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'. Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt', 'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] } Note that you can specify a FORMATTER by column only (cf:qualifer). You cannot specify a FORMATTER for all columns of a column family. {code} > support data type for get/scan in hbase shell > - > > Key: HBASE-10071 > URL: https://issues.apache.org/jira/browse/HBASE-10071 > Project: HBase > Issue Type: Improvement > Components: Client >Affects Versions: 0.94.14 >Reporter: cuijianwei > Attachments: HBASE-10071-0.94-v1.patch > > > Users tend to run hbase shell to query hbase quickly. The result will be > shown as binary format which may not look clear enough when users write > columns using specified types, such as long/int/short. Therefore, it may be > helpful if the results could be shown as specified format. We make a patch to > extend get/scan in hbase shell in which user could specify the data type in > get/scan for each column as: > {code} > scan 'table', {COLUMNS=>['CF:QF:long']} > get 'table', 'r0', {COLUMN=>'CF:QF:long'} > {code} > Then, the result will be shown as Long type. 
The result of the above get will be: > {code} > COLUMN CELL > CF:QF timestamp=24311261, value=24311229 > {code} > This extended format is compatible with the previous format: if users do not > specify the data type, the command still works and outputs the binary format. -- This message was sent by Atlassian JIRA (v6.1#6144)
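The difference between the binary rendering and a typed rendering of the same cell value can be shown without any HBase dependency. The sketch below is an approximation for illustration: it assumes the value was written as an 8-byte big-endian long (the layout used by `Bytes.toBytes(long)`), and `renderBinary` is a simplified imitation of an escaped-bytes display, not the shell's actual formatter.

```java
import java.nio.ByteBuffer;

public class CellValueFormatSketch {
    /** Encodes a long as 8 big-endian bytes (ByteBuffer's default order). */
    static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes 8 big-endian bytes back into a long. */
    static long decodeLong(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Simplified escaped rendering: printable ASCII as-is, the rest as \xNN. */
    static String renderBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int u = b & 0xff;
            if (u >= 32 && u < 127 && u != '\\') {
                sb.append((char) u);
            } else {
                sb.append(String.format("\\x%02X", u));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] stored = encodeLong(24311229L);
        System.out.println(renderBinary(stored)); // escaped bytes, hard to read
        System.out.println(decodeLong(stored));   // the number the user wrote
    }
}
```

The typed form only makes sense when the reader knows how the column was written, which is why both the existing FORMATTER syntax and the proposed `CF:QF:long` syntax ask the user to name the type per column.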
[jira] [Resolved] (HBASE-4163) Create Split Strategy for YCSB Benchmark
[ https://issues.apache.org/jira/browse/HBASE-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4163. -- Resolution: Fixed Fix Version/s: 0.99.0 Assignee: Luke Lu (was: Lars George) Closing as fixed. I also added a note to our little ycsb section in the doc that will show the next time I push the site; it points at Luke's little script. > Create Split Strategy for YCSB Benchmark > > > Key: HBASE-4163 > URL: https://issues.apache.org/jira/browse/HBASE-4163 > Project: HBase > Issue Type: Improvement > Components: util >Affects Versions: 0.90.3, 0.92.0 >Reporter: Nicolas Spiegelberg >Assignee: Luke Lu >Priority: Minor > Labels: benchmark > Fix For: 0.99.0 > > > Talked with Lars about how we can make it easier for users to run the YCSB > benchmarks against HBase & get realistic results. Currently, HBase is > optimized for the random/uniform read/write case, which is the YCSB load. > The main reason we perform badly when users test against us is that > they do not presplit regions & have the split ratio set really low. We need a > one-line way for a user to create a table that is pre-split to 200 regions > (or some decent number) by default & disable splitting. Realistically, this > is how a uniform load cluster should scale, so it's not a hack. This will > also give us a good use case to point to for how users should pre-split > regions. -- This message was sent by Atlassian JIRA (v6.1#6144)
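As a rough illustration of the pre-splitting idea in HBASE-4163, the sketch below computes evenly spaced split keys for a target region count. The key layout is an assumption, not something stated in this thread: it supposes row keys of the form "user" followed by digits and spreads split points over the four-digit prefixes 1000 to 9999. The class and method names are hypothetical, and the resulting keys would still have to be passed to whatever table-creation call is used.

```java
import java.util.ArrayList;
import java.util.List;

public class YcsbSplitSketch {
    /**
     * Returns numRegions - 1 split keys, which yields numRegions regions.
     * Assumes keys look like "user" + digits (an assumption for this sketch).
     */
    static List<String> splitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        final int low = 1000;
        final int high = 9999;
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Integer arithmetic keeps every split point on a 4-digit prefix.
            keys.add("user" + (low + i * (high - low) / numRegions));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = splitKeys(200);
        System.out.println(keys.size() + " split keys, first=" + keys.get(0)
                + ", last=" + keys.get(keys.size() - 1));
    }
}
```

Whether such prefixes really divide the load evenly depends on how the benchmark generates its keys, so the distribution should be checked against the actual key format before relying on it.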
[jira] [Updated] (HBASE-7572) move metadata settings that duplicate xml config settings to CF/table config in a backward-compatible manner
[ https://issues.apache.org/jira/browse/HBASE-7572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7572: Attachment: HBASE-7572-v4.patch I noticed this got stuck... rebased the patch. [~enis] I noticed durability changes added another setting to HTD instead of a config setting... Did you add it intentionally, or didn't know about config overrides? Just checking. With this patch (maybe I will add comment about it to HTD on review iteration/commit), hopefully we can start using configuration in preference for HTD custom fields > move metadata settings that duplicate xml config settings to CF/table config > in a backward-compatible manner > > > Key: HBASE-7572 > URL: https://issues.apache.org/jira/browse/HBASE-7572 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HBASE-7572-v0.patch, HBASE-7572-v1.patch, > HBASE-7572-v2.patch, HBASE-7572-v3.patch, HBASE-7572-v4.patch > > > 2nd part of splitting HBASE-7236 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10048: -- Component/s: metrics Fix Version/s: 0.99.0 0.96.1 0.98.0 > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10048) Add hlog number metric in regionserver
[ https://issues.apache.org/jira/browse/HBASE-10048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841904#comment-13841904 ] stack commented on HBASE-10048: --- +1 Patch is great. [~eclark] Please bless and then I'll commit. > Add hlog number metric in regionserver > -- > > Key: HBASE-10048 > URL: https://issues.apache.org/jira/browse/HBASE-10048 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-10048-0.94-v1.diff, HBASE-10048-0.94-v2.diff, > HBASE-10048-trunk-v1.diff, HBASE-10048-trunk-v2.diff, > HBASE-10048-trunk-v3.diff, HBASE-10048-trunk-v4.diff > > > Add hlog number metric in regionserver. > We can use this metric to alert about memstore flush because of too many > hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9892: - Fix Version/s: 0.99.0 0.96.1 0.98.0 > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So HMaster and clients can get the info port from the servername. > To test this feature, I changed the rs num from 1 to 3 in standalone mode, so > we can test it in standalone mode. > I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know > how Hoya handles this problem? > PS: There are different formats for servername in the zk node and the meta table; I > think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841897#comment-13841897 ] stack commented on HBASE-9892: -- [~enis] You good w/ patch as is? I am. Wouldn't mind getting it into 0.96.1. > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So HMaster and clients can get the info port from the servername. > To test this feature, I changed the rs num from 1 to 3 in standalone mode, so > we can test it in standalone mode. > I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know > how Hoya handles this problem? > PS: There are different formats for servername in the zk node and the meta table; I > think we need to unify them and refactor the code. 
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9892: - Attachment: HBASE-9892-trunk-v1.patch Retry > Add info port to ServerName to support multi instances in a node > > > Key: HBASE-9892 > URL: https://issues.apache.org/jira/browse/HBASE-9892 > Project: HBase > Issue Type: Improvement >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, > HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, > HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-v5.txt > > > The full GC time of a regionserver with a big heap (> 30G) usually cannot be > kept under 30s. At the same time, servers with 64G of memory are common. > So we try to deploy multiple rs instances (2-3) on a single node, with the heap of > each rs at about 20G ~ 24G. > Most things work fine, except the hbase web ui. The master gets the RS > info port from conf, which is not suitable for this situation of multiple rs > instances on a node. So we add the info port to ServerName. > a. At startup, the rs reports its info port to the HMaster. > b. For the root region, the rs writes the servername with info port to the zookeeper > root-region-server node. > c. For meta regions, the rs writes the servername with info port to the root region. > d. For user regions, the rs writes the servername with info port to the meta regions. > So the hmaster and clients can get the info port from the servername. > To test this feature, I changed the rs num from 1 to 3 in standalone mode, so > we can test it in standalone mode. > I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know > how Hoya handles this problem? > PS: There are different formats for servername in the zk node and the meta table; I > think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10092) Move up on to log4j2
[ https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841892#comment-13841892 ] stack commented on HBASE-10092: --- Make sure zkcli works when I am done...see if I can fix the eclipse issue over in HBASE-10073 while I am at it. > Move up on to log4j2 > > > Key: HBASE-10092 > URL: https://issues.apache.org/jira/browse/HBASE-10092 > Project: HBase > Issue Type: Task >Reporter: stack > Attachments: 10092.txt > > > Allows logging with less friction. See http://logging.apache.org/log4j/2.x/ > This rather radical transition can be done w/ minor change given they have an > adapter for apache's logging, the one we use. They also have an adapter for > slf4j so we likely can remove at least some of the 4 versions of this module > our dependencies make use of. > I made a start in attached patch but am currently stuck in maven dependency > resolve hell courtesy of our slf4j. Fixing will take some concentration and > a good net connection, an item I currently lack. Other TODOs are that we will > need to fix our little log level setting jsp page -- will likely have to undo > our use of hadoop's tool here -- and the config system changes a little. > I will return to this project soon. Will bring numbers. > -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10073) Revert HBASE-9718 (Add a test scope dependency on org.slf4j:slf4j-api to hbase-client)
[ https://issues.apache.org/jira/browse/HBASE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841890#comment-13841890 ] stack commented on HBASE-10073: --- Ugh. Yeah, this is awful. I was trying to update the logging lib in hbase and noticed that there are four versions of slf4j being pulled in by our dependencies and that these four versions cannot be replaced by just one as they are incompatible. slf4j is also doing us the favor of spewing a message to any console if it finds more than one version of slf4j on the classpath. Nice. Let me link this issue to my log4j update issue to be sure I don't break it. At the moment I cannot upgrade because of this issue in slf4j. > Revert HBASE-9718 (Add a test scope dependency on org.slf4j:slf4j-api to > hbase-client) > -- > > Key: HBASE-10073 > URL: https://issues.apache.org/jira/browse/HBASE-10073 > Project: HBase > Issue Type: Bug > Components: Zookeeper >Affects Versions: 0.96.1 > Environment: Centos6, sun-jdk-64bit-1.7.0.25 >Reporter: Aleksandr Shulman >Assignee: Andrew Purtell > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: 10073.patch > > > Observed behavior: > In my automation, I have a call to hbase zkcli. That call recently broke with > this checkin: > https://github.com/apache/hbase/commit/5af0a60efed91ac2084f25f13edb21db0f510e7c > The error that is reported is: > {code}++ ./hbase zkcli > 11:19:58 Warning: $HADOOP_HOME is deprecated. 
> 11:19:58 > 11:20:00 Exception in thread "main" java.lang.IllegalAccessError: tried to > access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class > org.slf4j.LoggerFactory > 11:20:00 at org.slf4j.LoggerFactory.(LoggerFactory.java:60) > 11:20:00 at > org.apache.zookeeper.ZooKeeperMain.(ZooKeeperMain.java:50) > 11:20:00 at > org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:78) > 11:20:00 Build step 'Execute shell' marked build as failure{code} > That said, this checkin is perfectly valid as each component should be > allowed to specify its own dependencies. > The issue is a deeper one of dependency mismatches. > Note: This issue only affects hadoop1, not hadoop2. It also appears in trunk, > where there is a similar checkin, but since trunk is not required to work > against hadoop1, this is not an issue for trunk. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10094) Add batching to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10094: -- Resolution: Fixed Fix Version/s: 0.99.0 0.96.1 0.98.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Himanshu. Yeah, gives some insight on sync rates. Let me see if I can get better reporting, reporting that will expose clumping of syncs or syncing even though only a small amount of data has been written. This would be good to know too. > Add batching to HLogPerformanceEvaluation > - > > Key: HBASE-10094 > URL: https://issues.apache.org/jira/browse/HBASE-10094 > Project: HBase > Issue Type: Sub-task > Components: Performance, wal >Reporter: stack >Assignee: Himanshu Vashishtha > Fix For: 0.98.0, 0.96.1, 0.99.0 > > Attachments: 10094v2.txt > > > As Himanshu points out in the parent issue, HLogPE is using an unorthodox > API appending edits to the WAL; it is using an API that is meant for tests > only that does an append immediately followed by a sync call. > In normal deploy, WAL appends are done as a bunch of appends followed by a > sync on the tail of the transaction -- not a sync per append. > This issue is about changing HLogPE to use append and then sync. It also > adds an argument so you can specify batching of a set of appends before > the sync is called. The latter lets HLogPE mimic multi puts that use the > minibatch... which appends, appends, appends.. and then syncs. > Assigning to Himanshu for review. -- This message was sent by Atlassian JIRA (v6.1#6144)
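The difference between a sync per append and a batch of appends followed by one sync, as described in HBASE-10094, can be sketched roughly as follows. This is only an illustration under stated assumptions: SketchWal, CountingWal, and the batchSize parameter are hypothetical names invented here, not HBase's HLog API or the option the patch actually adds.

```java
/** Minimal sketch of "append, append, append.. then sync"; all names are hypothetical. */
class BatchedAppendSketch {
  /** Appends numEdits edits, syncing once per full batch and once more for any leftover tail. */
  static void run(SketchWal wal, int numEdits, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    int pending = 0;
    for (int i = 0; i < numEdits; i++) {
      wal.append(new byte[] {(byte) i});
      pending++;
      if (pending == batchSize) {
        wal.sync();      // one sync covers the whole batch
        pending = 0;
      }
    }
    if (pending > 0) {
      wal.sync();        // flush the partial batch at the end
    }
  }

  public static void main(String[] args) {
    CountingWal perAppend = new CountingWal();
    run(perAppend, 100, 1);   // batchSize 1 behaves like the test-only append+sync call
    CountingWal batched = new CountingWal();
    run(batched, 100, 10);
    System.out.println(perAppend.syncs + " vs " + batched.syncs);  // prints "100 vs 10"
  }
}

/** Hypothetical stand-in for a write-ahead log; not the real HLog interface. */
interface SketchWal {
  void append(byte[] edit);
  void sync();
}

/** Counts calls so the effect of batching is visible. */
class CountingWal implements SketchWal {
  int appends = 0;
  int syncs = 0;
  @Override public void append(byte[] edit) { appends++; }
  @Override public void sync() { syncs++; }
}
```

With a batch size of 1 the loop degenerates to the append-then-sync pattern the issue calls unorthodox; larger batches trade sync count against the amount of unsynced data in flight.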
[jira] [Updated] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10061: - Resolution: Fixed Status: Resolved (was: Patch Available) Committed to all four branches. Thanks for the patch Amit! > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15, 0.99.0 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
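The null check suggested in the HBASE-10061 description might look roughly like the sketch below. It is not the committed patch: the getJar lookup is passed in as a function so the example is self-contained, and this updateMap is a hypothetical stand-in that merely dereferences the jar path the way the real method is reported to.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of guarding updateMap against a null jar; names mirror the issue text only loosely. */
class NullJarGuardSketch {
  /** Hypothetical stand-in for updateMap: dereferences jar, so a null jar would throw NPE. */
  static void updateMap(String jar, Map<String, String> packagedClasses) {
    String name = jar.substring(jar.lastIndexOf('/') + 1);
    packagedClasses.put(name, jar);
  }

  /** Looks up the jar for myClass and records it only when a jar was actually found. */
  static String findOrCreateJar(Class<?> myClass,
                                Function<Class<?>, String> getJar,
                                Map<String, String> packagedClasses) {
    String jar = getJar.apply(myClass);
    if (jar != null) {           // the guard proposed in the issue description
      updateMap(jar, packagedClasses);
    }
    return jar;
  }

  public static void main(String[] args) {
    Map<String, String> packaged = new HashMap<>();
    // A lookup that finds nothing no longer triggers an NPE.
    String jar = findOrCreateJar(String.class, c -> null, packaged);
    System.out.println(jar + " " + packaged.size());  // prints "null 0"
  }
}
```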
[jira] [Commented] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841858#comment-13841858 ] Nick Dimiduk commented on HBASE-10061: -- Patch applies cleanly on all 4 branches. Locally ran -Dtest=TestTableMapReduce vs default hadoop profile on each branch, all passed. > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15, 0.99.0 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841843#comment-13841843 ] Ted Yu commented on HBASE-10000: Nicolas: thanks for the valuable comments. Patch v4 should address all your comments. Let me answer the last several here since they're important. bq. may be it should be outside of the loop Done. bq. the previous versions was doing multiple calls. I modified the condition so that calls are made when nbAttempt > 0. bq. but isFileClosed is not available, we will never succeed. The behavior is not changed: we would rely on recoverLease() to be successful. I modified the condition for the initial sleep according to your comments above. bq. Could we say that 'isFileClosed' will be mandatory there? There hasn't been consensus as to which release would have such a prerequisite. So I want the changes to cover all supported hadoop releases. bq. I looked especially at FSHDFSUtils This is the part where more attention should be paid. > Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-recover-ts-with-pb-4.txt, 1-v1.txt, > 1-v4.txt, 1-v5.txt, 1-v6.txt > > > At the beginning of recovery, master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery, discussion with whom gave rise to this > idea. -- This message was sent by Atlassian JIRA (v6.1#6144)
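The idea in the HBASE-10000 description (the master fires lease recovery requests concurrently from a thread pool, and each split worker first checks whether its WAL file is closed) could be sketched along these lines. The LeaseFs interface and InMemoryFs class are hypothetical stand-ins invented for this example and are not the HDFS or FSHDFSUtils API; the sketch also waits for the pool to drain only so that the example is deterministic, which the real master would not necessarily do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Sketch of concurrent lease recovery; every type here is an illustrative assumption. */
class LeaseRecoverySketch {
  /** Hypothetical file system facade exposing only the two calls the discussion mentions. */
  interface LeaseFs {
    boolean recoverLease(String path);
    boolean isFileClosed(String path);
  }

  /** Toy implementation: recovering a lease immediately marks the file closed. */
  static class InMemoryFs implements LeaseFs {
    private final Set<String> closed = ConcurrentHashMap.newKeySet();
    @Override public boolean recoverLease(String path) { closed.add(path); return true; }
    @Override public boolean isFileClosed(String path) { return closed.contains(path); }
  }

  /** Master side: submit one lease recovery request per outstanding WAL file. */
  static void recoverAll(LeaseFs fs, List<String> walFiles, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (String wal : walFiles) {
      pool.execute(() -> fs.recoverLease(wal));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("lease recovery did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while recovering leases", e);
    }
  }

  /** Split worker side: only start splitting once the WAL file is reported closed. */
  static boolean readyToSplit(LeaseFs fs, String wal) {
    return fs.isFileClosed(wal);
  }

  public static void main(String[] args) {
    InMemoryFs fs = new InMemoryFs();
    List<String> wals = Arrays.asList("wal-1", "wal-2", "wal-3");
    recoverAll(fs, wals, 2);
    System.out.println(readyToSplit(fs, "wal-2"));  // prints "true"
  }
}
```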
[jira] [Updated] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10000: --- Attachment: 1-recover-ts-with-pb-4.txt > Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-recover-ts-with-pb-4.txt, 1-v1.txt, > 1-v4.txt, 1-v5.txt, 1-v6.txt > > > At the beginning of recovery, master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery discussion with whom gave rise to this > idea. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10061: - Fix Version/s: 0.99.0 > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15, 0.99.0 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a cluster restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841827#comment-13841827 ] Jimmy Xiang commented on HBASE-10085: - bq. The reason to restart whole cluster is that I need to trigger SSH on both old RSs(source RS and dst RS in a region assignment) to repro the exact issue to verify the fix. That's my understanding too. But sometimes, it takes a while to restart the cluster. Let me think about it. > Some regions aren't re-assigned after a cluster restarts > > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841828#comment-13841828 ] Hadoop QA commented on HBASE-10098: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617488/hbase-10098_v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8078//console This message is automatically generated. > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch, hbase-10098_v2.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10094) Add batching to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841821#comment-13841821 ] Himanshu Vashishtha commented on HBASE-10094: - Errr, just ignore my last nit... TestHLog doesn't use default iterations. It's good to go. Thanks. > Add batching to HLogPerformanceEvaluation > - > > Key: HBASE-10094 > URL: https://issues.apache.org/jira/browse/HBASE-10094 > Project: HBase > Issue Type: Sub-task > Components: Performance, wal >Reporter: stack >Assignee: Himanshu Vashishtha > Attachments: 10094v2.txt > > > As Himanshu points out in the parent issue, HLogPE is using an unorthodox > API appending edits to the WAL; it is using an API that is meant for tests > only that does an append immediately followed by a sync call. > In normal deploy, WAL appends are done as a bunch of appends followed by a > sync on the tail of the transaction -- not a sync per append. > This issue is about changing HLogPE to use append and then sync. It also > adds an argument so you can specify batching of a set of appends before > the sync is called. The latter lets HLogPE mimic multi puts that use the > minibatch... which appends, appends, appends.. and then syncs. > Assigning to Himanshu for review. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841820#comment-13841820 ] Nick Dimiduk commented on HBASE-10098: -- lgtm. > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch, hbase-10098_v2.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10094) Add batching to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841816#comment-13841816 ] Himanshu Vashishtha commented on HBASE-10094: - +1. This would give more insights when comparing various schemes on batching sync calls. minor nit: -long numIterations = 1; +long numIterations = 100; I don't think we need this. It will make TestHLog run longer by almost 90 sec. > Add batching to HLogPerformanceEvaluation > - > > Key: HBASE-10094 > URL: https://issues.apache.org/jira/browse/HBASE-10094 > Project: HBase > Issue Type: Sub-task > Components: Performance, wal >Reporter: stack >Assignee: Himanshu Vashishtha > Attachments: 10094v2.txt > > > As Himanshu points out in the parent issue, HLogPE is using an unorthodox > API appending edits to the WAL; it is using an API that is meant for tests > only that does an append immediately followed by a sync call. > In normal deploy, WAL appends are done as a bunch of appends followed by a > sync on the tail of the transaction -- not a sync per append. > This issue is about changing HLogPE to use append and then sync. It also > adds an argument so you can specify batching of a set of appends before > the sync is called. The latter lets HLogPE mimic multi puts that use the > minibatch... which appends, appends, appends.. and then syncs. > Assigning to Himanshu for review. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841810#comment-13841810 ] Hadoop QA commented on HBASE-10099: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617482/HBASE-10099-trunk-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8077//console This message is automatically generated. > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch, > HBASE-10099-trunk-v2.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841809#comment-13841809 ] Enis Soztutar commented on HBASE-10098: --- bq. Okay. Would it be useful to propagate the parent process arguments in addition to what you have here? Makes sense. v2 patch adds that. > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch, hbase-10098_v2.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841811#comment-13841811 ] Sergey Shelukhin commented on HBASE-5487: - Ah well, I never got to part 2. Did you guys make progress on this? I may have time to resurrect this again soon. > Generic framework for Master-coordinated tasks > -- > > Key: HBASE-5487 > URL: https://issues.apache.org/jira/browse/HBASE-5487 > Project: HBase > Issue Type: New Feature > Components: master, regionserver, Zookeeper >Affects Versions: 0.94.0 >Reporter: Mubarak Seyed >Assignee: Sergey Shelukhin >Priority: Critical > Attachments: Entity management in Master - part 1.pdf, Entity > management in Master - part 1.pdf, Is the FATE of Assignment Manager > FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, > hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf > > > Need a framework to execute master-coordinated tasks in a fault-tolerant > manner. > Master-coordinated tasks such as online-scheme change and delete-range > (deleting region(s) based on start/end key) can make use of this framework. > The advantages of framework are > 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for > master-coordinated tasks > 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK > 3. Easy to plugin new master-coordinated tasks without adding code to core > components -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-10098: -- Attachment: hbase-10098_v2.patch > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch, hbase-10098_v2.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9955) Make hadoop2 the default and deprecate hadoop1
[ https://issues.apache.org/jira/browse/HBASE-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9955: - Attachment: 9955v5.098.txt What I applied to 0.98 (includes little addendum) > Make hadoop2 the default and deprecate hadoop1 > -- > > Key: HBASE-9955 > URL: https://issues.apache.org/jira/browse/HBASE-9955 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.98.0 > > Attachments: 9955.txt, 9955v2.txt, 9955v3.txt, 9955v4.txt, > 9955v5.098.txt, 9955v5.txt, addendum.txt, addendum.txt > > > See "Hadoop version trunk dependency?" on the dev mailing list. Consensus > seems to be forming to do the subject line (Recheck the mail thread before > going ahead). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10061: - Assignee: Amit Sela > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Assignee: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HBASE-7667) Support stripe compaction
[ https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HBASE-7667. - Resolution: Fixed Fix Version/s: 0.99.0 0.98.0 All the pertinent patches have been committed for some time (before 98 was branched). > Support stripe compaction > - > > Key: HBASE-7667 > URL: https://issues.apache.org/jira/browse/HBASE-7667 > Project: HBase > Issue Type: New Feature > Components: Compaction >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 0.98.0, 0.99.0 > > Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction > perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe > compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe > compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, > Using stripe compactions.pdf, stripe-cdf.pdf > > > So I was thinking about having many regions as the way to make compactions > more manageable, and writing the level db doc about how level db range > overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, > Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication > factor. > And I suggest the following idea, let's call it stripe compactions. It's a > mix between level db ideas and having many small regions. > It allows us to have a subset of benefits of many regions (wrt reads and > compactions) without many of the drawbacks (managing and current > memstore/etc. limitation). > It also doesn't break seqNum-based file sorting for any one key. > It works like this. > The region key space is separated into configurable number of fixed-boundary > stripes (determined the first time we stripe the data, see below). > All the data from memstores is written to normal files with all keys present > (not striped), similar to L0 in LevelDb, or current files. > Compaction policy does 3 types of compactions. 
> First is L0 compaction, which takes all L0 files and breaks them down by > stripe. It may be optimized by adding more small files from different > stripes, but the main logical outcome is that there are no more L0 files and > all data is striped. > Second is exactly similar to current compaction, but compacting one single > stripe. In future, nothing prevents us from applying compaction rules and > compacting part of the stripe (e.g. similar to current policy with ratios > and stuff, tiers, whatever), but for the first cut I'd argue let it "major > compact" the entire stripe. Or just have the ratio and no more complexity. > Finally, the third addresses the concern of the fixed boundaries causing > stripes to be very unbalanced. > It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the > results out with different boundaries. > There's a tradeoff here - if we always take 2 adjacent stripes, compactions > will be smaller but rebalancing will take a ridiculous amount of I/O. > If we take many stripes we are essentially getting into the > epic-major-compaction problem again. Some heuristics will have to be in place. > In general, if, before stripes are determined, we initially let L0 grow > before determining the stripes, we will get better boundaries. > Also, unless unbalancing is really large we don't need to rebalance really. > Obviously this scheme (as well as level) is not applicable for all scenarios, > e.g. if timestamp is your key it completely falls apart. > The end result: > - many small compactions that can be spread out in time. > - reads still read from a small number of files (one stripe + L0). > - region splits become marvelously simple (if we could move files between > regions, no references would be needed). > Main advantage over Level (for HBase) is that default store can still open > the files and get correct results - there are no range overlap shenanigans. 
> It also needs no metadata, although we may record some for convenience. > It also would appear to not cause as much I/O. -- This message was sent by Atlassian JIRA (v6.1#6144)
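The HBASE-7667 description above can be illustrated with a small sketch of how rows might be mapped to fixed-boundary stripes and how an "L0 compaction" would redistribute unstriped data. This is only an illustration under simplifying assumptions: the class `StripeSketch` and its methods are hypothetical, keys are plain strings rather than byte arrays, and none of this is taken from the committed patches.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fixed-boundary stripes; not HBase code. */
final class StripeSketch {
  // Sorted start keys of stripes 1..n-1; stripe 0 starts at the beginning of the region.
  private final List<String> boundaries;

  StripeSketch(List<String> sortedBoundaries) {
    this.boundaries = new ArrayList<>(sortedBoundaries);
  }

  int stripeCount() {
    return boundaries.size() + 1;
  }

  /** Returns the index of the stripe whose key range contains the row. */
  int stripeFor(String row) {
    int lo = 0;
    int hi = boundaries.size();
    // Binary search for the number of boundaries that are <= row.
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (boundaries.get(mid).compareTo(row) <= 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Sketch of an "L0 compaction": split unstriped rows into one bucket per stripe. */
  List<List<String>> splitL0(List<String> l0Rows) {
    List<List<String>> perStripe = new ArrayList<>();
    for (int i = 0; i < stripeCount(); i++) {
      perStripe.add(new ArrayList<>());
    }
    for (String row : l0Rows) {
      perStripe.get(stripeFor(row)).add(row);
    }
    return perStripe;
  }
}
```

Because every row maps to exactly one stripe, a read for a single key only needs the files of that stripe plus whatever is still in L0, which is the property the description relies on.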
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841805#comment-13841805 ] Jeffrey Zhong commented on HBASE-10085: --- I checked in at the same time as your comments. The committed patch has updated format which is from our Apache template auto formatting. The reason to restart whole cluster is that I need to trigger SSH on both old RSs(source RS and dst RS in a region assignment) to repro the exact issue to verify the fix. > Some regions aren't re-assigned after a master restarts > --- > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9955) Make hadoop2 the default and deprecate hadoop1
[ https://issues.apache.org/jira/browse/HBASE-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-9955: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Resolving. > Make hadoop2 the default and deprecate hadoop1 > -- > > Key: HBASE-9955 > URL: https://issues.apache.org/jira/browse/HBASE-9955 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.98.0 > > Attachments: 9955.txt, 9955v2.txt, 9955v3.txt, 9955v4.txt, > 9955v5.098.txt, 9955v5.txt, addendum.txt, addendum.txt > > > See "Hadoop version trunk dependency?" on the dev mailing list. Consensus > seems to be forming to do the subject line (Recheck the mail thread before > going ahead). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10085) Some regions aren't re-assigned after a cluster restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10085: -- Summary: Some regions aren't re-assigned after a cluster restarts (was: Some regions aren't re-assigned after a master restarts) > Some regions aren't re-assigned after a cluster restarts > > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9955) Make hadoop2 the default and deprecate hadoop1
[ https://issues.apache.org/jira/browse/HBASE-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841803#comment-13841803 ] stack commented on HBASE-9955: -- I applied the addendum to trunk (comment edit in pom.xml). Now let me backport to 0.98. > Make hadoop2 the default and deprecate hadoop1 > -- > > Key: HBASE-9955 > URL: https://issues.apache.org/jira/browse/HBASE-9955 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.98.0 > > Attachments: 9955.txt, 9955v2.txt, 9955v3.txt, 9955v4.txt, > 9955v5.txt, addendum.txt, addendum.txt > > > See "Hadoop version trunk dependency?" on the dev mailing list. Consensus > seems to be forming to do the subject line (Recheck the mail thread before > going ahead). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841802#comment-13841802 ] Nick Dimiduk commented on HBASE-10061: -- Sounds good guys, I'll get this committed this afternoon. This isn't the first bug I've seen related to OSGi classloaders. [~amitsela] any chance you could dream up a unit or integration test that will realistically exercise this scenario? Have a look at TestTableMapReduce and IntegrationTestTableMapReduceUtil for examples. I'm not very familiar with this environment, so I appreciate any advice you can provide. > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
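The null check requested in the HBASE-10061 description could look roughly like the sketch below. It is not the attached patch: `JarLookupSketch`, the `jarIndex` parameter, and the simplified `getJar`/`updateMap` signatures are stand-ins invented here so the example is self-contained.

```java
import java.util.Map;

/** Hypothetical stand-in for the findOrCreateJar logic; signatures are simplified. */
final class JarLookupSketch {
  /** Stand-in for getJar: returns null when no jar is known for the class. */
  static String getJar(Class<?> clazz, Map<Class<?>, String> jarIndex) {
    return jarIndex.get(clazz);
  }

  /** Stand-in for updateMap: dereferences jar, so a null argument would throw an NPE. */
  static void updateMap(String jar, Map<String, String> packagedClasses) {
    if (jar.isEmpty()) {
      return;
    }
    packagedClasses.put(jar, jar);
  }

  /** Looks up the jar and only updates the map when a jar was actually found. */
  static String findJar(Class<?> clazz, Map<Class<?>, String> jarIndex,
      Map<String, String> packagedClasses) {
    String jar = getJar(clazz, jarIndex);
    if (jar != null) {
      updateMap(jar, packagedClasses);
    }
    return jar;
  }
}
```

With the guard in place, a class that is not packaged in any jar (the case reported for OSGi classloaders) yields `null` instead of an exception, and the caller can decide how to proceed.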
[jira] [Commented] (HBASE-10094) Add batching to HLogPerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-10094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841801#comment-13841801 ] Hadoop QA commented on HBASE-10094: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617453/10094v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8075//console This message is automatically generated. 
> Add batching to HLogPerformanceEvaluation > - > > Key: HBASE-10094 > URL: https://issues.apache.org/jira/browse/HBASE-10094 > Project: HBase > Issue Type: Sub-task > Components: Performance, wal >Reporter: stack >Assignee: Himanshu Vashishtha > Attachments: 10094v2.txt > > > As Himanshu points out in the parent issue, HLogPE is using an unorthodox > API for appending edits to the WAL; it is using an API that is meant for tests > only that does an append immediately followed by a sync call. > In a normal deployment, WAL appends are done as a bunch of appends followed by a > sync on the tail of the transaction -- not a sync per append. > This issue is about changing HLogPE to use append and then sync. It also > adds an argument so you can specify batching of a set of appends before > the sync is called. The latter lets HLogPE mimic multi puts that use the > minibatch... which appends, appends, appends.. and then syncs. > Assigning to Himanshu for review. -- This message was sent by Atlassian JIRA (v6.1#6144)
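The difference between "sync per append" and "append, append, append, then sync" described in HBASE-10094 can be sketched as follows. This is an assumption-laden toy, not HLogPerformanceEvaluation itself: `BatchedAppendSketch`, `CountingLog`, and the `batchSize` parameter are hypothetical names, and the log is an in-memory list that only counts sync calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of batching appends before a sync; not HLogPE code. */
final class BatchedAppendSketch {
  /** Minimal in-memory log that records edits and counts sync calls. */
  static final class CountingLog {
    final List<String> edits = new ArrayList<>();
    int syncs = 0;

    void append(String edit) {
      edits.add(edit);
    }

    void sync() {
      syncs++;
    }
  }

  /** Appends totalEdits edits, syncing once after every batchSize appends. */
  static void run(CountingLog log, int totalEdits, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    int pending = 0;
    for (int i = 0; i < totalEdits; i++) {
      log.append("edit-" + i);
      pending++;
      if (pending == batchSize) {
        log.sync();
        pending = 0;
      }
    }
    // Sync the tail so no appended edit is left unsynced.
    if (pending > 0) {
      log.sync();
    }
  }
}
```

A batch size of 1 reproduces the test-only "append immediately followed by sync" behaviour; larger values approximate the minibatch path, where the cost of one sync is shared by several appends.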
[jira] [Updated] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-10061: - Fix Version/s: 0.94.15 > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1, 0.94.15 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841798#comment-13841798 ] Jimmy Xiang commented on HBASE-10085: - Never mind about my previous comment. I can address it in HBASE-10090. Thanks. > Some regions aren't re-assigned after a master restarts > --- > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841796#comment-13841796 ] Nick Dimiduk commented on HBASE-10098: -- Okay. Would it be useful to propagate the parent process arguments in addition to what you have here? +1. > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10000) Initiate lease recovery for outstanding WAL files at the very beginning of recovery
[ https://issues.apache.org/jira/browse/HBASE-10000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841792#comment-13841792 ] Nicolas Liochon commented on HBASE-10000: -
{code}
+      if (!exceptions.isEmpty()) {
+        LOG.debug("Encountered " + exceptions.size() + " exceptions");
+        throw exceptions.get(0);
+      }
{code}
=> Should be info or warning
{code}
+  public static final int LEASE_RECOVERY_UNREQUESTED = 0;
{code}
=> should be a long, no?
{code}
+    if (findIsFileClosedMeth) {
+      try {
+        isFileClosedMeth = dfs.getClass().getMethod("isFileClosed",
+          new Class[]{ Path.class });
+      } catch (NoSuchMethodException nsme) {
+        LOG.debug("isFileClosed not available");
+      } finally {
+        findIsFileClosedMeth = false;
+      }
+    }
{code}
=> findIsFileClosedMeth seems to be always true at the beginning, and not read later (i.e. the finally clause is not needed)
The code in this method is very complex to read (that's not your fault :-) ), I think if you change it you need to restructure it as well.
{code}
+      if (ts == HConstants.LEASE_RECOVERY_UNREQUESTED) {
+        startWaiting = EnvironmentEdgeManager.currentTimeMillis();
+        recovered = recoverLease(dfs, nbAttempt, p, startWaiting);
+      } else {
+        startWaiting = ts;
+      }
{code}
=> this seems wrong (on each loop iteration, we will reset "startWaiting = ts"), or maybe it should be outside of the loop.
I'm not sure about the previous version, but I think we must be ready to do multiple calls to recoverLease, in case the namenode crashed at the wrong time or something similar. With this version, it seems that won't be the case if the call was made by the master. If I read correctly, the previous version was doing multiple calls.
Also, if I'm not wrong, if the master initiated the recovery but isFileClosed is not available, we will never succeed. If leaving this case uncovered is intentional, this should be documented.
{code}
// On the first time through wait the short 'firstPause'.
if (nbAttempt == 0) {
  Thread.sleep(firstPause);
{code}
=> Should this be changed if the master initiated the recoverLease? No need to wait 4s.
=> This code is not needed when isFileClosed is available (as it's cheap, we don't want to wait: we prefer to do the call sooner)
What's the target version? Could we say that 'isFileClosed' will be mandatory there? This would simplify the code.
(I haven't reviewed everything in detail, I looked especially at FSHDFSUtils).
> Initiate lease recovery for outstanding WAL files at the very beginning of > recovery > --- > > Key: HBASE-10000 > URL: https://issues.apache.org/jira/browse/HBASE-10000 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.98.1 > > Attachments: 1-recover-ts-with-pb-2.txt, > 1-recover-ts-with-pb-3.txt, 1-v1.txt, 1-v4.txt, 1-v5.txt, > 1-v6.txt > > > At the beginning of recovery, master can send lease recovery requests > concurrently for outstanding WAL files using a thread pool. > Each split worker would first check whether the WAL file it processes is > closed. > Thanks to Nicolas Liochon and Jeffery, discussions with whom gave rise to this > idea. -- This message was sent by Atlassian JIRA (v6.1#6144)
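The review point about the `isFileClosed` lookup (resolve the optional method once, and treat "not available" as a normal outcome) could be sketched like this. It is a generic reflection example, not the FSHDFSUtils code under review: `OptionalMethodSketch`, `findOptional`, and `invokeOrNull` are hypothetical helpers, and only `Class.getMethod` and `Method.invoke` are real JDK calls.

```java
import java.lang.reflect.Method;

/** Hypothetical helper for optional, reflectively resolved methods. */
final class OptionalMethodSketch {
  /** Returns the public one-argument method if the class has it, otherwise null. */
  static Method findOptional(Class<?> owner, String name, Class<?> paramType) {
    try {
      return owner.getMethod(name, paramType);
    } catch (NoSuchMethodException e) {
      // The method is optional; absence is not an error.
      return null;
    }
  }

  /** Invokes the method when it was found; returns null when it is unavailable. */
  static Object invokeOrNull(Method method, Object target, Object arg) {
    if (method == null) {
      return null;
    }
    try {
      return method.invoke(target, arg);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective call failed", e);
    }
  }
}
```

Resolving the `Method` once before a retry loop and reusing the result would make a separate "have we looked yet" flag unnecessary, which appears to be what the comment about `findIsFileClosedMeth` is getting at.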
[jira] [Updated] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-10085: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~jxiang] for the reviews! I've integrated the fix into trunk, 0.98 and 0.96 branch. The javadoc and findbug warnings are not related to this patch. Thanks. > Some regions aren't re-assigned after a master restarts > --- > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ... > 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... 
> 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10085) Some regions aren't re-assigned after a master restarts
[ https://issues.apache.org/jira/browse/HBASE-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841793#comment-13841793 ] Jimmy Xiang commented on HBASE-10085: -
[~jeffreyz], before you commit it, could you fix the format a little bit? For example:
{noformat}
+|| !(regionState.isFailedClose() || regionState.isPendingOpenOrOpening() || regionState
+.isOffline())) {
{noformat}
to something like
{noformat}
+|| !(regionState.isFailedClose() || regionState.isPendingOpenOrOpening()
+ || regionState.isOffline())) {
{noformat}
By the way, the new test is a little flaky. Instead of restarting the whole mini cluster, can we just restart the master? Also increase the timeout a little? Thanks.
> Some regions aren't re-assigned after a master restarts > --- > > Key: HBASE-10085 > URL: https://issues.apache.org/jira/browse/HBASE-10085 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.96.1 >Reporter: Jeffrey Zhong >Assignee: Jeffrey Zhong > Fix For: 0.98.0, 0.96.1 > > Attachments: hbase-10085.patch > > > We see this issue happened in a cluster restart: > 1) when shutdown a cluster, some regions are in offline state because no > Region servers are available(stop RS and then Master) > 2) When the cluster restarts, the offlined regions are forced to be offline > again and SSH skip re-assigning them by function AM.processServerShutdown as > shown below. > {code} > 2013-12-03 10:41:56,686 INFO > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > Processing 873dbd8c269f44d0aefb0f66c5b53537 in state: M_ZK_REGION_OFFLINE > 2013-12-03 10:41:56,686 DEBUG > [master:h2-ubuntu12-sec-1386048659-hbase-8:6] master.AssignmentManager: > RIT 873dbd8c269f44d0aefb0f66c5b53537 in state=M_ZK_REGION_OFFLINE was on > deadserver; forcing offline > ...
> 2013-12-03 10:41:56,739 DEBUG [AM.-pool1-t8] master.AssignmentManager: Force > region state offline {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, > ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > ... > 2013-12-03 10:41:57,223 WARN > [MASTER_SERVER_OPERATIONS-h2-ubuntu12-sec-1386048659-hbase-8:6-3] > master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected > {873dbd8c269f44d0aefb0f66c5b53537 state=OFFLINE, ts=1386067316737, > server=h2-ubuntu12-sec-1386048659-hbase-6.cs1cloud.internal,60020,1386066968696} > > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-3787) Increment is non-idempotent but client retries RPC
[ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-3787: Resolution: Fixed Fix Version/s: 0.99.0 0.98.0 Status: Resolved (was: Patch Available) This was actually committed some time ago (before branching 0.98 I think) > Increment is non-idempotent but client retries RPC > -- > > Key: HBASE-3787 > URL: https://issues.apache.org/jira/browse/HBASE-3787 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.94.4, 0.95.2 >Reporter: dhruba borthakur >Assignee: Sergey Shelukhin >Priority: Blocker > Fix For: 0.98.0, 0.99.0 > > Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, > HBASE-3787-v1.patch, HBASE-3787-v10.patch, HBASE-3787-v11.patch, > HBASE-3787-v12.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, > HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch, > HBASE-3787-v6.patch, HBASE-3787-v7.patch, HBASE-3787-v8.patch, > HBASE-3787-v9.patch > > > The HTable.increment() operation is non-idempotent. The client retries the > increment RPC a few times (as specified by configuration) before throwing an > error to the application. This makes it possible that the same increment call > be applied twice at the server. > For increment operations, is it better to use > HConnectionManager.getRegionServerWithoutRetries()? Another option would be > to enhance the IPC module to make the RPC server correctly identify if the > RPC is a retry attempt and handle accordingly. -- This message was sent by Atlassian JIRA (v6.1#6144)
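The second option raised in the HBASE-3787 description (letting the server recognize a retried RPC and handle it accordingly) can be illustrated with a per-operation token. The sketch below is purely illustrative and is not claimed to match the committed patch: `NonceIncrementSketch`, the `nonce` parameter, and the unbounded in-memory maps are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical server-side deduplication of retried increments; not HBase code. */
final class NonceIncrementSketch {
  private final Map<String, Long> counters = new HashMap<>();
  // Result recorded for each client-chosen nonce that has already been applied.
  private final Map<Long, Long> resultsByNonce = new HashMap<>();

  /**
   * Applies the increment at most once per nonce. A retry that carries the
   * same nonce gets the recorded result back instead of incrementing again.
   */
  synchronized long increment(String row, long amount, long nonce) {
    Long recorded = resultsByNonce.get(nonce);
    if (recorded != null) {
      return recorded;
    }
    long updated = counters.getOrDefault(row, 0L) + amount;
    counters.put(row, updated);
    resultsByNonce.put(nonce, updated);
    return updated;
  }
}
```

A real implementation would also need to expire old nonces and survive server restarts and region moves; the sketch ignores both and only shows why a retry stops being a double increment.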
[jira] [Commented] (HBASE-10098) [WINDOWS] pass in native library directory from hadoop for unit tests
[ https://issues.apache.org/jira/browse/HBASE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841789#comment-13841789 ] Enis Soztutar commented on HBASE-10098: --- These are the args that we pass to the child process that maven forks to run the unit tests. I don't think maven passes its own arguments to the child task. > [WINDOWS] pass in native library directory from hadoop for unit tests > - > > Key: HBASE-10098 > URL: https://issues.apache.org/jira/browse/HBASE-10098 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Enis Soztutar >Assignee: Enis Soztutar > Fix For: 0.98.0, 0.96.2, 0.99.0 > > Attachments: hbase-10098_v1.patch > > > On Windows, Hadoop depends on native libraries for doing its job. The bin > scripts already handle finding hadoop's native libs and adding them to > java.library.path, but for running HBase's unit tests, we need to pass them > in. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9955) Make hadoop2 the default and deprecate hadoop1
[ https://issues.apache.org/jira/browse/HBASE-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841787#comment-13841787 ] Hadoop QA commented on HBASE-9955: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617460/addendum.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8076//console This message is automatically generated. > Make hadoop2 the default and deprecate hadoop1 > -- > > Key: HBASE-9955 > URL: https://issues.apache.org/jira/browse/HBASE-9955 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack > Fix For: 0.98.0 > > Attachments: 9955.txt, 9955v2.txt, 9955v3.txt, 9955v4.txt, > 9955v5.txt, addendum.txt, addendum.txt > > > See "Hadoop version trunk dependency?" on the dev mailing ilst. 
Consensus > seems to be forming to do the subject line (Recheck the mail thread before > going ahead). -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9829) make the compaction logging less confusing
[ https://issues.apache.org/jira/browse/HBASE-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841782#comment-13841782 ] stack commented on HBASE-9829: -- Fine by me Sergey. You can address on commit. Above are just nits. Patch is nice. > make the compaction logging less confusing > -- > > Key: HBASE-9829 > URL: https://issues.apache.org/jira/browse/HBASE-9829 > Project: HBase > Issue Type: Improvement > Components: Compaction >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HBASE-9829.patch > > > 1) One of the most popular questions from HBase users has got to be "I have > scheduled major compactions to run once per week, why are there so many". > We need to somehow tell the user, wherever we log that there is a "major" > compaction, whether it's a major compaction because that's what was in the > request (from regular major compaction or user request), or was it just > promoted because it took all files. Esp. the latter should be clear. > 2) small vs large compaction threads and minor vs major compactions is > confusing. Maybe the threads can be named short and long compactions. > We -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10010) eliminate the put latency spike on the new log file beginning
[ https://issues.apache.org/jira/browse/HBASE-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841777#comment-13841777 ] stack commented on HBASE-10010: --- [~xieliang007] What Himanshu said; otherwise looks good to me. If the test changes were mistakenly included, just say, and I'll exclude them from the commit. > eliminate the put latency spike on the new log file beginning > - > > Key: HBASE-10010 > URL: https://issues.apache.org/jira/browse/HBASE-10010 > Project: HBase > Issue Type: Improvement > Components: regionserver >Affects Versions: 0.94.13 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HBase-10010-0.94-v2.txt, HBase-10010-0.94-v3.txt, > HBase-10010-0.94.txt, HBase-10010-trunk-v2.txt, HBase-10010-trunk.txt > > > Indeed, the original finding came from fb, see HBASE-6813 for detailed > discussion. > Though this improvement isn't expected to bring an obvious gain in 95th or 99th > percentile latency, in my view it could still make the response time more stable. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9829) make the compaction logging less confusing
[ https://issues.apache.org/jira/browse/HBASE-9829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841779#comment-13841779 ] Sergey Shelukhin commented on HBASE-9829: - There are periodic questions on the thread about why major compactions run when they are disabled or not according to the timetable... this tries to clarify. In a similar vein, "large" and "small" compaction threads get confused with "major" and "minor" compactions, so people assume large thread == major compaction. They are chosen by size, but the reason behind the choice is to remove potentially long-blocking compactions from the thread where many small ones may run (by default, there's only one of each thread, so a large compaction would block them), so I think the naming is allowable. Time in the thread name is actually not very useful, other than for grepping by number. Maybe these threads can just be numbered when they are started? "longCompactions-1" is still greppable > make the compaction logging less confusing > -- > > Key: HBASE-9829 > URL: https://issues.apache.org/jira/browse/HBASE-9829 > Project: HBase > Issue Type: Improvement > Components: Compaction >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HBASE-9829.patch > > > 1) One of the most popular questions from HBase users has got to be "I have > scheduled major compactions to run once per week, why are there so many". > We need to somehow tell the user, wherever we log that there is a "major" > compaction, whether it's a major compaction because that's what was in the > request (from regular major compaction or user request), or was it just > promoted because it took all files. Esp. the latter should be clear. > 2) small vs large compaction threads and minor vs major compactions is > confusing. Maybe the threads can be named short and long compactions. > We -- This message was sent by Atlassian JIRA (v6.1#6144)
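The suggestion in the comment above, numbering compaction threads at start time so that a name like "longCompactions-1" stays greppable, could be sketched roughly as follows. This is only an illustration: `NumberedThreadFactory` is a hypothetical name and is not taken from HBASE-9829.patch or from the HBase code base.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not HBase code): names threads "<prefix>-<n>", n starting at 1. */
final class NumberedThreadFactory implements ThreadFactory {
  private final String prefix;
  // Incremented once per created thread, so names stay unique and greppable.
  private final AtomicInteger counter = new AtomicInteger(0);

  NumberedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
    t.setDaemon(true);
    return t;
  }
}
```

A factory like this could be handed to each executor, one with the prefix "longCompactions" and one with "shortCompactions", assuming the pools are built on `java.util.concurrent` executors.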
[jira] [Commented] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841774#comment-13841774 ] Ted Yu commented on HBASE-10099: Integrated to 0.98 and trunk. Thanks for the patch, Demai. > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch, > HBASE-10099-trunk-v2.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9718) Add a test scope dependency on org.slf4j:slf4j-api to hbase-client
[ https://issues.apache.org/jira/browse/HBASE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841773#comment-13841773 ] stack commented on HBASE-9718: -- Commit Andrew. We may later find it a problem when some downstream context tries to do something we can only imagine now. There is no 'nice' way of our editing the dependencies our dependencies are including. > Add a test scope dependency on org.slf4j:slf4j-api to hbase-client > -- > > Key: HBASE-9718 > URL: https://issues.apache.org/jira/browse/HBASE-9718 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Minor > Fix For: 0.98.0, 0.96.1 > > Attachments: 9718.patch > > > hbase-client needs a test scope dependency on org.slf4j:slf4j-api in its POM. > Without this change at least Eclipse cannot resolve org.slf4j.Logger from > RecoverableZooKeeper - the ZooKeeper classes use it - and so the > 'hbase-client' project will not build. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10061) TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in thrown NPE
[ https://issues.apache.org/jira/browse/HBASE-10061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841769#comment-13841769 ] Lars Hofhansl commented on HBASE-10061: --- Same here :) > TableMapReduceUtil.findOrCreateJar calls updateMap(null, ) resulting in > thrown NPE > -- > > Key: HBASE-10061 > URL: https://issues.apache.org/jira/browse/HBASE-10061 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.94.12 >Reporter: Amit Sela >Priority: Minor > Fix For: 0.98.0, 0.96.1 > > Attachments: 10061-trunk.txt, 10061-trunk.txt, HBASE-10061.patch > > > TableMapReduceUtil.findOrCreateJar line 596: > jar = getJar(my_class); > updateMap(jar, packagedClasses); > In case getJar returns null, updateMap will throw NPE. > Should check null==jar before calling updateMap. -- This message was sent by Atlassian JIRA (v6.1#6144)
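The null check proposed in the issue description above could look roughly like this. It is a self-contained sketch, not the attached patch: `JarLookupSketch`, `findJarGuarded`, the `knownJars` map, and the simplified `getJar`/`updateMap` signatures are stand-ins invented for illustration, and the real `TableMapReduceUtil` methods differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in; this is not the TableMapReduceUtil source. */
final class JarLookupSketch {
  /** Hypothetical stand-in for getJar: returns null when no jar is known for the class. */
  static String getJar(Class<?> clazz, Map<String, String> knownJars) {
    return knownJars.get(clazz.getName());
  }

  /** Hypothetical stand-in for updateMap: dereferences jar, so a null jar would throw NPE. */
  static void updateMap(String jar, Map<String, String> packagedClasses) {
    packagedClasses.put(jar.trim(), jar);
  }

  /** Calls updateMap only when a jar was actually found. */
  static String findJarGuarded(Class<?> clazz, Map<String, String> knownJars,
      Map<String, String> packagedClasses) {
    String jar = getJar(clazz, knownJars);
    if (jar != null) {
      updateMap(jar, packagedClasses);
    }
    return jar;
  }
}
```

With the guard in place, a class with no known jar simply yields null to the caller instead of failing inside `updateMap`.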
[jira] [Updated] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10099: - Attachment: HBASE-10099-trunk-v2.patch change to 'from'. thanks... Demai > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch, > HBASE-10099-trunk-v2.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841768#comment-13841768 ] stack commented on HBASE-10099: --- [~ted_yu] Just fix it on commit rather than have [~nidmhbase] go another cycle. > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841767#comment-13841767 ] Ted Yu commented on HBASE-10099: +1 nit can be addressed on commit. > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841763#comment-13841763 ] Ted Yu commented on HBASE-10099:
{code}
+ * @return KeyValue of the cell visibility expr
{code}
nit: 'of ' -> 'from'
> javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10099) javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10099: --- Summary: javadoc warning introduced by LabelExpander 188: warning - @return tag has no arguments (was: javadoc warning instroduced by LabelExpander 188: warning - @return tag has no arguments ) > javadoc warning introduced by LabelExpander 188: warning - @return tag has no > arguments > > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10099) javadoc warning instroduced by LabelExpander 188: warning - @return tag has no arguments
[ https://issues.apache.org/jira/browse/HBASE-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Demai Ni updated HBASE-10099: - Attachment: HBASE-10099-trunk-v1.patch [~yuzhih...@gmail.com], many thanks. I should have paid more attention... Demai > javadoc warning instroduced by LabelExpander 188: warning - @return tag has > no arguments > - > > Key: HBASE-10099 > URL: https://issues.apache.org/jira/browse/HBASE-10099 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0 >Reporter: Demai Ni >Assignee: Demai Ni >Priority: Trivial > Fix For: 0.98.0 > > Attachments: HBASE-10099-trunk-v0.patch, HBASE-10099-trunk-v1.patch > > > src/main/java/org/apache/hadoop/hbase/mapreduce/LabelExpander.java:188: > warning - @return tag has no arguments -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10089: --- Status: Patch Available (was: Open) > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.14, 0.94.0 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > that hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. Workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)
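The description above attributes the permgen growth to interned table names that are never released. One general way to keep a single shared instance per name without `String.intern()` is an explicit map whose entries can be dropped when a table goes away. The sketch below shows that idea only; `TableNameCache` and its methods are hypothetical, and this is not claimed to be what 10089-0.94.txt does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical canonicalizer; not taken from the HBASE-10089 patch. */
final class TableNameCache {
  private final ConcurrentMap<String, String> names = new ConcurrentHashMap<>();

  /** Returns one shared instance per distinct name, without calling String.intern(). */
  String canonical(String name) {
    String previous = names.putIfAbsent(name, name);
    return previous != null ? previous : name;
  }

  /** Drops the entry (for example when a table is deleted) so the string can be collected. */
  void forget(String name) {
    names.remove(name);
  }

  /** Number of distinct names currently held. */
  int size() {
    return names.size();
  }
}
```

The trade-off is that something has to call `forget` when a table is dropped; otherwise the map grows just as the intern pool does, only on the regular heap.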
[jira] [Commented] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841745#comment-13841745 ] Ted Yu commented on HBASE-10089: *Schema* tests passed based on the patch:
{code}
Running org.apache.hadoop.hbase.rest.TestSchemaResource
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.774 sec
Running org.apache.hadoop.hbase.rest.model.TestColumnSchemaModel
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.299 sec
Running org.apache.hadoop.hbase.rest.model.TestTableSchemaModel
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.427 sec
Running org.apache.hadoop.hbase.regionserver.metrics.TestSchemaConfigured
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.388 sec
Running org.apache.hadoop.hbase.regionserver.metrics.TestSchemaMetrics
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.338 sec
Running org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.668 sec
{code}
> Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > that hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. Workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-10089) Metrics intern table names cause eventual permgen OOM in 0.94
[ https://issues.apache.org/jira/browse/HBASE-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10089: --- Attachment: 10089-0.94.txt > Metrics intern table names cause eventual permgen OOM in 0.94 > - > > Key: HBASE-10089 > URL: https://issues.apache.org/jira/browse/HBASE-10089 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.0, 0.94.14 >Reporter: Dave Latham >Assignee: Ted Yu >Priority: Minor > Fix For: 0.94.15 > > Attachments: 10089-0.94.txt > > > As part of the metrics system introduced in HBASE-4768 there are two places > that hbase uses String interning ( SchemaConfigured and SchemaMetrics ). > This includes interning table names. We have a long-running environment where > we run regular integration tests on our application using hbase. Those tests > create and drop tables with new names regularly. This leads to filling up > the permgen with interned table names. Workaround is to periodically restart > the region servers. -- This message was sent by Atlassian JIRA (v6.1#6144)