Re: Thrift versions and generated code
See https://issues.apache.org/jira/browse/HBASE-14172

On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:
> Hi,
>
> In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
> that the current libthrift dependency on master was at 0.9.2, but the
> generated code still has the 0.9.0 comments.
>
> Is there a reason for that? Should the libthrift version defined in the
> poms be the de-facto version used by that version of HBase?
>
> Thanks.
>
> - Josh
Re: Thrift versions and generated code
Ahh, thanks, gentlemen.

Andrew Purtell wrote:
> Yeah, let's finish that.
>
> On Thu, Nov 12, 2015 at 11:49 AM, Ted Yu wrote:
>> See https://issues.apache.org/jira/browse/HBASE-14172
>>
>> On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:
>>> Hi,
>>>
>>> In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
>>> that the current libthrift dependency on master was at 0.9.2, but the
>>> generated code still has the 0.9.0 comments.
>>>
>>> Is there a reason for that? Should the libthrift version defined in the
>>> poms be the de-facto version used by that version of HBase?
>>>
>>> Thanks.
>>>
>>> - Josh
[jira] [Reopened] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable
[ https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reopened HBASE-14498:

> Master stuck in infinite loop when all Zookeeper servers are unreachable
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Y. SREENIVASULU REDDY
> Assignee: Pankaj Kumar
> Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, HBASE-14498-V4.patch, HBASE-14498.patch
>
> We met a weird scenario in our production environment.
> In an HA cluster, the active master (HM1) is not able to connect to any Zookeeper server (due to a network breakdown between the master machine and the Zookeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 33463ms for sessionid 0x104576b8dda0002, closing socket connection and attempting reconnect
> 2015-09-26 15:24:47,877 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Opening socket connection to server ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Opening socket connection to server ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] zookeeper.ClientCnxn: Client session timed out, have not heard from server in 30023ms for sessionid 0x2045762cc710006, closing socket connection and attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. Will not attempt to authenticate using SASL (unknown error)
> {code}
>
> Since HM1 was not able to connect to any ZK server, the session timeout didn't happen at Zookeeper server side and HM1 didn't abort.
> On Zookeeper session timeout, the standby master (HM2) registered itself as the active master.
> HM2 keeps waiting for region servers to report to it as part of active master initialization.
> {noformat}
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting for region servers count to settle; currently checked in 0, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. | org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting for region servers count to settle; currently checked in 0, slept for 483913 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms. | org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
>
> At the other end, region servers keep reporting to HM1 at a 3 second interval. A region server retrieves the master location from Zookeeper only when it cannot connect to the master (ServiceException).
> Under the current design, region servers will not report to HM2 unless HM1 aborts, so HM2 will exit (InitializationMonitor) and again wait for region servers in a loop.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14800) Expose checkAndMutate via Thrift2
Josh Elser created HBASE-14800:

Summary: Expose checkAndMutate via Thrift2
Key: HBASE-14800
URL: https://issues.apache.org/jira/browse/HBASE-14800
Project: HBase
Issue Type: Improvement
Components: Thrift
Reporter: Josh Elser
Assignee: Josh Elser
Fix For: 2.0.0

Had a user ask why checkAndMutate wasn't exposed via Thrift2. I see no good reason (since checkAndPut and checkAndDelete are already there), so let's add it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14803) Add some debug logs to StoreFileScanner
Jean-Marc Spaggiari created HBASE-14803:

Summary: Add some debug logs to StoreFileScanner
Key: HBASE-14803
URL: https://issues.apache.org/jira/browse/HBASE-14803
Project: HBase
Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Minor
Fix For: 1.2.0

To validate some behaviors I had to add some logs to StoreFileScanner. I think they could be useful to other people debugging similar issues, so I am sharing the modifications here.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format
Zhan Zhang created HBASE-14801:

Summary: Enhance the Spark-HBase connector catalog with json format
Key: HBASE-14801
URL: https://issues.apache.org/jira/browse/HBASE-14801
Project: HBase
Issue Type: Improvement
Reporter: Zhan Zhang
Assignee: Zhan Zhang

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Please look at HBASE-14803
Very simple one. Just adding some debug logs. Helped me to debug something, so might help someone else. Thanks, JMS
Re: Thrift versions and generated code
Yeah, let's finish that.

On Thu, Nov 12, 2015 at 11:49 AM, Ted Yu wrote:
> See https://issues.apache.org/jira/browse/HBASE-14172
>
> On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:
> > Hi,
> >
> > In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
> > that the current libthrift dependency on master was at 0.9.2, but the
> > generated code still has the 0.9.0 comments.
> >
> > Is there a reason for that? Should the libthrift version defined in the
> > poms be the de-facto version used by that version of HBase?
> >
> > Thanks.
> >
> > - Josh

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Thrift versions and generated code
Hi, In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw that the current libthrift dependency on master was at 0.9.2, but the generated code still has the 0.9.0 comments. Is there a reason for that? Should the libthrift version defined in the poms be the de-facto version used by that version of HBase? Thanks. - Josh
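For anyone wanting to spot this kind of mismatch mechanically, here is a minimal sketch in Java. It assumes the generated sources carry the usual "Autogenerated by Thrift Compiler (x.y.z)" header comment, and that the libthrift version from the poms is passed in as a plain string. The class and method names (ThriftVersionCheck, generatedWith, matchesDependency) are made up for illustration and are not part of HBase or Thrift.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThriftVersionCheck {
    // Assumption: generated files start with a comment like
    // "Autogenerated by Thrift Compiler (0.9.0)".
    private static final Pattern HEADER =
        Pattern.compile("Autogenerated by Thrift Compiler \\(([^)]+)\\)");

    /** Returns the compiler version named in the generated-code header, if any. */
    public static Optional<String> generatedWith(String source) {
        Matcher m = HEADER.matcher(source);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only when the header version equals the libthrift dependency version. */
    public static boolean matchesDependency(String source, String libthriftVersion) {
        return generatedWith(source).map(libthriftVersion::equals).orElse(false);
    }

    public static void main(String[] args) {
        // Hypothetical header text standing in for a real generated file.
        String header = "/**\n * Autogenerated by Thrift Compiler (0.9.0)\n */";
        System.out.println("generated with: " + generatedWith(header).orElse("unknown"));
        System.out.println("matches 0.9.2:  " + matchesDependency(header, "0.9.2"));
    }
}
```

In practice the source string would be read from each generated file and the dependency version from the build; both are left out here to keep the sketch self-contained.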
[jira] [Created] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers
Ashu Pachauri created HBASE-14802:

Summary: Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers
Key: HBASE-14802
URL: https://issues.apache.org/jira/browse/HBASE-14802
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 2.0.0, 1.2.0, 1.2.1
Reporter: Ashu Pachauri
Assignee: Ashu Pachauri

The way dead servers are processed is that a ServerCrashProcedure is launched for a server after it is added to the dead servers list. Every time a server is added to the dead list, a counter "numProcessing" is incremented, and it is decremented when a crash recovery procedure finishes. Since adding a dead server and recovering it are two separate events, this can cause inconsistencies. If a master failover occurs in the middle of the crash recovery, the numProcessing counter resets but the ServerCrashProcedure is replayed by the new master. This causes the counter to go negative and makes the master think that dead servers are still in the process of recovery. One consequence is that the balancer ceases to run after such a failover.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
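The sequence described above can be sketched as a small standalone simulation. This is not the HBase code: the DeadServers class below is a hypothetical stand-in for the master's in-memory bookkeeping, and the "non-zero means still in progress" check is an assumption made to match the behavior the report describes.

```java
public class DeadServerCounterSketch {
    /** Hypothetical stand-in for a master's in-memory dead-server bookkeeping. */
    static final class DeadServers {
        private int numProcessing = 0;

        void add() { numProcessing++; }      // server placed on the dead list
        void finish() { numProcessing--; }   // crash recovery procedure finished

        // Assumption: any non-zero count is read as "recovery still in progress".
        boolean inProgress() { return numProcessing != 0; }
        int numProcessing() { return numProcessing; }
    }

    /** Replays the failover scenario and returns the new master's bookkeeping. */
    static DeadServers simulateFailover() {
        DeadServers oldMaster = new DeadServers();
        oldMaster.add();                     // increment happens on the old master

        // Failover: the new master starts with a fresh counter, but the
        // persisted crash procedure is replayed and completes there.
        DeadServers newMaster = new DeadServers();
        newMaster.finish();                  // decrement with no matching increment
        return newMaster;
    }

    public static void main(String[] args) {
        DeadServers after = simulateFailover();
        System.out.println("numProcessing = " + after.numProcessing());
        System.out.println("inProgress    = " + after.inProgress());
    }
}
```

The point of the sketch is only that the increment and the decrement land on different master instances, so the surviving counter never returns to zero.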
[jira] [Created] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
Romil Choksi created HBASE-14804:

Summary: HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
Key: HBASE-14804
URL: https://issues.apache.org/jira/browse/HBASE-14804
Project: HBase
Issue Type: Bug
Components: shell
Affects Versions: 1.1.2
Reporter: Romil Choksi

I am trying to create a new table with NORMALIZATION_ENABLED set to true, but it seems the NORMALIZATION_ENABLED argument is being ignored, and the attribute is not displayed when running a desc command on that table.

    hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 'true'}
    An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
    0 row(s) in 4.2670 seconds

    => Hbase::Table - test-table-4
    hbase(main):021:0> desc 'test-table-4'
    Table test-table-4 is ENABLED
    test-table-4
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.0430 seconds

However, by running an alter command on that table we can set the NORMALIZATION_ENABLED attribute:

    hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
    Unknown argument ignored: NORMALIZATION_ENABLED
    Updating all regions with the new schema...
    1/1 regions updated.
    Done.
    0 row(s) in 2.3640 seconds

    hbase(main):023:0> desc 'test-table-4'
    Table test-table-4 is ENABLED
    test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
    1 row(s) in 0.0190 seconds

I think it would be better to have a single-step process that enables normalization while creating the table, rather than a two-step process that alters the table afterwards.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14805) status should show the master in shell
Enis Soztutar created HBASE-14805:

Summary: status should show the master in shell
Key: HBASE-14805
URL: https://issues.apache.org/jira/browse/HBASE-14805
Project: HBase
Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 1.2.0, 1.3.0

{{status 'simple'}} or {{'detailed'}} only shows the regionservers and regions, but not the active master. Actually, there is no way to know about the active masters from the shell it seems.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14807) TestWALLockup is flakey
stack created HBASE-14807: - Summary: TestWALLockup is flakey Key: HBASE-14807 URL: https://issues.apache.org/jira/browse/HBASE-14807 Project: HBase Issue Type: Bug Components: flakey, test Reporter: stack Assignee: stack Fails frequently. Looks like this: {code} 2015-11-12 10:38:51,812 DEBUG [Time-limited test] regionserver.HRegion(3882): Found 0 recovered edits file(s) under /home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d 2015-11-12 10:38:51,821 DEBUG [Time-limited test] regionserver.FlushLargeStoresPolicy(56): hbase.hregion.percolumnfamilyflush.size.lower.bound is not specified, use global config(16777216) instead 2015-11-12 10:38:51,880 DEBUG [Time-limited test] wal.WALSplitter(729): Wrote region seqId=/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d/recovered.edits/2.seqid to file, newSeqId=2, maxSeqId=0 2015-11-12 10:38:51,881 INFO [Time-limited test] regionserver.HRegion(868): Onlined c8694b53368f3301a8d370089120388d; next sequenceid=2 2015-11-12 10:38:51,994 ERROR [sync.1] wal.FSHLog$SyncRunner(1226): Error syncing, request close of WAL java.io.IOException: FAKE! Failed to replace a bad datanode...SYNC at org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.sync(TestWALLockup.java:162) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1222) at java.lang.Thread.run(Thread.java:745) 2015-11-12 10:38:51,997 DEBUG [Thread-4] regionserver.LogRoller(139): WAL roll requested 2015-11-12 10:38:52,019 DEBUG [flusher] regionserver.FlushLargeStoresPolicy(100): Since none of the CFs were above the size, flushing all. 
2015-11-12 10:38:52,192 INFO [Thread-4] regionserver.TestWALLockup$1DodgyFSLog(129): LATCHED java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146) at org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:245) 2015-11-12 10:39:18,609 INFO [main] regionserver.TestWALLockup(91): Cleaning test directory: /home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.lang.Thread.run(Thread.java:745) {code} ... then times out after being locked up for 30 seconds. Writes 50+MB of logs while spinning. Reported as this: {code} --- Test set: org.apache.hadoop.hbase.regionserver.TestWALLockup --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec <<< FAILURE! 
- in org.apache.hadoop.hbase.regionserver.TestWALLockup testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup) Time elapsed: 0.049 sec <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at org.apache.log4j.Category.callAppenders(Category.java:205) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:155) at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1386) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1352) at
Re: [VOTE] First release candidate for HBase 1.1.3 (RC0) is available
Thank you Josh for taking time to evaluate this release candidate.

A reminder to others that the voting period is scheduled to expire in roughly 24 hours. I myself have not had time this week to evaluate this candidate properly, so I would like to extend the voting period through the weekend. If there are no objections I will tally the vote at 23:59 on Sunday, November 15, Pacific time.

Thanks,
Nick

On Tuesday, November 10, 2015, Josh Elser wrote:
> +1 (non-binding)
>
> * Built from source
> * Ran tests (-PrunDevTests). o.a.h.h.r.TestRegionServerHostname was
>   problematic, might have just been me.
> * Checked sigs/xsums
> * Checked the compat report (thanks for posting it, Nick)
> * Skimmed release notes looking for anything that might introduce new deps
>   for licensing concerns (found none)
>
> Nick Dimiduk wrote:
>> I'm happy to announce the first release candidate of HBase 1.1.3
>> (HBase-1.1.3RC0) is available for download at
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.3RC0/
>>
>> Maven artifacts are also available in the staging repository
>> https://repository.apache.org/content/repositories/orgapachehbase-1117
>>
>> Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
>> available in the Apache keys directory
>> https://people.apache.org/keys/committer/ndimiduk.asc
>>
>> There's also a signed tag for this release at
>> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=16e905679e2dd5cb1b05ca8bc34a403e154a395f
>>
>> The detailed source and binary compatibility report vs 1.1.0 has been
>> published for your review, at
>> http://people.apache.org/~ndimiduk/1.1.0_1.1.3RC0_compat_report.html
>>
>> HBase 1.1.3 is the third patch release in the HBase 1.1 line, continuing on
>> the theme of bringing a stable, reliable database to the Hadoop and NoSQL
>> communities. This release includes over 120 bug fixes since the 1.1.2
>> release. Notable correctness fixes include HBASE-14474, HBASE-14591,
>> HBASE-14224, HBASE-14431, HBASE-14407, HBASE-14313, HBASE-14621,
>> HBASE-14501, and HBASE-13250.
>>
>> The full list of fixes included in this release is available at
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753=12333152
>> and in the CHANGES.txt file included in the distribution.
>>
>> Please try out this candidate and vote +/-1 by 23:59 Pacific time on
>> Friday, 2015-11-13 as to whether we should release these artifacts as
>> HBase 1.1.3.
>>
>> Thanks,
>> Nick