[jira] [Created] (HBASE-10627) A logic mistake in HRegionServer isHealthy
Liu Shaohui created HBASE-10627:

             Summary: A logic mistake in HRegionServer isHealthy
                 Key: HBASE-10627
                 URL: https://issues.apache.org/jira/browse/HBASE-10627
             Project: HBase
          Issue Type: Bug
            Reporter: Liu Shaohui
            Priority: Minor

After reading isHealthy in HRegionServer, I think there is a logic mistake.

{code}
    // Verify that all threads are alive
    if (!(leases.isAlive()
        && cacheFlusher.isAlive() && hlogRoller.isAlive()
        && this.compactionChecker.isAlive())   // <-- logic wrong here
        && this.periodicFlusher.isAlive()) {
      stop("One or more threads are no longer alive -- stop");
      return false;
    }
{code}

which should be

{code}
    // Verify that all threads are alive
    if (!(leases.isAlive()
        && cacheFlusher.isAlive() && hlogRoller.isAlive()
        && this.compactionChecker.isAlive()
        && this.periodicFlusher.isAlive())) {
      stop("One or more threads are no longer alive -- stop");
      return false;
    }
{code}

Please point out if I am wrong. Thanks.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
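To make the difference between the two conditions concrete, here is a minimal sketch that replaces the isAlive() calls with plain booleans. The class and method names are hypothetical stand-ins, not HBase code; only the parenthesization is taken from the report.

```java
// Hypothetical stand-in for the thread liveness check; not HBase code.
final class IsHealthyParenDemo {
    // Mirrors the reported condition: the last flag sits outside the negation.
    static boolean reportedCheckFails(boolean leases, boolean flusher, boolean roller,
                                      boolean compaction, boolean periodic) {
        return !(leases && flusher && roller && compaction) && periodic;
    }

    // Mirrors the proposed condition: every flag is inside the negation.
    static boolean proposedCheckFails(boolean leases, boolean flusher, boolean roller,
                                      boolean compaction, boolean periodic) {
        return !(leases && flusher && roller && compaction && periodic);
    }

    public static void main(String[] args) {
        // Scenario: only the periodic flusher has died.
        System.out.println(reportedCheckFails(true, true, true, true, false)); // false
        System.out.println(proposedCheckFails(true, true, true, true, false)); // true
    }
}
```

With the reported form, a dead periodic flusher makes the whole condition false, so the unhealthy state goes undetected whenever that thread is the one that died.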
[jira] [Created] (HBASE-10628) Fix semantic inconsistency among methods which are exposed to client
Feng Honghua created HBASE-10628:

             Summary: Fix semantic inconsistency among methods which are exposed to client
                 Key: HBASE-10628
                 URL: https://issues.apache.org/jira/browse/HBASE-10628
             Project: HBase
          Issue Type: Bug
          Components: Client, master
            Reporter: Feng Honghua
            Assignee: Feng Honghua

This serves as a placeholder JIRA for the inconsistencies among client methods such as listTables / tableExists / getTableDescriptor described in HBASE-10584 and HBASE-10595, and for other semantic fixes.
Re: Order of the fields in REST JSon calls?
JSON libs are pretty inconsistent about some things, like what to do if there is more than one instance of the same field (some pick one, others create an array). Anything that uses a hashmap internally may pick an ordering based on the hash keys, while other libs do things that make no sense whatsoever:
http://steveloughran.blogspot.co.uk/2012/02/just-because-you-can-rewrite-your.html

I don't think there is a good solution here except to avoid some trouble spots (those duplicate entries), warn about orderings, and maybe even have tests for that.

On 26 February 2014 19:09, Jean-Marc Spaggiari wrote:
> Hum. I see
>
> https://issues.apache.org/jira/browse/HBASE-9435?focusedCommentId=13782477&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782477
>
> I will take a quick look to see if we have very simple option to bypass
> current issue with 0.94 without having to make this compatible. Else, will
> just indicate this link to 10617.
>
> 2014-02-26 14:04 GMT-05:00 Andrew Purtell :
>
> > On Wed, Feb 26, 2014 at 10:55 AM, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> > > I'm not sure.
> > >
> > > Here is a comment from the patch: "The patch is backward compatible except
> > > for StorageClusterStatusModel, which is broken anyway. It only shows one
> > > node in the liveNodes field." So it might be?
> > >
> > > 2014-02-26 13:48 GMT-05:00 Ted Yu :
> > >
> > > > The API changes from HBASE-9435 are incompatible changes, right ?
> > >
> > Yes
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
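As a rough illustration of the "have tests for that" suggestion, the sketch below shows one way a test could compare two sets of fields without depending on their order, and how to check the order separately when it matters. It uses only java.util maps; the class name, method names, and field values are made up for the example, and no particular JSON library is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test helper; plain maps stand in for parsed JSON objects.
final class JsonFieldOrderSketch {
    // Order-insensitive comparison: Map.equals ignores iteration order.
    static boolean sameFields(Map<String, String> a, Map<String, String> b) {
        return a.equals(b);
    }

    // Order-sensitive view, for tests that want to pin a serialization order.
    static List<String> fieldOrder(Map<String, String> m) {
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the two maps iterate differently.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("name", "rs1");
        first.put("requests", "10");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("requests", "10");
        second.put("name", "rs1");

        System.out.println(sameFields(first, second));                    // true
        System.out.println(fieldOrder(first).equals(fieldOrder(second))); // false
    }
}
```

A test that only cares about content would use the first check; a test guarding against accidental reordering would use the second.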
[jira] [Created] (HBASE-10629) Fix incorrect handling of IE that restores current thread's interrupt status within while/for loops
Feng Honghua created HBASE-10629:

             Summary: Fix incorrect handling of IE that restores current thread's interrupt status within while/for loops
                 Key: HBASE-10629
                 URL: https://issues.apache.org/jira/browse/HBASE-10629
             Project: HBase
          Issue Type: Bug
          Components: Client, master, regionserver, Replication
            Reporter: Feng Honghua
            Assignee: Feng Honghua

There are roughly three kinds of typical incorrect handling of an InterruptedException (IE) thrown during sleep() in the current code base:
# Swallowing it entirely -- fixed by HBASE-10497
# Restoring the current thread's interrupt status implicitly within while/for loops (Threads.sleep() being called within while/for loops) -- fixed by HBASE-10516
# Restoring the current thread's interrupt status explicitly within while/for loops (directly interrupting the current thread within while/for loops)

There are still places with the last kind of handling error. As with HBASE-10497/HBASE-10516, they should be fixed case by case according to their actual scenarios. This issue is created to serve as a parent JIRA for fixing the last kind of error in a systematic manner.
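The third kind of error can be sketched as follows. This is a minimal illustration under assumptions, not code from HBase: the class and method names are hypothetical, and deferring the interrupt until after the loop is only one possible fix (breaking out of the loop or propagating the exception may suit other call sites better, which is why the issue asks for case-by-case handling).

```java
// Hypothetical retry loop; illustrates deferring the interrupt restore.
final class InterruptLoopSketch {
    // Retries up to maxAttempts, sleeping between attempts. An interrupt is
    // remembered in a local flag and restored once, after the loop exits.
    static int retryWithDeferredInterrupt(int maxAttempts, long sleepMillis) {
        boolean interrupted = false;
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Calling Thread.currentThread().interrupt() here would make every
                // later sleep() in this loop throw immediately, so the loop would
                // spin through its remaining iterations without waiting.
                interrupted = true;
            }
        }
        if (interrupted) {
            // Restore the status so callers can still observe the interrupt.
            Thread.currentThread().interrupt();
        }
        return attempts;
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        int attempts = retryWithDeferredInterrupt(3, 1L);
        System.out.println(attempts + " attempts, interrupted=" + Thread.interrupted());
    }
}
```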
[jira] [Resolved] (HBASE-9469) Synchronous replication
[ https://issues.apache.org/jira/browse/HBASE-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Honghua resolved HBASE-9469.
    Resolution: Won't Fix

It has less value than expected, as described in the last comment.

> Synchronous replication
>
>                 Key: HBASE-9469
>                 URL: https://issues.apache.org/jira/browse/HBASE-9469
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Feng Honghua
>
> Scenario:
> A/B clusters with master-master replication: the client writes to cluster A and A
> pushes all writes to cluster B, and when cluster A is down, the client switches
> to writing to cluster B.
> But the client's write switch is unsafe because the replication between A and B is
> asynchronous: a delete sent to cluster B that aims to delete a put written
> earlier can fail because that put was written to cluster A and wasn't
> successfully pushed to B before A went down. It can be worse if this delete is
> collected (a flush and then a major compaction occur) before cluster A is up and
> that put is eventually pushed to B: the put won't ever be deleted.
> Can we provide per-table/per-peer synchronous replication which ships the
> corresponding hlog entry of a write before responding with write success to the
> client? With this we can guarantee the client that all write requests for which
> it got a success response when it wrote to cluster A are already in cluster B
> as well.
[jira] [Created] (HBASE-10630) NullPointerException in ConnectionManager$HConnectionImplementation.locateRegionInMeta() due to missing region info
Ted Yu created HBASE-10630:

             Summary: NullPointerException in ConnectionManager$HConnectionImplementation.locateRegionInMeta() due to missing region info
                 Key: HBASE-10630
                 URL: https://issues.apache.org/jira/browse/HBASE-10630
             Project: HBase
          Issue Type: Bug
   Affects Versions: 0.98.0
            Reporter: Ted Yu
            Assignee: Ted Yu

During the Load And Verify With Chaos Monkey test, we observed:

{code}
2014-02-26 16:28:17,964|beaver.machine|INFO|2014-02-26 16:28:17,964 INFO [main] mapreduce.Job: map 71% reduce 0%
2014-02-26 16:28:20,073|beaver.machine|INFO|2014-02-26 16:28:20,073 INFO [main] mapreduce.Job: map 82% reduce 0%
2014-02-26 16:28:20,077|beaver.machine|INFO|2014-02-26 16:28:20,077 INFO [main] mapreduce.Job: Task Id : attempt_1393409213482_0015_m_68_0, Status : FAILED
2014-02-26 16:28:20,099|beaver.machine|INFO|Error: java.lang.NullPointerException
2014-02-26 16:28:20,100|beaver.machine|INFO|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1175)
2014-02-26 16:28:20,100|beaver.machine|INFO|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1038)
2014-02-26 16:28:20,100|beaver.machine|INFO|at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionAll(ConnectionManager.java:986)
2014-02-26 16:28:20,101|beaver.machine|INFO|at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:418)
2014-02-26 16:28:20,101|beaver.machine|INFO|at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:343)
2014-02-26 16:28:20,101|beaver.machine|INFO|at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:296)
2014-02-26 16:28:20,102|beaver.machine|INFO|at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1024)
2014-02-26 16:28:20,102|beaver.machine|INFO|at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1298)
2014-02-26 16:28:20,102|beaver.machine|INFO|at org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$LoadMapper.cleanup(IntegrationTestLoadAndVerify.java:188)
2014-02-26 16:28:20,102|beaver.machine|INFO|at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148)
2014-02-26 16:28:20,103|beaver.machine|INFO|at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
2014-02-26 16:28:20,103|beaver.machine|INFO|at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
2014-02-26 16:28:20,103|beaver.machine|INFO|at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
2014-02-26 16:28:20,103|beaver.machine|INFO|at java.security.AccessController.doPrivileged(Native Method)
2014-02-26 16:28:20,104|beaver.machine|INFO|at javax.security.auth.Subject.doAs(Subject.java:396)
2014-02-26 16:28:20,104|beaver.machine|INFO|at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
2014-02-26 16:28:20,104|beaver.machine|INFO|at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-02-26 16:28:20,105|beaver.machine|INFO|
2014-02-26 16:28:20,105|beaver.machine|INFO|Container killed by the ApplicationMaster.
{code}

Here is the related code:

{code}
        // convert the row result into the HRegionLocation we need!
        location = MetaReader.getRegionLocations(regionInfoRow);
        HRegionInfo regionInfo = location.getRegionLocation().getRegionInfo();
        if (regionInfo == null) {
          throw new IOException("HRegionInfo was null or empty in " +
{code}

A null check should be performed on location and location.getRegionLocation() before they are dereferenced.
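The suggested null checks could look roughly like the sketch below. The nested classes are hypothetical stand-ins for the HBase types named in the report, and the exception messages are invented for illustration; this is not the patch that was attached to the issue.

```java
import java.io.IOException;

// Hypothetical stand-ins for the location types; not the real HBase classes.
final class NullSafeLocationSketch {
    static final class RegionInfo {
        final String name;
        RegionInfo(String name) { this.name = name; }
    }

    static final class RegionLocation {
        private final RegionInfo info;
        RegionLocation(RegionInfo info) { this.info = info; }
        RegionInfo getRegionInfo() { return info; }
    }

    static final class RegionLocations {
        private final RegionLocation location;
        RegionLocations(RegionLocation location) { this.location = location; }
        RegionLocation getRegionLocation() { return location; }
    }

    // Checks each link of the chain before dereferencing it, so a missing
    // entry surfaces as an IOException instead of a NullPointerException.
    static RegionInfo regionInfoOrThrow(RegionLocations locations, String row)
            throws IOException {
        if (locations == null || locations.getRegionLocation() == null) {
            throw new IOException("No region location found for row " + row);
        }
        RegionInfo info = locations.getRegionLocation().getRegionInfo();
        if (info == null) {
            throw new IOException("Region info was null or empty for row " + row);
        }
        return info;
    }

    public static void main(String[] args) {
        try {
            regionInfoOrThrow(null, "row-1");
        } catch (IOException e) {
            System.out.println("Handled: " + e.getMessage());
        }
    }
}
```

Turning the failure into an IOException lets the caller's existing retry and error handling deal with it.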
Re: Why doesn't KeyValue.equals/CellComparator compare the values?
On Wed, Feb 26, 2014 at 8:31 PM, Matt Corgan wrote:
> But maybe one of the committers could add a sentence to emphasize that
> value is excluded.

We should underline that the value is not considered when comparing Cells (KeyValues). Apart from the fact that it could make for some interesting performance issues, the system isn't plumbed for dealing with Cells whose coordinates are identical and that differ in their value only. Rather, the mvcc/sequenceid is used to distinguish Cells whose coordinates are otherwise the same.

What was your expectation, mighty Cosmin? What do you think HBase should do with Cells that differ in value only?

Thanks,
St.Ack
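For readers who have not run into this behaviour, the following sketch shows what value-excluding equality looks like in practice. CoordinateOnlyCell is a hypothetical, simplified stand-in (row, family, qualifier, timestamp) and not the real KeyValue or CellComparator; the equalsIncludingValue helper is likewise an assumption, shown only as one way a caller could opt in to comparing values.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical, simplified cell; equality covers coordinates only.
final class CoordinateOnlyCell {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;
    final byte[] value;

    CoordinateOnlyCell(String row, String family, String qualifier,
                       long timestamp, byte[] value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CoordinateOnlyCell)) return false;
        CoordinateOnlyCell other = (CoordinateOnlyCell) o;
        // The value is deliberately left out of the comparison.
        return timestamp == other.timestamp
            && row.equals(other.row)
            && family.equals(other.family)
            && qualifier.equals(other.qualifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(row, family, qualifier, timestamp);
    }

    // Explicit helper for callers that also want the value compared.
    static boolean equalsIncludingValue(CoordinateOnlyCell a, CoordinateOnlyCell b) {
        return a.equals(b) && Arrays.equals(a.value, b.value);
    }

    public static void main(String[] args) {
        CoordinateOnlyCell a = new CoordinateOnlyCell("row1", "cf", "q", 1L, new byte[] {1});
        CoordinateOnlyCell b = new CoordinateOnlyCell("row1", "cf", "q", 1L, new byte[] {2});
        System.out.println(a.equals(b));                // true
        System.out.println(equalsIncludingValue(a, b)); // false
    }
}
```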
[jira] [Resolved] (HBASE-10630) NullPointerException in ConnectionManager$HConnectionImplementation.locateRegionInMeta() due to missing region info
[ https://issues.apache.org/jira/browse/HBASE-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-10630. Resolution: Fixed Hadoop Flags: Reviewed Integrated to 10070 branch. > NullPointerException in > ConnectionManager$HConnectionImplementation.locateRegionInMeta() due to > missing region info > --- > > Key: HBASE-10630 > URL: https://issues.apache.org/jira/browse/HBASE-10630 > Project: HBase > Issue Type: Sub-task >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 10630-v1.txt > > > During Load And Verify With Chaos Monkey test, we observed: > {code} > 2014-02-26 16:28:17,964|beaver.machine|INFO|2014-02-26 16:28:17,964 INFO > [main] mapreduce.Job: map 71% reduce 0% > 2014-02-26 16:28:20,073|beaver.machine|INFO|2014-02-26 16:28:20,073 INFO > [main] mapreduce.Job: map 82% reduce 0% > 2014-02-26 16:28:20,077|beaver.machine|INFO|2014-02-26 16:28:20,077 INFO > [main] mapreduce.Job: Task Id : attempt_1393409213482_0015_m_68_0, Status > : FAILED > 2014-02-26 16:28:20,099|beaver.machine|INFO|Error: > java.lang.NullPointerException > 2014-02-26 16:28:20,100|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1175) > 2014-02-26 16:28:20,100|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1038) > 2014-02-26 16:28:20,100|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionAll(ConnectionManager.java:986) > 2014-02-26 16:28:20,101|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:418) > 2014-02-26 16:28:20,101|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:343) > 2014-02-26 16:28:20,101|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:296) > 2014-02-26 
16:28:20,102|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1024) > 2014-02-26 16:28:20,102|beaver.machine|INFO|at > org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1298) > 2014-02-26 16:28:20,102|beaver.machine|INFO|at > org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$LoadMapper.cleanup(IntegrationTestLoadAndVerify.java:188) > 2014-02-26 16:28:20,102|beaver.machine|INFO|at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:148) > 2014-02-26 16:28:20,103|beaver.machine|INFO|at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) > 2014-02-26 16:28:20,103|beaver.machine|INFO|at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > 2014-02-26 16:28:20,103|beaver.machine|INFO|at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) > 2014-02-26 16:28:20,103|beaver.machine|INFO|at > java.security.AccessController.doPrivileged(Native Method) > 2014-02-26 16:28:20,104|beaver.machine|INFO|at > javax.security.auth.Subject.doAs(Subject.java:396) > 2014-02-26 16:28:20,104|beaver.machine|INFO|at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-02-26 16:28:20,104|beaver.machine|INFO|at > org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) > 2014-02-26 16:28:20,105|beaver.machine|INFO| > 2014-02-26 16:28:20,105|beaver.machine|INFO|Container killed by the > ApplicationMaster. > {code} > Here is related code: > {code} >// convert the row result into the HRegionLocation we need! >location = MetaReader.getRegionLocations(regionInfoRow); >HRegionInfo regionInfo = > location.getRegionLocation().getRegionInfo(); >if (regionInfo == null) { > throw new IOException("HRegionInfo was null or empty in " + > {code} > null check should be performed against location and > location.getRegionLocation(). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10631) Avoid extra seek on FileLink open
Matteo Bertozzi created HBASE-10631:

             Summary: Avoid extra seek on FileLink open
                 Key: HBASE-10631
                 URL: https://issues.apache.org/jira/browse/HBASE-10631
             Project: HBase
          Issue Type: Bug
            Reporter: Matteo Bertozzi
            Assignee: Matteo Bertozzi
            Priority: Minor
         Attachments: HBASE-10631-v0.patch

There is an extra seek(0) on FileLink open that we can skip.
[jira] [Created] (HBASE-10632) Region lost in limbo after ArrayIndexOutOfBoundsException during assignment
Nick Dimiduk created HBASE-10632: Summary: Region lost in limbo after ArrayIndexOutOfBoundsException during assignment Key: HBASE-10632 URL: https://issues.apache.org/jira/browse/HBASE-10632 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: hbase-10070 Reporter: Nick Dimiduk Assignee: Enis Soztutar Fix For: 0.99.0, hbase-10070 Discovered while running IntegrationTestBigLinkedList. Region 24d68aa7239824e42390a77b7212fcbf is scheduled for move from hor13n19 to hor13n13. During the process an exception is thrown. {noformat} 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:6-4] master.RegionStates: Transitioning {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} will be handled by SSH for hor13n19.gq1.ygridcore.net,60020,1393341563552 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:6-4] handler.ServerShutdownHandler: Reassigning 7 region(s) that hor13n19.gq1.ygridcore.net,60020,1393341563552 was carrying (and 0 regions(s) that were opening on this server) 2014-02-25 15:30:42,613 INFO [MASTER_SERVER_OPERATIONS-hor13n12:6-4] handler.ServerShutdownHandler: Reassigning region with rs = {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} and deleting zk node if exists 2014-02-25 15:30:42,623 INFO [MASTER_SERVER_OPERATIONS-hor13n12:6-4] master.RegionStates: Transitioned {24d68aa7239824e42390a77b7212fcbf state=OPENING, ts=1393342207107, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} to {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} 2014-02-25 15:30:42,623 DEBUG [AM.ZK.Worker-pool2-t46] master.AssignmentManager: Znode IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. 
deleted, state: {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} ... 2014-02-25 15:30:43,993 ERROR [MASTER_SERVER_OPERATIONS-hor13n12:6-4] executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.(BaseLoadBalancer.java:250) at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.createCluster(BaseLoadBalancer.java:921) at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer.roundRobinAssignment(BaseLoadBalancer.java:860) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2482) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:282) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} After that, region is left in limbo and is never reassigned. {noformat} 2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6] master.HMaster: Client=hrt_qa//68.142.246.29 move hri=IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf., src=hor13n19.gq1.ygridcore.net,60020,1393341563552, dest=hor13n13.gq1.ygridcore.net,60020,139334275, running balancer 2014-02-25 15:35:11,581 INFO [FifoRpcScheduler.handler1-thread-6] master.AssignmentManager: Ignored moving region not assigned: {ENCODED => 24d68aa7239824e42390a77b7212fcbf, NAME => 'IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf.', STARTKEY => '\x80\x06\x1A', ENDKEY => ''}, {24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552} ... 
2014-02-25 15:35:26,586 DEBUG [hor13n12.gq1.ygridcore.net,6,1393341917402-BalancerChore] master.HMaster: Not running balancer because 1 region(s) in transition: {24d68aa7239824e42390a77b7212fcbf={24d68aa7239824e42390a77b7212fcbf state=OFFLINE, ts=1393342242623, server=hor13n19.gq1.ygridcore.net,60020,1393341563552}} ... 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.HMaster: Client=hrt_qa//68.142.246.29 unassign IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf. in current location if it is online and reassign.force=false 2014-02-25 15:35:51,945 DEBUG [FifoRpcScheduler.handler1-thread-16] master.AssignmentManager: Starting unassign of IntegrationTestBigLinkedList,\x80\x06\x1A,1393342105093.24d68aa7239824e42390a77b7212fcbf
[jira] [Created] (HBASE-10633) StoreFileRefresherChore throws ConcurrentModificationException sometimes
Devaraj Das created HBASE-10633:

             Summary: StoreFileRefresherChore throws ConcurrentModificationException sometimes
                 Key: HBASE-10633
                 URL: https://issues.apache.org/jira/browse/HBASE-10633
             Project: HBase
          Issue Type: Sub-task
            Reporter: Devaraj Das
            Assignee: Devaraj Das
[jira] [Created] (HBASE-10634) Multiget doesn't fully work
Devaraj Das created HBASE-10634:

             Summary: Multiget doesn't fully work
                 Key: HBASE-10634
                 URL: https://issues.apache.org/jira/browse/HBASE-10634
             Project: HBase
          Issue Type: Sub-task
            Reporter: Devaraj Das
[jira] [Resolved] (HBASE-10633) StoreFileRefresherChore throws ConcurrentModificationException sometimes
[ https://issues.apache.org/jira/browse/HBASE-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HBASE-10633.
       Resolution: Fixed
    Fix Version/s: hbase-10070

Committed. Thanks for the quick review [~enis].

> StoreFileRefresherChore throws ConcurrentModificationException sometimes
>
>                 Key: HBASE-10633
>                 URL: https://issues.apache.org/jira/browse/HBASE-10633
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: hbase-10070
>
>         Attachments: 10633-1.txt
[jira] [Created] (HBASE-10635) thrift#TestThriftServer fails due to TTL validity check
Ted Yu created HBASE-10635:

             Summary: thrift#TestThriftServer fails due to TTL validity check
                 Key: HBASE-10635
                 URL: https://issues.apache.org/jira/browse/HBASE-10635
             Project: HBase
          Issue Type: Test
            Reporter: Ted Yu

From https://builds.apache.org/job/HBase-TRUNK/4960/testReport/junit/org.apache.hadoop.hbase.thrift/TestThriftServer/testAll/ :

{code}
IOError(message:org.apache.hadoop.hbase.DoNotRetryIOException: TTL for column family columnA must be positive. Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
	at org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:1824)
	at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750)
	at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1876)
	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40470)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2016)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
	at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:724)
)
	at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.createTable(ThriftServerRunner.java:971)
	at org.apache.hadoop.hbase.thrift.TestThriftServer.createTestTables(TestThriftServer.java:224)
	at org.apache.hadoop.hbase.thrift.TestThriftServer.doTestTableCreateDrop(TestThriftServer.java:140)
	at org.apache.hadoop.hbase.thrift.TestThriftServer.doTestTableCreateDrop(TestThriftServer.java:136)
	at org.apache.hadoop.hbase.thrift.TestThriftServer.testAll(TestThriftServer.java:115)
{code}

Looks like ColumnDescriptor contains a TTL of -1.
[jira] [Created] (HBASE-10636) HBaseAdmin.deleteTable isn't 'really' synchronous in that still some cleanup in HMaster after client thinks deleteTable() succeeds
Feng Honghua created HBASE-10636:

             Summary: HBaseAdmin.deleteTable isn't 'really' synchronous in that still some cleanup in HMaster after client thinks deleteTable() succeeds
                 Key: HBASE-10636
                 URL: https://issues.apache.org/jira/browse/HBASE-10636
             Project: HBase
          Issue Type: Sub-task
          Components: Client, master
            Reporter: Feng Honghua
            Assignee: Feng Honghua

In HBaseAdmin.deleteTable():

{code}
  public void deleteTable(final TableName tableName) throws IOException {
    // Wait until all regions deleted
    for (int tries = 0; tries < (this.numRetries * this.retryLongerMultiplier); tries++) {
      // let us wait until hbase:meta table is updated and
      // HMaster removes the table from its HTableDescriptors
      if (values == null || values.length == 0) {
        tableExists = false;
        GetTableDescriptorsResponse htds;
        MasterKeepAliveConnection master = connection.getKeepAliveMasterService();
        try {
          GetTableDescriptorsRequest req =
            RequestConverter.buildGetTableDescriptorsRequest(tableName);
          htds = master.getTableDescriptors(null, req);
        } catch (ServiceException se) {
          throw ProtobufUtil.getRemoteException(se);
        } finally {
          master.close();
        }
        tableExists = !htds.getTableSchemaList().isEmpty();
        if (!tableExists) {
          break;
        }
      }
    }
{code}

The client considers deleteTable() successful once it can no longer retrieve the table descriptor.

But in HMaster, the DeleteTableHandler, which actually deletes the table, does the following:

{code}
  protected void handleTableOperation(List<HRegionInfo> regions)
      throws IOException, KeeperException {
    // 1. Wait because of region in transition
    // 2. Remove regions from META
    LOG.debug("Deleting regions from META");
    MetaEditor.deleteRegions(this.server.getCatalogTracker(), regions);

    // 3. Move the table in /hbase/.tmp
    MasterFileSystem mfs = this.masterServices.getMasterFileSystem();
    Path tempTableDir = mfs.moveTableToTemp(tableName);

    try {
      // 4. Delete regions from FS (temp directory)
      FileSystem fs = mfs.getFileSystem();
      for (HRegionInfo hri: regions) {
        LOG.debug("Archiving region " + hri.getRegionNameAsString() + " from FS");
        HFileArchiver.archiveRegion(fs, mfs.getRootDir(),
            tempTableDir, new Path(tempTableDir, hri.getEncodedName()));
      }

      // 5. Delete table from FS (temp directory)
      if (!fs.delete(tempTableDir, true)) {
        LOG.error("Couldn't delete " + tempTableDir);
      }

      LOG.debug("Table '" + tableName + "' archived!");
    } finally {
      // 6. Update table descriptor cache
      LOG.debug("Removing '" + tableName + "' descriptor.");
      this.masterServices.getTableDescriptors().remove(tableName);

      // 7. Clean up regions of the table in RegionStates.
      LOG.debug("Removing '" + tableName + "' from region states.");
      states.tableDeleted(tableName);

      // 8. If entry for this table in zk, and up in AssignmentManager, remove it.
      LOG.debug("Marking '" + tableName + "' as deleted.");
      am.getZKTable().setDeletedTable(tableName);
    }

    if (cpHost != null) {
      cpHost.postDeleteTableHandler(this.tableName);
    }
  }
{code}

Removing regions from RegionStates, marking the table deleted in ZK, and calling the coprocessor's postDeleteTableHandler all happen after the table is removed from the TableDescriptors cache. So client code that relies on RegionStates/ZKTable/CP being cleaned up after deleteTable() can fail if client requests hit HMaster before those three cleanups are done.

In fact, when I add a sleep (such as 200ms) after the line below to simulate a slow-running HMaster

{code}
this.masterServices.getTableDescriptors().remove(tableName);
{code}

some unit tests (such as moveRegion, or confirming the postDeleteTable CP immediately after deleteTable) no longer pass.
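The ordering problem described above can be modelled with a toy example. The sketch below is not HBase code: the step names are labels invented for the illustration, and moving the descriptor removal to the end is shown only as one conceivable reordering, not as the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the cleanup order; step names are illustrative labels only.
final class DeleteOrderingSketch {
    static final String REMOVE_DESCRIPTOR = "remove descriptor";

    // Returns the cleanup steps in the order they would run.
    static List<String> runCleanup(boolean descriptorLast) {
        List<String> steps = new ArrayList<>();
        if (!descriptorLast) {
            steps.add(REMOVE_DESCRIPTOR); // order described in the report
        }
        steps.add("clear region states");
        steps.add("mark deleted in zk");
        steps.add("post-delete coprocessor hook");
        if (descriptorLast) {
            steps.add(REMOVE_DESCRIPTOR); // the client-visible signal goes last
        }
        return steps;
    }

    // Steps still pending at the moment a polling client sees the descriptor gone.
    static List<String> pendingWhenClientReturns(List<String> steps) {
        int index = steps.indexOf(REMOVE_DESCRIPTOR);
        return new ArrayList<>(steps.subList(index + 1, steps.size()));
    }

    public static void main(String[] args) {
        System.out.println(pendingWhenClientReturns(runCleanup(false)));
        System.out.println(pendingWhenClientReturns(runCleanup(true)));
    }
}
```

In the reported order, three steps are still outstanding when the client stops polling; with the signal moved last, none are.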