[jira] [Commented] (HBASE-5843) Improve HBase MTTR - Mean Time To Recover
[ https://issues.apache.org/jira/browse/HBASE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560472#comment-13560472 ]

nkeywal commented on HBASE-5843:
--------------------------------

bq. What is the application bug (AB) mentioned in your design doc? Do you mean an hbase bug? or hbase client application code bug?

Mainly HBase, but it could be a coprocessor issue as well. HBase can be configured to stop the regionserver if a coprocessor sends unexpected exceptions, but it's quite easy to write buggy stuff, like a coprocessor that takes resources without freeing them. Here you may need to stop the region server.

bq. If it is hbase client application code bug, does that need stop/start region server to fix the issue?

For a pure client (i.e. a user of the hbase.client package), it would be an HBase bug imho: HBase/a regionserver should be resistant to any client behavior. For a coprocessor, it's client code executed within the regionserver process. Thanks to Java, many coprocessor bugs will have a limited effect, but as said above there are some cases that cannot be handled simply.

bq. If it is hbase code bug, do you refer to an hbase bug that causes the region server to enter some bad state like deadlock, and so on? I think that could benefit from restarting the region server to fix the problem.

Yes.

Improve HBase MTTR - Mean Time To Recover
-----------------------------------------
Key: HBASE-5843
URL: https://issues.apache.org/jira/browse/HBASE-5843
Project: HBase
Issue Type: Umbrella
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal

A part of the approach is described here: https://docs.google.com/document/d/1z03xRoZrIJmg7jsWuyKYl6zNournF_7ZHzdi0qz_B4c/edit

The ideal target is:
- failures impact client applications only through an added delay to execute a query, whatever the failure.
- this delay is always under 1 second.

We're not going to achieve that immediately... Priority will be given to the most frequent issues.
Short term:
- software crash
- standard administrative tasks such as stop/start of a cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6816) [WINDOWS] line endings on checkout for .sh files
[ https://issues.apache.org/jira/browse/HBASE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560507#comment-13560507 ]

nkeywal commented on HBASE-6816:
--------------------------------

Installed the patch on unix & windows, seems ok. I was surprised because the example in the git documentation explicitly states binary for png & jpg. So does https://github.com/Countly/countly-sdk-android/blob/master/.gitattributes for example. So I changed architecture.gif on windows, committed, then read it from Linux. I found my changes. So I'm +1 :-).

[WINDOWS] line endings on checkout for .sh files
------------------------------------------------
Key: HBASE-6816
URL: https://issues.apache.org/jira/browse/HBASE-6816
Project: HBase
Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Attachments: hbase-16_v1.patch, hbase-6816_v1.patch

On code checkout from svn or git, we need to ensure that the line endings for .sh files are LF, so that they work with cygwin. This is important for getting src/saveVersion.sh to work.
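For reference, a .gitattributes along the lines discussed here might look like the sketch below; the exact pattern list is an assumption for illustration, not the file committed to HBase:

```
# Force LF on checkout for shell scripts so they run under cygwin
*.sh text eol=lf
# Mark images as binary so git never rewrites their contents
*.gif binary
*.png binary
*.jpg binary
```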
[jira] [Commented] (HBASE-6829) [WINDOWS] Tests should ensure that HLog is closed
[ https://issues.apache.org/jira/browse/HBASE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560519#comment-13560519 ]

nkeywal commented on HBASE-6829:
--------------------------------

fwiw, TestDefaultCompactSelection works for me on Windows before and after the patch, while the patch fixes TestCacheOnWriteInSchema. But what's in TestDefaultCompactSelection makes sense. There are some unused variables in TestDefaultCompactSelection, but they were there before the patch:
{code}
long tooBig = maxSize + 1;
Path oldLogDir = new Path(basedir, HConstants.HREGION_OLDLOGDIR_NAME);
{code}
So +1 from me as well.

[WINDOWS] Tests should ensure that HLog is closed
-------------------------------------------------
Key: HBASE-6829
URL: https://issues.apache.org/jira/browse/HBASE-6829
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Labels: windows
Attachments: hbase-6829_v1-0.94.patch, hbase-6829_v1-trunk.patch, hbase-6829_v2-0.94.patch, hbase-6829_v2-trunk.patch, hbase-6829_v3-0.94.patch, hbase-6829_v3-trunk.patch, hbase-6829_v4-trunk.patch, hbase-6829_v4-trunk.patch

TestCacheOnWriteInSchema and TestCompactSelection fail with
{code}
java.io.IOException: Target HLog directory already exists: ./target/test-data/2d814e66-75d3-4c1b-92c7-a49d9972e8fd/TestCacheOnWriteInSchema/logs
	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:385)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:316)
	at org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema.setUp(TestCacheOnWriteInSchema.java:162)
{code}
[jira] [Commented] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560529#comment-13560529 ]

nkeywal commented on HBASE-6832:
--------------------------------

For EnvironmentEdgeManager, should we not always initialize the 'time' at 10 or something like this? I can imagine other pieces of code doing minus something. Initializing it to something high enough could save us from some burden later.

For the fix, except this non-critical comment above, I'm ok, but I wonder if the root issue (strange time counter on windows) won't show up in production. That's another subject, though.

[WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
------------------------------------------------------------------------------------------
Key: HBASE-6832
URL: https://issues.apache.org/jira/browse/HBASE-6832
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Labels: windows
Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch

TestRegionObserverBypass.testMulti() fails with
{code}
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173)
	at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166)
{code}
[jira] [Commented] (HBASE-6825) [WINDOWS] Java NIO socket channels does not work with Windows ipv6
[ https://issues.apache.org/jira/browse/HBASE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560536#comment-13560536 ]

nkeywal commented on HBASE-6825:
--------------------------------

If I'm not mistaken, the test uses a fixed port 8502. This should be changed; if not, we can have random failures when running the test suites.

For the issue itself, why should it not make it to the main code? I mean, we test that, on windows, a critical feature works. If not we will have issues. This should be in main, not in test, no?

I like the way the test is done, btw: we don't explicitly test for jdk1.7, so it means that if it's fixed in a later jdk 1.6 patch the code will still be right. And actually, this seems to be fixed in 1.6u34 according to http://www.oracle.com/technetwork/java/javase/documentation/overview-156328.html. If that's the case, we could put it as a requirement and we're done (that's acceptable for 0.96 imho. May be not for 0.94).

[WINDOWS] Java NIO socket channels does not work with Windows ipv6
------------------------------------------------------------------
Key: HBASE-6825
URL: https://issues.apache.org/jira/browse/HBASE-6825
Project: HBase
Issue Type: Sub-task
Affects Versions: 0.94.3, 0.96.0
Environment: JDK6 on windows for ipv6.
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Attachments: hbase-6825_v3-0.94.patch, hbase-6825_v3-trunk.patch

While running the test TestAdmin.testCheckHBaseAvailableClosesConnection(), I noticed that it takes very long, since it sleeps for 2 sec * 500, because of zookeeper retries. The root cause of the problem is that ZK uses Java NIO to create ServerSockets from ServerSocketChannels. Under windows, ipv4 and ipv6 are implemented independently, and it seems that Java cannot reuse the same socket channel for both ipv4 and ipv6 sockets. We are getting "java.net.SocketException: Address family not supported by protocol family" exceptions. When the ZK client resolves localhost, it gets both the v4 127.0.0.1 and the v6 ::1 address, but the socket channel cannot bind to both v4 and v6.
The problem is reported as:
http://bugs.sun.com/view_bug.do?bug_id=6230761
http://stackoverflow.com/questions/1357091/binding-an-ipv6-server-socket-on-windows

Although the JDK bug is reported as resolved, I have tested with jdk1.6.0_33 without any success. JDK7, however, seems to have fixed this problem. In ZK, we can replace the ClientCnxnSocket implementation from ClientCnxnSocketNIO to a non-NIO one, but I am not sure that would be the way to go. Disabling ipv6 resolution of localhost is one other approach. I'll test it to see whether it will be any good.
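A minimal sketch of the workaround direction discussed above: bind the NIO server socket explicitly to the IPv4 loopback instead of letting "localhost" resolve to both 127.0.0.1 and ::1, and use port 0 so the OS picks a free port (avoiding the fixed-port problem noted in the comment). The class name is made up for illustration:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindV4 {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        // Bind to the IPv4 loopback explicitly so the channel never has to
        // serve both address families; port 0 asks the OS for a free port.
        server.socket().bind(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0));
        System.out.println("bound: " + (server.socket().getLocalPort() > 0));
        server.close();
    }
}
```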
[jira] [Commented] (HBASE-6821) [WINDOWS] .META. table name causes file system problems in windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560545#comment-13560545 ]

nkeywal commented on HBASE-6821:
--------------------------------

Hum, can't test on Windows: I can't delete the bad directory of the initial test now. It works on linux. It's nothing else than a hack, but it's simple and we're in the tests, so to me it's ok. The root issue is much more of a problem :-(, and can't be fixed without discussions (and thinking :-)). In the meantime, we can commit this patch imho. +1 so.

[WINDOWS] .META. table name causes file system problems in windows
------------------------------------------------------------------
Key: HBASE-6821
URL: https://issues.apache.org/jira/browse/HBASE-6821
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3, 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Labels: windows
Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz

TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail.
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560560#comment-13560560 ]

Anoop Sam John commented on HBASE-5664:
---------------------------------------

[~apurtell] Comments from your side?

CP hooks in Scan flow for fast forward when filter filters out a row
--------------------------------------------------------------------
Key: HBASE-5664
URL: https://issues.apache.org/jira/browse/HBASE-5664
Project: HBase
Issue Type: Improvement
Components: Coprocessors, Filters
Affects Versions: 0.92.1
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Fix For: 0.96.0, 0.94.5
Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch

In HRegion.nextInternal(int limit, String metric) we have a while(true) loop so as to fetch the next result which satisfies the filter condition. When the Filter filters out the current fetched row, we call nextRow(byte[] currentRow) before going on with the next row.
{code}
if (results.isEmpty() || filterRow()) {
  // this seems like a redundant step - we already consumed the row
  // there're no left overs.
  // the reasons for calling this method are:
  // 1. reset the filters.
  // 2. provide a hook to fast forward the row (used by subclasses)
  nextRow(currentRow);
{code}
// 2. provide a hook to fast forward the row (used by subclasses)

We can provide the same fast-forward support for the CP also.
[jira] [Commented] (HBASE-7637) hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0
[ https://issues.apache.org/jira/browse/HBASE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560565#comment-13560565 ]

nkeywal commented on HBASE-7637:
--------------------------------

You're right, it's not much of an issue. The patch works with 2.0 and 1.0. Don't forget the clean between the two as I initially did :-) +1 so.

hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0
--------------------------------------------------------
Key: HBASE-7637
URL: https://issues.apache.org/jira/browse/HBASE-7637
Project: HBase
Issue Type: Bug
Components: build
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Elliott Clark
Priority: Critical
Fix For: 0.96.0
Attachments: HBASE-7637-0.patch

I'm unclear on the root cause / fix. Here is the scenario:
{noformat}
mvn clean package install -Dhadoop.profile=2.0 -DskipTests
bin/start-hbase.sh
{noformat}
fails with
{noformat}
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.lib.MetricMutable
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
{noformat}
Doing
{noformat}
rm -rf hbase-hadoop1-compat/target/
{noformat}
makes it work. In the pom.xml, we never reference hadoop2-compat. But doing so does not help: hadoop1-compat is compiled and takes precedence over hadoop2...
[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition and snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560637#comment-13560637 ]

Jonathan Hsieh commented on HBASE-7643:
---------------------------------------

Looks good. Just have a few comment suggestions.

Mention the cleaner dir deletion race?
{code}
+      if (i > 0) {
+        // Ensure that the archive directory exists
+        // (we're in a retry loop, so don't worry too much about the exception)
+        try {
{code}

Is there corresponding javadoc that needs to be changed?
{code}
   public boolean moveAndClose(Path dest) throws IOException {
     this.close();
     Path p = this.getPath();
-    return !fs.rename(p, dest);
+    return fs.rename(p, dest);
   }
{code}

nit: comment about 1 ms sleep between cleaner runs..
{code}
+    Stoppable stoppable = new StoppableImplementation();
+    HFileCleaner cleaner = new HFileCleaner(1, stoppable, conf, fs, archiveDir);
{code}

I buy this but needed to think a bit to figure out why this is correct. Add a comment? (The invariant is that the file is in one or the other place, and if it fails in one we check the other.)
{code}
+    try {
+      HFileArchiver.archiveRegion(conf, fs, rootDir, sourceRegionDir.getParent(), sourceRegionDir);
+      assertTrue(fs.exists(archiveFile));
+      assertFalse(fs.exists(sourceFile));
+    } catch (IOException e) {
+      assertFalse(fs.exists(archiveFile));
+      assertTrue(fs.exists(sourceFile));
+    }
{code}

HFileArchiver.resolveAndArchive() race condition and snapshot data loss
-----------------------------------------------------------------------
Key: HBASE-7643
URL: https://issues.apache.org/jira/browse/HBASE-7643
Project: HBase
Issue Type: Bug
Affects Versions: hbase-6055, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Blocker
Fix For: 0.96.0, 0.94.5
Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch

* The master has an hfile cleaner thread (that is responsible for cleaning the /hbase/.archive dir)
** /hbase/.archive/table/region/family/hfile
** if the family/region/family directory is empty the cleaner removes it
* The master can archive files (from another thread, e.g. DeleteTableHandler)
* The region can archive files (from another server/process, e.g. compaction)

The simplified file archiving code looks like this:
{code}
HFileArchiver.resolveAndArchive(...) {
  // ensure that the archive dir exists
  fs.mkdir(archiveDir);
  // move the file to the archive
  success = fs.rename(originalPath/fileName, archiveDir/fileName)
  // if the rename failed, delete the file without archiving
  if (!success) fs.delete(originalPath/fileName);
}
{code}

Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...) you can end up in a situation where you are moving something into a directory that doesn't exist:
{code}
fs.mkdir(archiveDir);
// HFileCleaner chore starts at this point
// and the archive directory that we just ensured to be present gets removed.
// The rename at this point will fail since the parent directory is missing.
success = fs.rename(originalPath/fileName, archiveDir/fileName)
{code}

The bad thing about deleting the file without archiving is that if you have a snapshot, or a clone table, that relies on that file being present, you're losing data.

Possible solutions:
* Create a ZooKeeper lock, to notify the master (Hey I'm archiving something, wait a bit)
* Add a RS -> Master call to let the master remove files and avoid this kind of situation
* Avoid removing empty directories from the archive if the table exists or is not disabled
* Add a try/catch around the fs.rename

The last one, the easiest one, looks like:
{code}
for (int i = 0; i < retries; ++i) {
  // ensure archive directory to be present
  fs.mkdir(archiveDir);
  // possible race
  // try to archive file
  success = fs.rename(originalPath/fileName, archiveDir/fileName);
  if (success) break;
}
{code}
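The retry pattern from the issue description can be sketched self-contained with java.nio.file standing in for the HDFS FileSystem calls; names like archiveWithRetries are hypothetical, not HBase API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchiveRetry {
    // Re-create the archive directory and retry the rename, so that a
    // concurrent cleaner deleting the (empty) archive directory between
    // the mkdir and the rename is tolerated instead of losing the file.
    static boolean archiveWithRetries(Path source, Path archiveDir, int retries) throws IOException {
        for (int i = 0; i < retries; i++) {
            // Ensure the archive directory exists; a cleaner may remove it
            // right after this call, which is exactly the race being handled.
            Files.createDirectories(archiveDir);
            try {
                Files.move(source, archiveDir.resolve(source.getFileName()));
                return true; // rename succeeded: the file is archived
            } catch (IOException e) {
                // directory vanished or other transient failure: loop and retry
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("archive-demo");
        Path src = Files.createFile(tmp.resolve("hfile1"));
        Path archive = tmp.resolve("archive");
        boolean ok = archiveWithRetries(src, archive, 3);
        System.out.println(ok && Files.exists(archive.resolve("hfile1")));
    }
}
```

Note the invariant Jonathan asks to document above: at every point the file exists in exactly one of the two places, so a failure is recoverable by checking the other location.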
[jira] [Updated] (HBASE-7495) parallel scanner seek in StoreScanner's constructor
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-7495:
-----------------------------

    Attachment: HBASE-7495.txt

parallel scanner seek in StoreScanner's constructor
---------------------------------------------------
Key: HBASE-7495
URL: https://issues.apache.org/jira/browse/HBASE-7495
Project: HBase
Issue Type: Bug
Components: Scanners
Affects Versions: 0.94.3, 0.96.0
Reporter: liang xie
Assignee: liang xie
Attachments: HBASE-7495.txt, HBASE-7495.txt

It seems there's room for improvement before doing scanner.next:
{code:title=StoreScanner.java|borderStyle=solid}
if (explicitColumnQuery && lazySeekEnabledGlobally) {
  for (KeyValueScanner scanner : scanners) {
    scanner.requestSeek(matcher.getStartKey(), false, true);
  }
} else {
  for (KeyValueScanner scanner : scanners) {
    scanner.seek(matcher.getStartKey());
  }
}
{code}
We can do scanner.requestSeek or scanner.seek in parallel, instead of the current serial execution, to reduce latency for this special case. Any ideas on it? I'll have a try if the comments/suggestions are positive :)
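The proposal above can be sketched as follows: fan the per-scanner seeks out to a thread pool and wait for all of them, instead of seeking serially. The Scanner interface here is a stand-in, not HBase's KeyValueScanner:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSeek {
    interface Scanner { void seek(long key); }

    static void seekAll(List<Scanner> scanners, long startKey) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.min(scanners.size(), 4));
        CountDownLatch done = new CountDownLatch(scanners.size());
        for (Scanner s : scanners) {
            pool.execute(() -> {
                s.seek(startKey); // seeks now overlap instead of queueing one after another
                done.countDown();
            });
        }
        done.await(); // the constructor must not return before every seek has completed
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> seen = Collections.synchronizedList(new ArrayList<>());
        List<Scanner> scanners = new ArrayList<>();
        for (int i = 0; i < 3; i++) scanners.add(seen::add);
        seekAll(scanners, 42L);
        System.out.println(seen.size() + " scanners sought");
    }
}
```

The latency win only materializes when the individual seeks block on I/O (e.g. HFile block reads); for cached blocks the pool hand-off overhead could dominate.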
[jira] [Updated] (HBASE-7495) parallel scanner seek in StoreScanner's constructor
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-7495:
-----------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liang xie updated HBASE-7495:
-----------------------------

    Summary: parallel seek in StoreScanner  (was: parallel scanner seek in StoreScanner's constructor)
[jira] [Commented] (HBASE-7293) [replication] Remove dead sinks from ReplicationSource.currentPeers and pick new ones
[ https://issues.apache.org/jira/browse/HBASE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560638#comment-13560638 ]

Hudson commented on HBASE-7293:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #368 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/368/])
HBASE-7293 [replication] Remove dead sinks from ReplicationSource.currentPeers and pick new ones (Revision 1437240)

Result = FAILURE
larsh :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java

[replication] Remove dead sinks from ReplicationSource.currentPeers and pick new ones
-------------------------------------------------------------------------------------
Key: HBASE-7293
URL: https://issues.apache.org/jira/browse/HBASE-7293
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.3, 0.96.0
Reporter: Jean-Daniel Cryans
Assignee: Lars Hofhansl
Fix For: 0.96.0, 0.94.5
Attachments: 7293-0.94.txt, 7293-0.94-v2.txt, 7293-0.96.txt

I happened to look at a log today where I saw a lot of lines like this:
{noformat}
2012-12-06 23:29:08,318 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: This server is in the failed servers list: sv4r20s49/10.4.20.49:10304
2012-12-06 23:29:15,987 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't replicate because of a local or network error:
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:519)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:484)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:416)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:462)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1150)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1000)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy14.replicateLogEntries(Unknown Source)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:627)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
2012-12-06 23:29:15,988 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: Connection refused
{noformat}
What struck me as weird is this had been going on for some days; I would expect the RS to find new servers if it wasn't able to replicate. But the reality is that only a few of the chosen sink RSs were down, so eventually the source hits one that's good and is never able to refresh its list of servers. We should remove the dead servers: it's spammy and probably adds some slave lag.
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560639#comment-13560639 ]

Hudson commented on HBASE-6466:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #368 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/368/])
HBASE-6466 Revert, TestLogRolling failed twice on trunk build (Revision 1437274)
HBASE-6466 Enable multi-thread for memstore flush (Chunhui) (Revision 1437252)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
tedyu :
Files :
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java

Enable multi-thread for memstore flush
--------------------------------------
Key: HBASE-6466
URL: https://issues.apache.org/jira/browse/HBASE-6466
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
Fix For: 0.96.0
Attachments: 6466-v6.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch

If the KV is large or the Hlog is closed under high-pressure putting, we found the memstore is often above the high water mark and blocks the putting. So should we enable multi-thread for Memstore Flush?

Some performance test data for reference:

1. test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len = 50 bytes, value len = 1024 bytes; 5 regionservers, 300 ipc handlers per regionserver; 5 clients, 50 writer threads per client

2. test results:
* one cacheFlush handler: tps 7.8k/s per regionserver, flush 10.1MB/s per regionserver, many aboveGlobalMemstoreLimit blockings appear
* two cacheFlush handlers: tps 10.7k/s per regionserver, flush 12.46MB/s per regionserver
* 200 thread handlers per client, two cacheFlush handlers: tps 16.1k/s per regionserver, flush 18.6MB/s per regionserver
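The idea in the proposal can be sketched as several flush handler threads draining a shared flush queue, instead of a single handler. Everything below (queue contents, handler count) is illustrative, not the MemStoreFlusher implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiFlush {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 6; i++) flushQueue.add("region-" + i); // pending flush requests
        AtomicInteger flushed = new AtomicInteger();
        int handlers = 2; // two cacheFlush handlers, as in the test data above
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        for (int i = 0; i < handlers; i++) {
            pool.execute(() -> {
                // each handler drains requests; with N handlers, up to N
                // memstore flushes can be in flight at the same time
                while (flushQueue.poll() != null) flushed.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(flushed.get() + " regions flushed");
    }
}
```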
[jira] [Commented] (HBASE-7646) Make forkedProcessTimeoutInSeconds configurable
[ https://issues.apache.org/jira/browse/HBASE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560640#comment-13560640 ]

Hudson commented on HBASE-7646:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #368 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/368/])
HBASE-7646 Make forkedProcessTimeoutInSeconds configurable (Revision 1437130)

Result = FAILURE
jxiang :
Files :
* /hbase/trunk/pom.xml

Make forkedProcessTimeoutInSeconds configurable
-----------------------------------------------
Key: HBASE-7646
URL: https://issues.apache.org/jira/browse/HBASE-7646
Project: HBase
Issue Type: Bug
Components: build
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
Fix For: 0.96.0, 0.94.5
Attachments: 0.94-7646.patch, trunk-7646.patch

The command line property surefire.timeout somehow doesn't work. It may be because forkedProcessTimeoutInSeconds is hard-coded to 900.
[jira] [Commented] (HBASE-7588) Fix two findbugs warning in MemStoreFlusher
[ https://issues.apache.org/jira/browse/HBASE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560641#comment-13560641 ]

Hudson commented on HBASE-7588:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #368 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/368/])
HBASE-7588 Fix two findbugs warning in MemStoreFlusher; REAPPLIED (Revision 1437154)
HBASE-7588 Fix two findbugs warning in MemStoreFlusher; REVERTED (Revision 1437121)
HBASE-7588 Fix two findbugs warning in MemStoreFlusher (Revision 1437119)

Result = FAILURE
stack :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
stack :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
stack :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java

Fix two findbugs warning in MemStoreFlusher
-------------------------------------------
Key: HBASE-7588
URL: https://issues.apache.org/jira/browse/HBASE-7588
Project: HBase
Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Minor
Fix For: 0.96.0
Attachments: HBASE-7588-v0-trunk.patch, HBASE-7588-v1-trunk.patch, HBASE-7588-v2-trunk.patch
[jira] [Updated] (HBASE-7637) hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0
[ https://issues.apache.org/jira/browse/HBASE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-7637: --- Attachment: nomodules.patch hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0 Key: HBASE-7637 URL: https://issues.apache.org/jira/browse/HBASE-7637 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.96.0 Reporter: nkeywal Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7637-0.patch, nomodules.patch I'm unclear on the root cause / fix. Here is the scenario: {noformat} mvn clean package install -Dhadoop.profile=2.0 -DskipTests bin/start-hbase.sh {noformat} fails with {noformat} Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.lib.MetricMutable at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) {noformat} doing {noformat} rm -rf hbase-hadoop1-compat/target/ {noformat} makes it work. In the pom.xml, we never reference hadoop2-compat. But doing so does not help: hadoop1-compat is compiled and takes precedence over hadoop2... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7637) hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0
[ https://issues.apache.org/jira/browse/HBASE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560669#comment-13560669 ] nkeywal commented on HBASE-7637: It's possible to do this (cf. nomodules.patch) in the main pom.xml (on top of what you did). It has one advantage: you don't download (or need) the hadoop version you don't use. Without it, even if you build hbase for hadoop 1, you take hadoop 2 as well. hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0 Key: HBASE-7637 URL: https://issues.apache.org/jira/browse/HBASE-7637 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.96.0 Reporter: nkeywal Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7637-0.patch, nomodules.patch I'm unclear on the root cause / fix. Here is the scenario: {noformat} mvn clean package install -Dhadoop.profile=2.0 -DskipTests bin/start-hbase.sh {noformat} fails with {noformat} Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.lib.MetricMutable at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) {noformat} doing {noformat} rm -rf hbase-hadoop1-compat/target/ {noformat} makes it work. In the pom.xml, we never reference hadoop2-compat. But doing so does not help: hadoop1-compat is compiled and takes precedence over hadoop2... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560676#comment-13560676 ] Hadoop QA commented on HBASE-7495: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566118/HBASE-7495.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestHRegionBusyWait org.apache.hadoop.hbase.regionserver.TestAtomicOperation org.apache.hadoop.hbase.regionserver.TestHRegion org.apache.hadoop.hbase.regionserver.wal.TestLogRolling org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction org.apache.hadoop.hbase.TestAcidGuarantees org.apache.hadoop.hbase.TestLocalHBaseCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4145//console This message is automatically generated. 
parallel seek in StoreScanner - Key: HBASE-7495 URL: https://issues.apache.org/jira/browse/HBASE-7495 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.94.3, 0.96.0 Reporter: liang xie Assignee: liang xie Attachments: HBASE-7495.txt, HBASE-7495.txt it seems there's room for improvement before doing scanner.next:
{code:title=StoreScanner.java|borderStyle=solid}
if (explicitColumnQuery && lazySeekEnabledGlobally) {
  for (KeyValueScanner scanner : scanners) {
    scanner.requestSeek(matcher.getStartKey(), false, true);
  }
} else {
  for (KeyValueScanner scanner : scanners) {
    scanner.seek(matcher.getStartKey());
  }
}
{code}
we can do scanner.requestSeek or scanner.seek in parallel, instead of the current serialization, to reduce latency for this special case. Any ideas on it? I'll have a try if the comments/suggestions are positive :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
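The serial loop above can be handed to a shared thread pool so that the overall latency becomes roughly the slowest single seek rather than the sum of all of them; the results still merge correctly afterwards because the scanner heap re-orders KVs after the seeks complete. A minimal, self-contained Java sketch of that idea (the `Scanner` interface here is a simplified stand-in for `KeyValueScanner`; this is not the actual HBASE-7495 patch):

```java
import java.util.*;
import java.util.concurrent.*;

// Simplified stand-in for KeyValueScanner: one seek call per store file.
interface Scanner {
    void seek(long startKey) throws Exception;
}

public class ParallelSeek {
    // Seek every scanner concurrently; latency ~ max(single seek) instead of sum.
    static void seekAll(List<Scanner> scanners, long startKey, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<Object>> futures = new ArrayList<>();
        for (Scanner s : scanners) {
            futures.add(pool.submit(() -> { s.seek(startKey); return null; }));
        }
        for (Future<Object> f : futures) {
            f.get(); // wait for every seek; rethrows if any seek failed
        }
    }

    static int demo() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Long> seen = Collections.synchronizedList(new ArrayList<>());
        List<Scanner> scanners = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            scanners.add(seen::add); // each "seek" just records the key it was given
        }
        seekAll(scanners, 42L, pool);
        pool.shutdown();
        return seen.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // 4: every scanner was sought exactly once
    }
}
```

Waiting on every `Future` before returning also answers the ordering concern raised later in the thread: `next()` is not called until all seeks have finished.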
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560710#comment-13560710 ] chunhui shen commented on HBASE-6466: - TestLogRolling#testLogRollOnDatanodeDeath() failed in trunk builds 3779 and 3780 at
{code}
assertTrue("LowReplication Roller should've been disabled", !log.isLowReplicationRollEnabled());
{code}
lowReplicationRollEnabled will only be set to false in FSHLog#checkLowReplication(); FSHLog#checkLowReplication() is only called by FSHLog#syncer(), however it is skipped while the log is rolling:
{code}
if (!this.logRollRunning) {
  checkLowReplication();
  ...
}
{code}
Therefore, I can think of only one reason for this failed test: the log is rolling when syncer() is called. From the logs, I found "HDFS pipeline error detected. Found 1 replicas but expecting no less than 2 replicas" (logged by FSHLog#checkLowReplication()) only 3 times, but we need at least 4 times to pass the test. It's easy to reproduce the failed test with the following change in FSHLog:
{code}
--- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java (revision 1437274)
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java (working copy)
@@ -501,6 +501,10 @@
     byte [][] regionsToFlush = null;
     try {
       this.logRollRunning = true;
+      try {
+        Thread.sleep(1500);
+      } catch (InterruptedException e) {
+      }
       boolean isClosed = closed;
       if (isClosed || !closeBarrier.beginOp()) {
         LOG.debug("HLog " + (isClosed ? "closed" : "closing") + ". Skipping rolling of writer");
{code}
In addition, with patch v6, the test TestLogRolling passed 50 times on my local PC.
Attaching patch v7, changed TestLogRolling a little. Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6466-v6.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch If the KV is large or Hlog is closed with high-pressure putting, we found the memstore is often above the high water mark, which blocks puts. So should we enable multi-thread for Memstore Flush? Some performance test data for reference, 1.test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionserver, 300 ipc handler per regionserver; 5 client, 50 thread handler per client for writing 2.test results: one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per regionserver, appears many aboveGlobalMemstoreLimit blocking two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per regionserver, 200 thread handler per client two cacheFlush handlers, tps: 16.1k/s per regionserver, Flush: 18.6MB/s per regionserver -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
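The mechanism discussed in HBASE-6466 — several flush handlers draining a shared flush queue so one slow flush no longer blocks writers stuck at the global memstore limit — can be sketched as follows. This is an illustrative stand-in, not the patch itself; the queue of region names and the handler count are simplifications of the real flush queue and its configuration knob.

```java
import java.util.concurrent.*;

public class MultiFlushSketch {
    // Several handler threads drain one flush queue; returns true when all
    // queued "regions" have been flushed.
    static boolean flushAll(int regions, int handlerCount) throws Exception {
        BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
        for (int i = 0; i < regions; i++) {
            flushQueue.add("region-" + i);
        }
        CountDownLatch done = new CountDownLatch(regions);
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        for (int i = 0; i < handlerCount; i++) {
            handlers.submit(() -> {
                String region;
                while ((region = flushQueue.poll()) != null) {
                    // a real handler would flush this region's memstore to an HFile here
                    done.countDown();
                }
            });
        }
        done.await();
        handlers.shutdown();
        return flushQueue.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        // Two handlers working the same queue, as in the "two cacheFlush
        // handlers" test configuration above.
        System.out.println(flushAll(6, 2));
    }
}
```

With one handler the queue drains serially, so a single large flush delays every region behind it; with two or more, an expensive flush only occupies one handler.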
[jira] [Commented] (HBASE-7221) RowKey utility class for rowkey construction
[ https://issues.apache.org/jira/browse/HBASE-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560714#comment-13560714 ] Doug Meil commented on HBASE-7221: -- This parsing approach... {code} byte[] key = RowKey.format("%16x%4d%8d", hashVal, intVal, longVal); {code} ... seems a lot less understandable to me than the proposal. It also doesn't address reading components back, which is why the RowKey (aka FixedLengthKey/ComponentKey) needs to have state. I don't think it's enough just to have a builder pattern, people need some way of reading and processing the key. It's not just about the writes. RowKey utility class for rowkey construction Key: HBASE-7221 URL: https://issues.apache.org/jira/browse/HBASE-7221 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: HBASE_7221.patch, hbase-common_hbase_7221_2.patch, hbase-common_hbase_7221_v3.patch A common question in the dist-lists is how to construct rowkeys, particularly composite keys. Put/Get/Scan specifies byte[] as the rowkey, but it's up to you to sensibly populate that byte-array, and that's where things tend to go off the rails. The intent of this RowKey utility class isn't to add functionality into Put/Get/Scan, but rather to make it simpler for folks to construct said arrays. Example: {code} RowKey key = RowKey.create(RowKey.SIZEOF_MD5_HASH + RowKey.SIZEOF_LONG); key.addHash(a); key.add(b); byte bytes[] = key.getBytes(); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
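The fixed-length layout behind the proposal — every component has a known width, so the same offsets serve both writing and reading — can be illustrated with plain `ByteBuffer` code. This is a hypothetical sketch, not the proposed RowKey API: the class and method names here (`CompositeKey`, `build`, `readSuffix`) are made up for illustration, and it shows the read-back direction Doug argues the utility must also support.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class CompositeKey {
    static final int SIZEOF_MD5_HASH = 16;
    static final int SIZEOF_LONG = 8;

    // Build a rowkey as [16-byte MD5 of the leading field][8-byte long].
    static byte[] build(String leading, long suffix) throws Exception {
        byte[] hash = MessageDigest.getInstance("MD5")
            .digest(leading.getBytes(StandardCharsets.UTF_8));
        ByteBuffer buf = ByteBuffer.allocate(SIZEOF_MD5_HASH + SIZEOF_LONG);
        buf.put(hash);       // fixed 16 bytes
        buf.putLong(suffix); // fixed 8 bytes
        return buf.array();
    }

    // Because the widths are fixed, reading a component back is just an
    // offset calculation — no delimiters to scan for.
    static long readSuffix(byte[] rowkey) {
        return ByteBuffer.wrap(rowkey, SIZEOF_MD5_HASH, SIZEOF_LONG).getLong();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = build("user123", 1358000000L);
        System.out.println(key.length);      // 24
        System.out.println(readSuffix(key)); // 1358000000
    }
}
```

A printf-style `format` call can only write such a key; the fixed-offset layout is what makes the reverse direction equally cheap.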
[jira] [Updated] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-6466: Attachment: 6466-v7.patch Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6466-v6.patch, 6466-v7.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch If the KV is large or Hlog is closed with high-pressure putting, we found the memstore is often above the high water mark, which blocks puts. So should we enable multi-thread for Memstore Flush? Some performance test data for reference, 1.test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionserver, 300 ipc handler per regionserver; 5 client, 50 thread handler per client for writing 2.test results: one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per regionserver, appears many aboveGlobalMemstoreLimit blocking two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per regionserver, 200 thread handler per client two cacheFlush handlers, tps: 16.1k/s per regionserver, Flush: 18.6MB/s per regionserver -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7637) hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0
[ https://issues.apache.org/jira/browse/HBASE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560715#comment-13560715 ] Hadoop QA commented on HBASE-7637: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566119/nomodules.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4146//console This message is automatically generated. hbase-hadoop1-compat conflicts with -Dhadoop.profile=2.0 Key: HBASE-7637 URL: https://issues.apache.org/jira/browse/HBASE-7637 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.96.0 Reporter: nkeywal Assignee: Elliott Clark Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7637-0.patch, nomodules.patch I'm unclear on the root cause / fix. 
Here is the scenario: {noformat} mvn clean package install -Dhadoop.profile=2.0 -DskipTests bin/start-hbase.sh {noformat} fails with {noformat} Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics2.lib.MetricMutable at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) {noformat} doing {noformat} rm -rf hbase-hadoop1-compat/target/ {noformat} makes it work. In the pom.xml, we never reference hadoop2-compat. But doing so does not help: hadoop1-compat is compiled and takes precedence over hadoop2... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7650) bin/hbase-config.sh is not executable
nkeywal created HBASE-7650: -- Summary: bin/hbase-config.sh is not executable Key: HBASE-7650 URL: https://issues.apache.org/jira/browse/HBASE-7650 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.94.4, 0.96.0 Reporter: nkeywal And it's strange that everything seems to work despite this... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-7650) bin/hbase-config.sh is not executable
[ https://issues.apache.org/jira/browse/HBASE-7650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal resolved HBASE-7650. Resolution: Invalid It must be sourced only... bin/hbase-config.sh is not executable - Key: HBASE-7650 URL: https://issues.apache.org/jira/browse/HBASE-7650 Project: HBase Issue Type: Bug Components: scripts Affects Versions: 0.96.0, 0.94.4 Reporter: nkeywal And it's strange that everything seems to work despite this... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560745#comment-13560745 ] Hadoop QA commented on HBASE-6466: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566127/6466-v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.TestLocalHBaseCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4147//console This message is automatically generated. Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6466-v6.patch, 6466-v7.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch If the KV is large or Hlog is closed with high-pressure putting, we found memstore is often above the high water mark and block the putting. 
So should we enable multi-thread for Memstore Flush? Some performance test data for reference, 1.test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionserver, 300 ipc handler per regionserver; 5 client, 50 thread handler per client for writing 2.test results: one cacheFlush handler, tps: 7.8k/s per regionserver, Flush: 10.1MB/s per regionserver, appears many aboveGlobalMemstoreLimit blocking two cacheFlush handlers, tps: 10.7k/s per regionserver, Flush: 12.46MB/s per regionserver, 200 thread handler per client two cacheFlush handlers, tps: 16.1k/s per regionserver, Flush: 18.6MB/s per regionserver -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7495) parallel seek in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560791#comment-13560791 ] ramkrishna.s.vasudevan commented on HBASE-7495: --- [~xieliang007] How will the ordering be maintained? Do we need to ensure the ordering of the kvs? Just asking. parallel seek in StoreScanner - Key: HBASE-7495 URL: https://issues.apache.org/jira/browse/HBASE-7495 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.94.3, 0.96.0 Reporter: liang xie Assignee: liang xie Attachments: HBASE-7495.txt, HBASE-7495.txt it seems there's room for improvement before doing scanner.next:
{code:title=StoreScanner.java|borderStyle=solid}
if (explicitColumnQuery && lazySeekEnabledGlobally) {
  for (KeyValueScanner scanner : scanners) {
    scanner.requestSeek(matcher.getStartKey(), false, true);
  }
} else {
  for (KeyValueScanner scanner : scanners) {
    scanner.seek(matcher.getStartKey());
  }
}
{code}
we can do scanner.requestSeek or scanner.seek in parallel, instead of the current serialization, to reduce latency for this special case. Any ideas on it? I'll have a try if the comments/suggestions are positive :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560793#comment-13560793 ] ramkrishna.s.vasudevan commented on HBASE-7403: --- I don't have many comments here. Clarified my doubts with Chunhui. Overall the functionality seems fine and the scenarios have been taken care of. So +1 from me. Thanks Chunhui. Online Merge Key: HBASE-7403 URL: https://issues.apache.org/jira/browse/HBASE-7403 Project: HBase Issue Type: New Feature Affects Versions: 0.94.3 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.5 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, hbase-7403-trunkv11.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf The features of this online merge: 1.Online, no need to disable the table 2.Little change to current code, could be applied in trunk, 0.94 or 0.92, 0.90 3.Easy to call a merge request, no need to input a long region name, the encoded name is enough 4.No limit on when to operate; you don't need to take care of events like Server Dead, Balance, Split, Disabling/Enabling table, and no need to worry whether you sent a wrong merge request, it is handled for you 5.Only a little offline time for the two merging regions Usage: 1.Tool: bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] table-name region-encodedname-1 region-encodedname-2 2.API: static void MergeManager#createMergeRequest We need merge in the following cases: 1.Region hole or region overlap, can't be fixed by hbck 2.Regions become empty because of TTL and unreasonable rowkey design 3.Region is always empty or very small because of presplit when creating the table 4.Too many empty or small regions would reduce the system performance (e.g.
mslab). The current merge tools only support offline merge and are not able to redo if an exception is thrown in the process of merging, leaving dirty data. For an online system, we need an online merge. The implementation logic of this patch for Online Merge is: for example, merge regionA and regionB into regionC 1.Offline the two regions A and B 2.Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories) 3.Add the merged regionC to .META. 4.Assign the merged regionC By design of this patch, once we do the merge work in HDFS, we can redo it until successful if it throws an exception, aborts, or the server restarts, but it can't be rolled back. It depends on: Use zookeeper to record the transaction journal state, making redo easier Use zookeeper to send/receive merge requests Merge transaction is executed on the master Support calling a merge request through the API or shell tool About the merge process, please see the attachment and patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
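The redo-only transaction shape described above — journal each step before performing it, so a restarted master can resume from the last completed step rather than roll back — can be sketched in a few lines. Everything here is hypothetical (class name, `Step` names, the in-memory journal); the real patch records journal state in ZooKeeper and the steps act on regions, HDFS, and .META.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeTransactionSketch {
    // The four steps from the description, in execution order.
    enum Step { OFFLINE_REGIONS, MERGE_IN_HDFS, UPDATE_META, ASSIGN_MERGED }

    final List<Step> journal = new ArrayList<>();

    // Run the transaction starting from `from`; a redo after a crash passes
    // the first step that had not yet completed.
    void execute(Step from) {
        for (Step step : Step.values()) {
            if (step.ordinal() < from.ordinal()) {
                continue; // already done before the crash; redo skips it
            }
            journal.add(step); // record intent before acting (the redo point)
            switch (step) {
                case OFFLINE_REGIONS: /* close region A and region B */ break;
                case MERGE_IN_HDFS:   /* create C's dir, move A's and B's files, delete A and B */ break;
                case UPDATE_META:     /* add merged region C to .META. */ break;
                case ASSIGN_MERGED:   /* assign region C to a regionserver */ break;
            }
        }
    }

    public static void main(String[] args) {
        MergeTransactionSketch tx = new MergeTransactionSketch();
        // Simulate a redo after a crash that happened once the HDFS work completed:
        tx.execute(Step.UPDATE_META);
        System.out.println(tx.journal); // [UPDATE_META, ASSIGN_MERGED]
    }
}
```

Because step 2 moves files rather than copying them, the transaction is only safe to replay forward, which is exactly why the design offers redo until success instead of rollback.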
[jira] [Updated] (HBASE-7648) TestAcidGuarantees.testMixedAtomicity hangs sometimes
[ https://issues.apache.org/jira/browse/HBASE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7648: --- Resolution: Fixed Fix Version/s: 0.94.5 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) TestAcidGuarantees.testMixedAtomicity hangs sometimes - Key: HBASE-7648 URL: https://issues.apache.org/jira/browse/HBASE-7648 Project: HBase Issue Type: Bug Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94-7648.patch, trunk-7648.patch java.lang.RuntimeException: Deferred at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:76) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:69) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:301) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:244) at org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:343) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region is not online: TestAcidGuarantees,,135776964.317288e8ca738963ca5e273fc56750fd. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3211) at org.apache.hadoop.hbase.regionserver.HRegionServer.flushRegion(HRegionServer.java:2963) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy23.flushRegion(Unknown Source) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1248) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1230) at org.apache.hadoop.hbase.TestAcidGuarantees$1.doAnAction(TestAcidGuarantees.java:272) at org.apache.hadoop.hbase.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:145) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestThread.run(MultithreadedTestUtil.java:121) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560848#comment-13560848 ] Ted Yu commented on HBASE-6466: --- I ran TestLogRolling using patch v7 locally and it passed: Running org.apache.hadoop.hbase.regionserver.wal.TestLogRolling 2013-01-23 09:22:46.693 java[8875:1703] Unable to load realm info from SCDynamicStore Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 141.023 sec Integrated to trunk again. Let's see what Jenkins tells us. Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6466-v6.patch, 6466-v7.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch If the KVs are large, or the HLog is rolled under heavy write pressure, we found the memstore is often above the high water mark and blocks writes. So should we enable multiple threads for memstore flush? Some performance test data for reference:
1. Test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len=50 bytes, value len=1024 bytes; 5 regionservers, 300 IPC handlers per regionserver; 5 clients, 50 writer threads per client.
2. Test results:
* one cacheFlush handler: tps 7.8k/s per regionserver, flush 10.1MB/s per regionserver, many aboveGlobalMemstoreLimit blockings appear
* two cacheFlush handlers: tps 10.7k/s per regionserver, flush 12.46MB/s per regionserver
* two cacheFlush handlers with 200 writer threads per client: tps 16.1k/s per regionserver, flush 18.6MB/s per regionserver
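The gain from multiple cacheFlush handlers above can be sketched with a plain thread pool draining a flush queue. This is a hypothetical illustration, not the actual MemStoreFlusher code from the patch; the class and method names are invented.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the idea behind HBASE-6466: instead of one flush handler
// draining the flush queue, several worker threads pull flush requests
// concurrently, so one slow flush cannot back up the whole queue.
public class MultiThreadedFlusher {
    private final BlockingQueue<Runnable> flushQueue = new LinkedBlockingQueue<>();
    private final ExecutorService handlers;

    public MultiThreadedFlusher(int handlerCount) {
        this.handlers = Executors.newFixedThreadPool(handlerCount);
        for (int i = 0; i < handlerCount; i++) {
            handlers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        flushQueue.take().run();  // block until a flush request arrives
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // shut down cleanly
                }
            });
        }
    }

    public void requestFlush(Runnable flushTask) {
        flushQueue.add(flushTask);
    }

    public void shutdown() {
        handlers.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        MultiThreadedFlusher flusher = new MultiThreadedFlusher(2);
        CountDownLatch done = new CountDownLatch(4);
        AtomicLong flushed = new AtomicLong();
        for (int i = 0; i < 4; i++) {
            // each task stands in for flushing one region's memstore (128 MB here)
            flusher.requestFlush(() -> { flushed.addAndGet(128); done.countDown(); });
        }
        done.await();
        flusher.shutdown();
        System.out.println("flushed MB: " + flushed.get());  // prints "flushed MB: 512"
    }
}
```

In the real regionserver the handler count would be a configuration knob, matching the one/two-handler comparison in the test data above.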
[jira] [Updated] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition and snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7643: --- Attachment: HBASE-7653-p4-v4.patch Added more comments, as Jon suggested. {quote} Is there corresponding java doc that needs to be changed? {quote} No, the javadoc was wrong before. HFileArchiver.resolveAndArchive() race condition and snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch
* The master has an HFile cleaner thread (responsible for cleaning the /hbase/.archive dir)
** /hbase/.archive/table/region/family/hfile
** if the region/family directory is empty, the cleaner removes it
* The master can archive files (from another thread, e.g. DeleteTableHandler)
* The region can archive files (from another server/process, e.g. compaction)
The simplified file archiving code looks like this:
{code}
HFileArchiver.resolveAndArchive(...) {
  // ensure that the archive dir exists
  fs.mkdir(archiveDir);
  // move the file to the archive
  success = fs.rename(originalPath/fileName, archiveDir/fileName);
  // if the rename failed, delete the file without archiving
  if (!success) fs.delete(originalPath/fileName);
}
{code}
Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...), you can end up moving something into a directory that doesn't exist:
{code}
fs.mkdir(archiveDir);
// The HFileCleaner chore starts at this point, and the archive directory
// that we just ensured to be present gets removed.
// The rename at this point will fail since the parent directory is missing.
success = fs.rename(originalPath/fileName, archiveDir/fileName);
{code}
The problem with deleting the file without archiving is that if a snapshot relies on the file being present, or a cloned table relies on that file, you're losing data. Possible solutions:
* Create a ZooKeeper lock to notify the master (Hey, I'm archiving something, wait a bit)
* Add an RS -> Master call to let the master remove files, avoiding this kind of situation
* Avoid removing empty directories from the archive if the table exists or is not disabled
* Add a try/catch and retry around the fs.rename
The last one, the easiest one, looks like:
{code}
for (int i = 0; i < retries; ++i) {
  // ensure the archive directory is present
  fs.mkdir(archiveDir);  // possible race: the cleaner may remove archiveDir here
  // try to archive the file
  success = fs.rename(originalPath/fileName, archiveDir/fileName);
  if (success) break;
}
{code}
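The retry loop sketched in the description could look like the following. This is a hypothetical illustration, not the committed patch, and it uses java.nio.file as a stand-in for the Hadoop FileSystem API: a failed rename is retried after re-creating the archive directory, and the file is never silently deleted.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the "retry the rename" fix: if the rename fails (e.g. because the
// cleaner removed archiveDir between mkdir and rename), re-create the parent
// and try again instead of falling through to a destructive delete.
public class ArchiveWithRetry {
    static boolean archive(Path file, Path archiveDir, int retries) throws IOException {
        for (int i = 0; i < retries; i++) {
            Files.createDirectories(archiveDir);  // ensure the archive dir exists
            try {
                // possible race: the cleaner may remove archiveDir right here
                Files.move(file, archiveDir.resolve(file.getFileName()));
                return true;  // archived successfully
            } catch (IOException raceLost) {
                // parent vanished (or similar transient failure): loop and retry
            }
        }
        return false;  // give up; the caller decides -- never silently delete
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("archive-demo");
        Path hfile = Files.createFile(dir.resolve("hfile1"));
        Path archiveDir = dir.resolve("archive");
        System.out.println(archive(hfile, archiveDir, 3));  // prints "true"
    }
}
```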
[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition and snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560861#comment-13560861 ] Jonathan Hsieh commented on HBASE-7643: --- v4 lgtm. Please fix line length complaints before commit. HFileArchiver.resolveAndArchive() race condition and snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch
[jira] [Updated] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition and snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7643: --- Attachment: HBASE-7653-p4-v5.patch HFileArchiver.resolveAndArchive() race condition and snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, HBASE-7653-p4-v5.patch
[jira] [Commented] (HBASE-7622) Add table descriptor verification after snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560867#comment-13560867 ] Matteo Bertozzi commented on HBASE-7622: I'm going to commit this to the snapshots branch, if there're no objections Add table descriptor verification after snapshot restore Key: HBASE-7622 URL: https://issues.apache.org/jira/browse/HBASE-7622 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: hbase-6055 Attachments: HBASE-7622-v0.patch, HBASE-7622-v1.patch, HBASE-7622-v2.patch Add the schema verification not only based on disk data, but also on the HTableDescriptor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560868#comment-13560868 ] Ted Yu commented on HBASE-6832: --- Patch looks good to me. [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
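The fix this issue's title describes, having tests pass an explicit timestamp to each Put instead of relying on the regionserver clock, matters because HBase keys cell versions by timestamp. The sketch below is not HBase code; it only illustrates, with a plain TreeMap standing in for a cell's version map, why two writes on the same clock tick collapse into one version (on Windows the coarse timer makes this common).

```java
import java.util.TreeMap;

// Illustration of implicit vs. explicit timestamps: versions keyed by the
// same timestamp overwrite each other, so a test that expects two distinct
// versions fails when the clock does not advance between writes.
public class TimestampCollision {
    public static void main(String[] args) {
        TreeMap<Long, String> versions = new TreeMap<>();

        long coarseClock = 1000L;            // same tick for both writes
        versions.put(coarseClock, "v1");
        versions.put(coarseClock, "v2");     // silently replaces v1
        System.out.println("implicit ts versions: " + versions.size());  // 1

        versions.clear();
        versions.put(1000L, "v1");           // explicit, distinct timestamps
        versions.put(1001L, "v2");
        System.out.println("explicit ts versions: " + versions.size());  // 2
    }
}
```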
[jira] [Commented] (HBASE-6821) [WINDOWS] .META. table name causes file system problems in windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560872#comment-13560872 ] Ted Yu commented on HBASE-6821: --- For TestMetaMigrationConvertToPB.README, year in license header is not needed. Other than the above, +1. [WINDOWS] .META. table name causes file system problems in windows -- Key: HBASE-6821 URL: https://issues.apache.org/jira/browse/HBASE-6821 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6731) Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92
[ https://issues.apache.org/jira/browse/HBASE-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] rajeshbabu updated HBASE-6731: -- Attachment: HBASE-6731.patch Same as HBASE-6537 patch. Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92 -- Key: HBASE-6731 URL: https://issues.apache.org/jira/browse/HBASE-6731 Project: HBase Issue Type: Bug Reporter: Ted Yu Fix For: 0.92.3 Attachments: HBASE-6731.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7329) remove flush-related records from WAL and make locking more granular
[ https://issues.apache.org/jira/browse/HBASE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560889#comment-13560889 ] ramkrishna.s.vasudevan commented on HBASE-7329: --- Had time to check this patch today. Nice one, covering all cases. One question: what was the motivation for introducing a barrier? What was the major problem w.r.t. the close operation prior to this patch? Thanks Sergey. remove flush-related records from WAL and make locking more granular Key: HBASE-7329 URL: https://issues.apache.org/jira/browse/HBASE-7329 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: 7329-findbugs.diff, 7329-v7.txt, HBASE-7329-v0.patch, HBASE-7329-v0.patch, HBASE-7329-v0-tmp.patch, HBASE-7329-v1.patch, HBASE-7329-v1.patch, HBASE-7329-v2.patch, HBASE-7329-v3.patch, HBASE-7329-v4.patch, HBASE-7329-v5.patch, HBASE-7329-v6.patch, HBASE-7329-v6.patch Comments from many people in HBASE-6466 and HBASE-6980 indicate that flush records in WAL are not useful. If so, they should be removed.
[jira] [Assigned] (HBASE-6731) Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92
[ https://issues.apache.org/jira/browse/HBASE-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-6731: - Assignee: rajeshbabu Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92 -- Key: HBASE-6731 URL: https://issues.apache.org/jira/browse/HBASE-6731 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: rajeshbabu Fix For: 0.92.3 Attachments: HBASE-6731.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6731) Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92
[ https://issues.apache.org/jira/browse/HBASE-6731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560893#comment-13560893 ] ramkrishna.s.vasudevan commented on HBASE-6731: --- +1 on patch. Port HBASE-6537 'Race between balancer and disable table can lead to inconsistent cluster' to 0.92 -- Key: HBASE-6731 URL: https://issues.apache.org/jira/browse/HBASE-6731 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: rajeshbabu Fix For: 0.92.3 Attachments: HBASE-6731.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7622) Add table descriptor verification after snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560897#comment-13560897 ] Ted Yu commented on HBASE-7622: --- @Matteo: I ran the tests again and they passed: Running org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient 2013-01-23 10:06:13.436 java[9234:1203] Unable to load realm info from SCDynamicStore Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 229.729 sec Running org.apache.hadoop.hbase.snapshot.TestRestoreFlushSnapshotFromClient 2013-01-23 10:10:03.758 java[9267:1203] Unable to load realm info from SCDynamicStore Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.49 sec Go for it. Add table descriptor verification after snapshot restore Key: HBASE-7622 URL: https://issues.apache.org/jira/browse/HBASE-7622 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: hbase-6055 Attachments: HBASE-7622-v0.patch, HBASE-7622-v1.patch, HBASE-7622-v2.patch Add the schema verification not only based on disk data, but also on the HTableDescriptor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7521) fix HBASE-6060 (regions stuck in opening state) in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560898#comment-13560898 ] ramkrishna.s.vasudevan commented on HBASE-7521: --- [~rajesh23] Could you take a look at this patch against the code base that you have? It will be easy to see if anything is missed, because Sergey has rebased the earlier patches we posted at that time. I will also have a look at this tomorrow during the day. fix HBASE-6060 (regions stuck in opening state) in 0.94 --- Key: HBASE-7521 URL: https://issues.apache.org/jira/browse/HBASE-7521 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7521-original-patch-ported-v0.patch, HBASE-7521-v0.patch, HBASE-7521-v1.patch Discussion in HBASE-6060 implies that the fix there does not work on 0.94. Still, we may want to fix the issue in 0.94 (via some different fix) because regions stuck in opening for ridiculous amounts of time are not a good thing to have.
[jira] [Commented] (HBASE-7329) remove flush-related records from WAL and make locking more granular
[ https://issues.apache.org/jira/browse/HBASE-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560906#comment-13560906 ] Sergey Shelukhin commented on HBASE-7329: - Hmm... as I mentioned above, I am actually not sure whether safety on close is necessary. This was just to preserve the logic of the existing behavior. The cacheFlushLock taken on close interacted with things as follows: close would wait for flush; close would wait for log rolling; log rolling would wait for close and then exit because .closed is set; cache flush would wait for close and then proceed(?). Judging by the lack of bugs from the latter case, I am assuming it is ensured externally, deliberately or by coincidence, that it doesn't happen. With the barrier, close would still wait for both operations. Both flush and log roll will not start if close has started. remove flush-related records from WAL and make locking more granular Key: HBASE-7329 URL: https://issues.apache.org/jira/browse/HBASE-7329 Project: HBase Issue Type: Improvement Components: wal Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: 7329-findbugs.diff, 7329-v7.txt, HBASE-7329-v0.patch, HBASE-7329-v0.patch, HBASE-7329-v0-tmp.patch, HBASE-7329-v1.patch, HBASE-7329-v1.patch, HBASE-7329-v2.patch, HBASE-7329-v3.patch, HBASE-7329-v4.patch, HBASE-7329-v5.patch, HBASE-7329-v6.patch, HBASE-7329-v6.patch Comments from many people in HBASE-6466 and HBASE-6980 indicate that flush records in WAL are not useful. If so, they should be removed.
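The barrier semantics discussed in this comment, where flush and log roll may run concurrently with each other but close waits for both and prevents new ones from starting, map naturally onto a read-write lock. A minimal hypothetical sketch (not the patch's actual implementation; the class and method names are invented):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Flush and log roll each take the shared (read) lock, so they don't block
// each other; close takes the exclusive (write) lock, so it waits for any
// in-flight flush/roll and blocks new ones from starting.
public class CloseBarrier {
    private final ReadWriteLock barrier = new ReentrantReadWriteLock();
    private volatile boolean closed = false;

    public boolean tryFlush() {
        barrier.readLock().lock();
        try {
            if (closed) return false;  // close already started; bail out
            // ... write the memstore out to an HFile ...
            return true;
        } finally {
            barrier.readLock().unlock();
        }
    }

    public void close() {
        barrier.writeLock().lock();    // waits for in-flight flush/roll
        try {
            closed = true;
        } finally {
            barrier.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        CloseBarrier cb = new CloseBarrier();
        System.out.println(cb.tryFlush());  // true: region still open
        cb.close();
        System.out.println(cb.tryFlush());  // false: flush refused after close
    }
}
```

A log-roll method would follow the same read-lock pattern as tryFlush, which is why both operations stop starting once close begins.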
[jira] [Commented] (HBASE-7648) TestAcidGuarantees.testMixedAtomicity hangs sometimes
[ https://issues.apache.org/jira/browse/HBASE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560907#comment-13560907 ] Hudson commented on HBASE-7648: --- Integrated in HBase-TRUNK #3783 (See [https://builds.apache.org/job/HBase-TRUNK/3783/]) HBASE-7648 TestAcidGuarantees.testMixedAtomicity hangs sometimes (Revision 1437538) Result = FAILURE jxiang : Files : * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java TestAcidGuarantees.testMixedAtomicity hangs sometimes - Key: HBASE-7648 URL: https://issues.apache.org/jira/browse/HBASE-7648 Project: HBase Issue Type: Bug Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94-7648.patch, trunk-7648.patch java.lang.RuntimeException: Deferred at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:76) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:69) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:301) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:244) at org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:343) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region is not online: TestAcidGuarantees,,135776964.317288e8ca738963ca5e273fc56750fd. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3211) at org.apache.hadoop.hbase.regionserver.HRegionServer.flushRegion(HRegionServer.java:2963) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy23.flushRegion(Unknown Source) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1248) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1230) at org.apache.hadoop.hbase.TestAcidGuarantees$1.doAnAction(TestAcidGuarantees.java:272) at org.apache.hadoop.hbase.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:145) at
[jira] [Reopened] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reopened HBASE-7268: - Doesn't cover all cases; saw that it still removes from cache on error from a different server... The original patch is valid, I'll make an addendum patch. correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, R moved from C to B, even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for a yet unknown reason.
[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7268: Summary: correct local region location cache information can be overwritten (or deleted) w/stale information from an old server (was: correct local region location cache information can be overwritten w/stale information from an old server) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch
[jira] [Commented] (HBASE-7114) Increment does not extend Mutation but probably should
[ https://issues.apache.org/jira/browse/HBASE-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560914#comment-13560914 ] Ted Yu commented on HBASE-7114: --- Stable annotation is only in 0.96 code base. I agree with Devaraj that we should tackle this in 0.96 Increment does not extend Mutation but probably should -- Key: HBASE-7114 URL: https://issues.apache.org/jira/browse/HBASE-7114 Project: HBase Issue Type: Bug Components: Client Reporter: Andrew Purtell Priority: Minor Increment is the only operation in the class of mutators that does not extend Mutation. It mostly duplicates what Mutation provides, but not quite. The signatures for setWriteToWAL and getFamilyMap are slightly different. This can be inconvenient because it requires special case code and therefore could be considered an API design nit. Unfortunately it is not a simple change: The interface is marked stable and the internals of the family map are different from other mutation types. The latter is why I suspect this was not addressed when Mutation was introduced. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition and snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560920#comment-13560920 ] Hadoop QA commented on HBASE-7643: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566150/HBASE-7653-p4-v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4148//console This message is automatically generated. 
HFileArchiver.resolveAndArchive() race condition and snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, HBASE-7653-p4-v5.patch * The master has an HFile cleaner thread (that is responsible for cleaning the /hbase/.archive dir) ** /hbase/.archive/table/region/family/hfile ** if the table/region/family directory is empty the cleaner removes it * The master can archive files (from another thread, e.g. DeleteTableHandler) * The region can archive files (from another server/process, e.g. compaction) The simplified file archiving code looks like this: {code} HFileArchiver.resolveAndArchive(...) { // ensure that the archive dir exists fs.mkdir(archiveDir); // move the file to the archive success = fs.rename(originalPath/fileName, archiveDir/fileName) // if the rename failed, delete the file without archiving if (!success) fs.delete(originalPath/fileName); } {code} Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...) you can end up in a situation where you are moving something into a directory that doesn't exist. {code} fs.mkdir(archiveDir); // HFileCleaner chore starts at this point // and the archive directory that we just ensured to be present gets removed. // The rename at this point will fail since the parent directory is missing. success = fs.rename(originalPath/fileName, archiveDir/fileName) {code} The problem with deleting the file without archiving is that if you have a snapshot, or a cloned table, that relies on that file being present, you're losing data. 
Possible solutions * Create a ZooKeeper lock, to notify the master (Hey, I'm archiving something, wait a bit) * Add a RS -> Master call to let the master remove files and avoid this kind of
[jira] [Updated] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7643: -- Summary: HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss (was: HFileArchiver.resolveAndArchive() race condition and snapshot data loss) Hadoop Flags: Reviewed HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, HBASE-7653-p4-v5.patch * The master has an HFile cleaner thread (that is responsible for cleaning the /hbase/.archive dir) ** /hbase/.archive/table/region/family/hfile ** if the table/region/family directory is empty the cleaner removes it * The master can archive files (from another thread, e.g. DeleteTableHandler) * The region can archive files (from another server/process, e.g. compaction) The simplified file archiving code looks like this: {code} HFileArchiver.resolveAndArchive(...) { // ensure that the archive dir exists fs.mkdir(archiveDir); // move the file to the archive success = fs.rename(originalPath/fileName, archiveDir/fileName) // if the rename failed, delete the file without archiving if (!success) fs.delete(originalPath/fileName); } {code} Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...) you can end up in a situation where you are moving something into a directory that doesn't exist. {code} fs.mkdir(archiveDir); // HFileCleaner chore starts at this point // and the archive directory that we just ensured to be present gets removed. 
// The rename at this point will fail since the parent directory is missing. success = fs.rename(originalPath/fileName, archiveDir/fileName) {code} The problem with deleting the file without archiving is that if you have a snapshot, or a cloned table, that relies on that file being present, you're losing data. Possible solutions * Create a ZooKeeper lock, to notify the master (Hey, I'm archiving something, wait a bit) * Add a RS -> Master call to let the master remove files and avoid this kind of situation * Avoid removing empty directories from the archive if the table exists or is not disabled * Add a try/catch around the fs.rename The last one, the easiest one, looks like: {code} for (int i = 0; i < retries; ++i) { // ensure the archive directory is present fs.mkdir(archiveDir); // possible race here // try to archive the file success = fs.rename(originalPath/fileName, archiveDir/fileName); if (success) break; } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
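The retry-based fix above can be sketched as runnable code. This is a minimal illustration using java.nio.file rather than the Hadoop FileSystem API, and the method name archiveWithRetries is hypothetical: re-creating the archive directory on every attempt means a cleaner deleting it between the mkdir and the rename only costs one retry, and the file is never deleted without being archived.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ArchiveRetrySketch {
    // Sketch of the proposed retry loop (illustrative name, not HBase code).
    static boolean archiveWithRetries(Path file, Path archiveDir, int retries) throws IOException {
        for (int i = 0; i < retries; ++i) {
            // ensure the archive dir exists; the race window with the cleaner is here
            Files.createDirectories(archiveDir);
            try {
                Files.move(file, archiveDir.resolve(file.getFileName()));
                return true;  // archived successfully
            } catch (NoSuchFileException e) {
                // archiveDir vanished between createDirectories and move; retry
            }
        }
        return false;  // let the caller decide what to do; never silently delete the file
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("archive-sketch");
        Path hfile = Files.createFile(tmp.resolve("hfile1"));
        Path archiveDir = tmp.resolve("archive").resolve("table").resolve("region").resolve("cf");
        System.out.println(archiveWithRetries(hfile, archiveDir, 3));  // prints "true"
    }
}
```

Note the key design point of the last solution: on failure the loop falls through to the caller instead of deleting the unarchived file, so a snapshot referencing it can never lose data.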
[jira] [Commented] (HBASE-7648) TestAcidGuarantees.testMixedAtomicity hangs sometimes
[ https://issues.apache.org/jira/browse/HBASE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560932#comment-13560932 ] Hudson commented on HBASE-7648: --- Integrated in HBase-0.94 #754 (See [https://builds.apache.org/job/HBase-0.94/754/]) HBASE-7648 TestAcidGuarantees.testMixedAtomicity hangs sometimes (Revision 1437539) Result = FAILURE jxiang : Files : * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java TestAcidGuarantees.testMixedAtomicity hangs sometimes - Key: HBASE-7648 URL: https://issues.apache.org/jira/browse/HBASE-7648 Project: HBase Issue Type: Bug Components: test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.96.0, 0.94.5 Attachments: 0.94-7648.patch, trunk-7648.patch java.lang.RuntimeException: Deferred at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:76) at org.apache.hadoop.hbase.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:69) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:301) at org.apache.hadoop.hbase.TestAcidGuarantees.runTestAtomicity(TestAcidGuarantees.java:244) at org.apache.hadoop.hbase.TestAcidGuarantees.testMixedAtomicity(TestAcidGuarantees.java:343) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:24) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region is not online: TestAcidGuarantees,,135776964.317288e8ca738963ca5e273fc56750fd. 
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3211) at org.apache.hadoop.hbase.regionserver.HRegionServer.flushRegion(HRegionServer.java:2963) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400) at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150) at $Proxy23.flushRegion(Unknown Source) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1248) at org.apache.hadoop.hbase.client.HBaseAdmin.flush(HBaseAdmin.java:1230) at org.apache.hadoop.hbase.TestAcidGuarantees$1.doAnAction(TestAcidGuarantees.java:272) at org.apache.hadoop.hbase.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:145) at
[jira] [Updated] (HBASE-7622) Add table descriptor verification after snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7622: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to the snapshots branch Add table descriptor verification after snapshot restore Key: HBASE-7622 URL: https://issues.apache.org/jira/browse/HBASE-7622 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: hbase-6055 Attachments: HBASE-7622-v0.patch, HBASE-7622-v1.patch, HBASE-7622-v2.patch Add the schema verification not only based on disk data, but also on the HTableDescriptor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7622) Add table descriptor verification after snapshot restore
[ https://issues.apache.org/jira/browse/HBASE-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-7622: --- Status: Patch Available (was: Open) Add table descriptor verification after snapshot restore Key: HBASE-7622 URL: https://issues.apache.org/jira/browse/HBASE-7622 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: hbase-6055 Attachments: HBASE-7622-v0.patch, HBASE-7622-v1.patch, HBASE-7622-v2.patch Add the schema verification not only based on disk data, but also on the HTableDescriptor -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560939#comment-13560939 ] Hadoop QA commented on HBASE-7643: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566153/HBASE-7653-p4-v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.TestLocalHBaseCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4149//console This message is automatically generated. 
HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, HBASE-7653-p4-v5.patch * The master has an HFile cleaner thread (that is responsible for cleaning the /hbase/.archive dir) ** /hbase/.archive/table/region/family/hfile ** if the table/region/family directory is empty the cleaner removes it * The master can archive files (from another thread, e.g. DeleteTableHandler) * The region can archive files (from another server/process, e.g. compaction) The simplified file archiving code looks like this: {code} HFileArchiver.resolveAndArchive(...) { // ensure that the archive dir exists fs.mkdir(archiveDir); // move the file to the archive success = fs.rename(originalPath/fileName, archiveDir/fileName) // if the rename failed, delete the file without archiving if (!success) fs.delete(originalPath/fileName); } {code} Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...) you can end up in a situation where you are moving something into a directory that doesn't exist. {code} fs.mkdir(archiveDir); // HFileCleaner chore starts at this point // and the archive directory that we just ensured to be present gets removed. // The rename at this point will fail since the parent directory is missing. success = fs.rename(originalPath/fileName, archiveDir/fileName) {code} The problem with deleting the file without archiving is that if you have a snapshot, or a cloned table, that relies on that file being present, you're losing data. 
Possible solutions * Create a ZooKeeper lock, to notify the master (Hey I'm archiving something,
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560950#comment-13560950 ] Ted Yu commented on HBASE-7268: --- From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/366/testReport/junit/org.apache.hadoop.hbase.util/TestMiniClusterLoadSequential/loadTest_0_/, I found: {code} 2013-01-22 03:16:55,763 ERROR [HBaseWriterThread_6] server.NIOServerCnxnFactory$1(44): Thread Thread[HBaseWriterThread_6,5,main] died java.lang.NullPointerException at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.deleteCachedLocation(HConnectionManager.java:1783) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.updateCachedLocations(HConnectionManager.java:1825) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.access$1300(HConnectionManager.java:515) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.processBatchCallback(HConnectionManager.java:2035) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.access$900(HConnectionManager.java:1874) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1863) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1842) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:882) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:692) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.insert(MultiThreadedWriter.java:175) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.run(MultiThreadedWriter.java:145) {code} Looks like oldLocation was null in the following check: {code} isStaleDelete = (source != null) && !oldLocation.equals(source); {code} Can you 
include the fix in the addendum? correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. The test is writing from multiple threads. Server A has region R; the client knows that. R gets moved from A to server B. B gets killed. R gets moved by the master to server C. ~15 seconds later, the client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, that R moved from C to B, even though such a transition never happened (neither in nor before the sequence described above). Not quite sure how the client learned of the transition to C; I assume it's from meta via some other thread... Then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch, but I'm not sure if it works; the test still fails locally for an as-yet-unknown reason. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
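The NullPointerException in the comment above comes from dereferencing a null oldLocation. A minimal null-safe sketch of that guard follows; plain Strings stand in for HRegionLocation, and the helper name is illustrative, not the actual HBase method:

```java
public class StaleDeleteCheck {
    // Null-safe version of the check quoted above: with no cached
    // location there is nothing to compare, so the delete is not stale.
    static boolean isStaleDelete(String oldLocation, String source) {
        return source != null && oldLocation != null && !oldLocation.equals(source);
    }

    public static void main(String[] args) {
        System.out.println(isStaleDelete(null, "serverA"));      // false: missing cache entry, no NPE
        System.out.println(isStaleDelete("serverB", "serverA")); // true: cached location differs from source
        System.out.println(isStaleDelete("serverA", "serverA")); // false: same server, delete is legitimate
    }
}
```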
[jira] [Updated] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7382: --- Attachment: HBASE-7382-trunk.patch Patch which forward-ports the multi functionality. It doesn't include the compatibility code that was there in 6775. Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi in ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560966#comment-13560966 ] Enis Soztutar commented on HBASE-6832: -- bq. for EnvironmentEdgeManager, should we not always initialize the 'time' at 10 or something like this? I can imagine another piece of code doing minus something. Initializing it to something high enough could save us from some burden later. Makes sense. I changed it so that it starts with currentTimeMillis by default. bq. For the fix, except this non-critical comment above, I'm ok, but I wonder if the root issue (strange time counter on Windows) won't show up in production. That's another subject, though. Opened HBASE-6833 for that, although the fix is not that clear at this point. [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6832: - Attachment: hbase-6832_v6-trunk.patch Updated patch with N's suggestions. [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch, hbase-6832_v6-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
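The EnvironmentEdgeManager change discussed above follows a common pattern: inject a clock so tests hand out explicit, strictly increasing timestamps instead of trusting System.currentTimeMillis(), which can return the same value on consecutive calls (the Windows failure mode in this issue). Below is a self-contained sketch of that pattern; the names are illustrative stand-ins, not the actual HBase EnvironmentEdge classes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ClockSketch {
    // Injectable time source (stand-in for an EnvironmentEdge-style interface).
    interface TimeSource {
        long currentTime();
    }

    // Test clock: starts at the real current time (as the comment above
    // suggests) and ticks one millisecond per read, so no two reads collide.
    static final class IncrementingClock implements TimeSource {
        private final AtomicLong now;
        IncrementingClock(long start) { this.now = new AtomicLong(start); }
        @Override public long currentTime() { return now.incrementAndGet(); }
    }

    public static void main(String[] args) {
        TimeSource clock = new IncrementingClock(System.currentTimeMillis());
        long t1 = clock.currentTime();
        long t2 = clock.currentTime();
        System.out.println(t2 - t1);  // prints "1": timestamps are guaranteed distinct
    }
}
```

A test that stamps each Put from such a clock gets deterministic ordering regardless of how coarse the platform timer is.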
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560971#comment-13560971 ] Himanshu Vashishtha commented on HBASE-7382: Ran Jenkins with this job; TestHbck failed, which looks unrelated. Ran it locally and it passed. Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi in ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7605) TestMiniClusterLoadSequential fails in trunk build on hadoop 2
[ https://issues.apache.org/jira/browse/HBASE-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560974#comment-13560974 ] Ted Yu commented on HBASE-7605: --- {code} 2013-01-22 03:16:55,763 ERROR [HBaseWriterThread_6] server.NIOServerCnxnFactory$1(44): Thread Thread[HBaseWriterThread_6,5,main] died java.lang.NullPointerException at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.deleteCachedLocation(HConnectionManager.java:1783) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.updateCachedLocations(HConnectionManager.java:1825) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.access$1300(HConnectionManager.java:515) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.processBatchCallback(HConnectionManager.java:2035) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$Process.access$900(HConnectionManager.java:1874) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1863) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1842) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:882) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:692) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.insert(MultiThreadedWriter.java:175) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.run(MultiThreadedWriter.java:145) {code} Fixing the NullPointer exception, I was able to see the test pass against hadoop 2.0: Running org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential 2013-01-23 11:09:20.078 java[10447:1203] Unable to load realm info from SCDynamicStore 2013-01-23 11:09:20.155 java[10447:1203] Unable to load realm info from 
SCDynamicStore Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 57.738 sec TestMiniClusterLoadSequential fails in trunk build on hadoop 2 -- Key: HBASE-7605 URL: https://issues.apache.org/jira/browse/HBASE-7605 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Priority: Critical Fix For: 0.96.0 From HBase-TRUNK-on-Hadoop-2.0.0 #354: loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[1](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[2](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[3](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
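The stack trace above bottoms out in deleteCachedLocation dereferencing a cached region location that may no longer exist. A minimal, self-contained sketch of the null-guard such a fix implies (class and method names here are illustrative, not HBase's actual HConnectionManager internals):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: before invalidating a cached region location,
// check that an entry actually exists; a missing entry is a no-op,
// not a NullPointerException.
public class RegionCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    void cacheLocation(String regionKey, String server) {
        cache.put(regionKey, server);
    }

    // Remove the cached entry only if it still points at the server we
    // believe is stale.
    void deleteCachedLocation(String regionKey, String staleServer) {
        String cached = cache.get(regionKey);
        if (cached == null) {
            return;                      // nothing cached: nothing to invalidate
        }
        if (cached.equals(staleServer)) {
            cache.remove(regionKey, cached);
        }
    }

    String lookup(String regionKey) {
        return cache.get(regionKey);
    }

    public static void main(String[] args) {
        RegionCache rc = new RegionCache();
        rc.deleteCachedLocation("r1", "serverA:60020"); // no entry: must not throw
        rc.cacheLocation("r1", "serverA:60020");
        rc.deleteCachedLocation("r1", "serverB:60020"); // different server: kept
        System.out.println(rc.lookup("r1"));            // prints serverA:60020
        rc.deleteCachedLocation("r1", "serverA:60020"); // matching server: removed
        System.out.println(rc.lookup("r1"));            // prints null
    }
}
```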
[jira] [Updated] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7382: -- Status: Patch Available (was: Open) Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560981#comment-13560981 ] Sergey Shelukhin commented on HBASE-7268: - Sure. I actually misread the logs for what I thought was missing; it does remove based on the incorrect location, but it's a forced remove. I'll add a null check and make the logs/docs clearer. correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. The test is writing from multiple threads. Server A has region R; the client knows that. R gets moved from A to server B. B gets killed. R gets moved by the master to server C. ~15 seconds later, the client tries to write to it (on A?). Multiple client threads report, via the RegionMoved exception processing logic, that R moved from C to B, even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C; I assume it's from meta, via some other thread... Then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure whether it works; the test still fails locally for an as-yet-unknown reason. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
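The failure mode described above, a stale report from an old server clobbering fresher location information, is the classic last-write-wins cache race. One standard guard (a sketch of the general technique, not the committed HBASE-7268 patch) is to tag each cached entry with a freshness marker and reject updates that carry an older one:

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: cache entries carry the sequence number they were
// learned at, so a laggy report from a dead server cannot overwrite
// fresher information. Names and structure are hypothetical.
public class GuardedLocationCache {
    static final class Loc {
        final String server;
        final long seqNum;   // monotonically increasing "freshness" marker
        Loc(String server, long seqNum) { this.server = server; this.seqNum = seqNum; }
    }

    private final ConcurrentHashMap<String, Loc> cache = new ConcurrentHashMap<>();

    // Atomically accept the update only if it is at least as fresh as
    // what we already hold; returns whether the update was applied.
    boolean update(String regionKey, String server, long seqNum) {
        Loc applied = cache.compute(regionKey, (k, cur) ->
            (cur != null && cur.seqNum > seqNum) ? cur : new Loc(server, seqNum));
        return applied.server.equals(server) && applied.seqNum == seqNum;
    }

    String lookup(String regionKey) {
        Loc l = cache.get(regionKey);
        return l == null ? null : l.server;
    }

    public static void main(String[] args) {
        GuardedLocationCache c = new GuardedLocationCache();
        c.update("R", "serverC", 3);                    // master moved R to C
        boolean accepted = c.update("R", "serverB", 2); // stale report from dead B
        System.out.println(accepted + " " + c.lookup("R")); // false serverC
    }
}
```

The compute call keeps the check-then-act step atomic per key, which matters when many client threads process region-moved exceptions concurrently, as in the scenario above.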
[jira] [Commented] (HBASE-7605) TestMiniClusterLoadSequential fails in trunk build on hadoop 2
[ https://issues.apache.org/jira/browse/HBASE-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560985#comment-13560985 ] stack commented on HBASE-7605: -- [~ted_yu] is that NPE because of another issue? Should we close out this one then? TestMiniClusterLoadSequential fails in trunk build on hadoop 2 -- Key: HBASE-7605 URL: https://issues.apache.org/jira/browse/HBASE-7605 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Priority: Critical Fix For: 0.96.0 From HBase-TRUNK-on-Hadoop-2.0.0 #354: loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[1](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[2](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[3](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6816) [WINDOWS] line endings on checkout for .sh files
[ https://issues.apache.org/jira/browse/HBASE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-6816. -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Committed this. Thanks for the review Nicolas. [WINDOWS] line endings on checkout for .sh files Key: HBASE-6816 URL: https://issues.apache.org/jira/browse/HBASE-6816 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: hbase-16_v1.patch, hbase-6816_v1.patch On code checkout from svn or git, we need to ensure that the line endings for .sh files are LF, so that they work with cygwin. This is important for getting src/saveVersion.sh to work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6832: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed v6, which has trivial changes to v5. Thanks for the reviews. [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch, hbase-6832_v6-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
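For context on why implicit timestamps are fragile here: the Windows clock can tick at roughly 10-15 ms granularity, so two mutations issued back-to-back can receive the same server-assigned timestamp, and a delete marker then masks a put that logically came after it. A toy model of the tie, with no HBase dependency (the masking rule mirrors HBase's "a delete at time T masks puts with timestamp <= T"):

```java
import java.util.Map;
import java.util.TreeMap;

public class TimestampTie {
    // HBase-like rule: a delete marker at time T masks every put whose
    // timestamp is <= T. Returns the visible value, or null if masked.
    static String read(TreeMap<Long, String> puts, long deleteTs) {
        Map.Entry<Long, String> latest = puts.lastEntry();
        return (latest == null || latest.getKey() <= deleteTs) ? null : latest.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> puts = new TreeMap<>();
        long coarseTick = 100L;          // both mutations fall in one clock tick

        long deleteTs = coarseTick;      // delete, then put, same implicit timestamp
        puts.put(coarseTick, "value");
        System.out.println(read(puts, deleteTs));   // null: the put is masked

        puts.clear();
        puts.put(deleteTs + 1, "value"); // explicit, strictly newer timestamp
        System.out.println(read(puts, deleteTs));   // value: visible again
    }
}
```

Using explicit, strictly increasing timestamps in the test removes the dependence on clock granularity entirely, which is what the committed patch does for the affected tests.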
[jira] [Commented] (HBASE-7605) TestMiniClusterLoadSequential fails in trunk build on hadoop 2
[ https://issues.apache.org/jira/browse/HBASE-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13560995#comment-13560995 ] Ted Yu commented on HBASE-7605: --- Once addendum for HBASE-7268 goes in and this test passes on hadoop 2.0, I will resolve this issue. TestMiniClusterLoadSequential fails in trunk build on hadoop 2 -- Key: HBASE-7605 URL: https://issues.apache.org/jira/browse/HBASE-7605 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Priority: Critical Fix For: 0.96.0 From HBase-TRUNK-on-Hadoop-2.0.0 #354: loadTest[0](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[1](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[2](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds loadTest[3](org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential): test timed out after 12 milliseconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6829) [WINDOWS] Tests should ensure that HLog is closed
[ https://issues.apache.org/jira/browse/HBASE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6829: - Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed v4. Thanks for reviews guys. [WINDOWS] Tests should ensure that HLog is closed - Key: HBASE-6829 URL: https://issues.apache.org/jira/browse/HBASE-6829 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-6829_v1-0.94.patch, hbase-6829_v1-trunk.patch, hbase-6829_v2-0.94.patch, hbase-6829_v2-trunk.patch, hbase-6829_v3-0.94.patch, hbase-6829_v3-trunk.patch, hbase-6829_v4-trunk.patch, hbase-6829_v4-trunk.patch TestCacheOnWriteInSchema and TestCompactSelection fails with {code} java.io.IOException: Target HLog directory already exists: ./target/test-data/2d814e66-75d3-4c1b-92c7-a49d9972e8fd/TestCacheOnWriteInSchema/logs at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:385) at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:316) at org.apache.hadoop.hbase.regionserver.TestCacheOnWriteInSchema.setUp(TestCacheOnWriteInSchema.java:162) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6825) [WINDOWS] Java NIO socket channels does not work with Windows ipv6
[ https://issues.apache.org/jira/browse/HBASE-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561015#comment-13561015 ] Enis Soztutar commented on HBASE-6825: -- bq. If I'm not mistaken, the test uses a fixed port 8502, this should be changed, if not we can have random failures when running the test suites. That part of the test mimics the test in the java bug report (http://bugs.sun.com/view_bug.do?bug_id=6230761). I think it checks for BindException's and just passes, assuming an optimistic test. I agree that I should change the fixed port. bq. For the issue itself, why should it not make it to the main code? I mean, we test that, on windows, a critical feature works. If not we will have issues. This should be in main, not in test, no? Sorry, I failed to give enough context here. In the original patches, we did have a change for passing -Djava.net.preferIPv4Stack=true in the bin/hbase.cmd script. But I removed that one so that this patch would not depend on HBASE-6815. In the actual patch for HBASE-6815, we are passing preferipv4 to the hbase daemons through hbase.cmd script. In hbase-env.cmd: {code} +@rem Extra Java runtime options. +@rem Below are what we set by default. May only work with SUN JVM. +@rem For more on why as well as other possible settings, +@rem see http://wiki.apache.org/hadoop/PerformanceTuning +@rem JDK6 on Windows has a known bug for IPv6, use preferIPv4Stack unless JDK7. +@rem @rem See TestIPv6NIOServerSocketChannel. +set HBASE_OPTS=-XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true {code} bq. And actually, this seems to be fixed in 1.6 u34 says http://www.oracle.com/technetwork/java/javase/documentation/overview-156328.html. Let me test this one. After HBASE-7301, we are running the tests with ipv4 on linux anyway, so we might as well commit this one regardless. 
[WINDOWS] Java NIO socket channels does not work with Windows ipv6 -- Key: HBASE-6825 URL: https://issues.apache.org/jira/browse/HBASE-6825 Project: HBase Issue Type: Sub-task Affects Versions: 0.94.3, 0.96.0 Environment: JDK6 on windows for ipv6. Reporter: Enis Soztutar Assignee: Enis Soztutar Attachments: hbase-6825_v3-0.94.patch, hbase-6825_v3-trunk.patch While running the test TestAdmin.testCheckHBaseAvailableClosesConnection(), I noticed that it takes very long, since it sleeps for 2sec * 500, because of zookeeper retries. The root cause of the problem is that ZK uses Java NIO to create ServerSockets from ServerSocketChannels. Under windows, ipv4 and ipv6 are implemented independently, and Java apparently cannot reuse the same socket channel for both ipv4 and ipv6 sockets. We are getting java.net.SocketException: Address family not supported by protocol family exceptions. When the ZK client resolves localhost, it gets both the v4 127.0.0.1 and v6 ::1 addresses, but the socket channel cannot bind to both v4 and v6. The problem is reported as: http://bugs.sun.com/view_bug.do?bug_id=6230761 http://stackoverflow.com/questions/1357091/binding-an-ipv6-server-socket-on-windows Although the JDK bug is reported as resolved, I have tested with jdk1.6.0_33 without any success; JDK7, however, seems to have fixed this problem. In ZK, we could replace the ClientCnxnSocket implementation from ClientCnxnSocketNIO with a non-NIO one, but I am not sure that would be the way to go. Disabling ipv6 resolution of localhost is another approach; I'll test it to see whether it will be any good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7594) TestLocalHBaseCluster failing on ubuntu2
[ https://issues.apache.org/jira/browse/HBASE-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7594: -- Resolution: Fixed Fix Version/s: 0.96.0 Status: Resolved (was: Patch Available) Committed v5 patch with a tiny change to update Javadoc in HBaseTestingUtility for the new method. TestLocalHBaseCluster failing on ubuntu2 Key: HBASE-7594 URL: https://issues.apache.org/jira/browse/HBASE-7594 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.96.0 Attachments: 7594-1.patch, 7594-2.patch, 7594-3.patch, 7594-4.patch, 7594-5.patch {noformat} java.io.IOException: java.io.IOException: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:612) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:533) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4092) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4042) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:427) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:130) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:450) at org.apache.hadoop.hbase.regionserver.HStore.init(HStore.java:215) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3060) at 
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:585) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:583) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) ... 3 more Caused by: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:607) at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:615) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.init(HFileReaderV2.java:115) at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:564) at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:599) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1294) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:525) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:628) at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:426) at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:422) ... 8 more Caused by: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at java.lang.Class.newInstance0(Class.java:340) at java.lang.Class.newInstance(Class.java:308) at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:605) ... 17 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
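The bottom frame of the trace is the real story: FixedFileTrailer.createComparator called Class.newInstance() on org.apache.hadoop.io.RawComparator, and RawComparator is an interface, which reflective instantiation can never construct. A self-contained reproduction of just that failure mode (the nested interface is a stand-in for RawComparator):

```java
public class InterfaceInstantiation {
    // Stand-in for org.apache.hadoop.io.RawComparator.
    interface FakeRawComparator {}

    static String tryInstantiate(Class<?> clazz) {
        try {
            clazz.newInstance();
            return "ok";
        } catch (InstantiationException e) {
            // Interfaces and abstract classes land here, exactly as in the trace.
            return "InstantiationException";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) {
        // prints InstantiationException: interfaces have no constructor
        System.out.println(tryInstantiate(FakeRawComparator.class));
        System.out.println(tryInstantiate(Object.class)); // prints ok
    }
}
```

So whatever class name ends up in the trailer must resolve to a concrete comparator class with a no-arg constructor; passing the interface type itself is guaranteed to fail this way.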
[jira] [Commented] (HBASE-7522) Tests should not be writing under /tmp/
[ https://issues.apache.org/jira/browse/HBASE-7522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561018#comment-13561018 ] Andrew Purtell commented on HBASE-7522: --- TestLocalHBaseCluster fixed in HBASE-7594 Tests should not be writing under /tmp/ --- Key: HBASE-7522 URL: https://issues.apache.org/jira/browse/HBASE-7522 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.94.5 Reporter: Enis Soztutar As per the discussion http://mail-archives.apache.org/mod_mbox/hbase-dev/201301.mbox/%3CCA%2BRK%3D_BmV%3Dvwws4VeDJVPt6hY7NKCDEafex3XTNam630pQRBbA%40mail.gmail.com%3E, tests should not be writing under /tmp/ directory. TestStoreFile is one of the offending ones. Some of them will be fixed at HBASE-6824. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
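The convention argued for above can be sketched as resolving test scratch space from the build tree instead of the shared /tmp, so `mvn clean` removes everything and concurrent builds on one machine cannot collide. This is an illustration, not the HBASE-6824 patch; the `test.build.data` property name follows Hadoop's test convention:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TestDataDirs {
    // Create a unique per-test scratch directory under the given base
    // (normally the build tree), never under the global /tmp.
    static Path newTestDir(Path base, String testName) throws IOException {
        Files.createDirectories(base);
        return Files.createTempDirectory(base, testName);
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(System.getProperty("test.build.data", "target/test-data"));
        Path dir = newTestDir(base, "TestStoreFile");
        System.out.println(dir.startsWith(base)); // true: scratch space stays in the build tree
    }
}
```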
[jira] [Commented] (HBASE-6466) Enable multi-thread for memstore flush
[ https://issues.apache.org/jira/browse/HBASE-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561029#comment-13561029 ] Hudson commented on HBASE-6466: --- Integrated in HBase-TRUNK #3784 (See [https://builds.apache.org/job/HBase-TRUNK/3784/]) HBASE-6466 Enable multi-thread for memstore flush (Chunhui) (Revision 1437591) Result = FAILURE tedyu : Files : * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Threads.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java Enable multi-thread for memstore flush -- Key: HBASE-6466 URL: https://issues.apache.org/jira/browse/HBASE-6466 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Priority: Critical Fix For: 0.96.0 Attachments: 6466-v6.patch, 6466-v7.patch, HBASE-6466.patch, HBASE-6466v2.patch, HBASE-6466v3.1.patch, HBASE-6466v3.patch, HBASE-6466-v4.patch, HBASE-6466-v4.patch, HBASE-6466-v5.patch If the KVs are large or the HLog is closed under high write pressure, we found the memstore is often above the high water mark and blocks the puts. So should we enable multi-threaded memstore flush? 
Some performance test data for reference:
1. Test environment: random writing; upper memstore limit 5.6GB; lower memstore limit 4.8GB; 400 regions per regionserver; row len = 50 bytes, value len = 1024 bytes; 5 regionservers with 300 ipc handlers each; 5 clients with 50 writer threads each.
2. Test results:
* one cacheFlush handler: tps 7.8k/s per regionserver, flush 10.1MB/s per regionserver; many aboveGlobalMemstoreLimit blockings appear
* two cacheFlush handlers: tps 10.7k/s per regionserver, flush 12.46MB/s per regionserver
* two cacheFlush handlers, 200 writer threads per client: tps 16.1k/s per regionserver, flush 18.6MB/s per regionserver
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
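The change itself can be pictured as moving from a single flush handler thread to a small pool draining a shared queue of flush requests, so one slow flush no longer stalls everything queued behind it. A schematic sketch (thread counts and names are illustrative, not the committed MemStoreFlusher code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiFlushSketch {
    // Drain `regions` queued flush requests with `handlerCount` worker
    // threads; returns how many flushes completed.
    static int flushAll(int handlerCount, int regions) throws InterruptedException {
        BlockingQueue<String> flushQueue = new LinkedBlockingQueue<>();
        for (int i = 0; i < regions; i++) {
            flushQueue.add("region-" + i);
        }
        AtomicInteger flushed = new AtomicInteger();
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        for (int i = 0; i < handlerCount; i++) {
            handlers.submit(() -> {
                // a real handler would snapshot and write the memstore here
                while (flushQueue.poll() != null) {
                    flushed.incrementAndGet();
                }
            });
        }
        handlers.shutdown();
        handlers.awaitTermination(10, TimeUnit.SECONDS);
        return flushed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // two handlers sharing one queue, as in the two-handler test run above
        System.out.println(flushAll(2, 4)); // prints 4
    }
}
```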
[jira] [Commented] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561051#comment-13561051 ] Hadoop QA commented on HBASE-6832: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566167/hbase-6832_v6-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 21 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4151//console This message is automatically generated. 
[WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch, hbase-6832_v6-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561053#comment-13561053 ] Hadoop QA commented on HBASE-7382: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566165/HBASE-7382-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.TestLocalHBaseCluster Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4150//console This message is automatically generated. Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6821) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems in windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6821: - Summary: [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems in windows (was: [WINDOWS] .META. table name causes file system problems in windows) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems in windows - Key: HBASE-6821 URL: https://issues.apache.org/jira/browse/HBASE-6821 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6821) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-6821: - Summary: [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows (was: [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems in windows) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows - Key: HBASE-6821 URL: https://issues.apache.org/jira/browse/HBASE-6821 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6821) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-6821. -- Resolution: Fixed Fix Version/s: 0.96.0 Hadoop Flags: Reviewed Committed this together with TestMetaMigrationConvertToPB.tgz. Removed the year in copyright notice per Ted's suggestion. Thanks for the reviews. [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows - Key: HBASE-6821 URL: https://issues.apache.org/jira/browse/HBASE-6821 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561065#comment-13561065 ] Ted Yu commented on HBASE-7382: --- @Himanshu: Can you take a look at the javadoc and findbugs warnings? Thanks Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7268: Attachment: HBASE-7268-addendum-v0.patch correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-addendum-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report from RegionMoved exception processing logic R moved from C to B, even though such transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but not sure if it works, test still fails locally for yet unknown reason. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7268: Status: Patch Available (was: Reopened) Added patch. correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-addendum-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report from RegionMoved exception processing logic R moved from C to B, even though such transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but not sure if it works, test still fails locally for yet unknown reason. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
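The patch itself is not shown in this thread, but the failure mode above (a stale "R moved from C to B" report clobbering fresher cache state) suggests guarding cache writes so that older location reports cannot overwrite newer ones. A minimal, self-contained Java sketch of that general idea, under the assumption that each location report carries a monotonically increasing sequence number (class and method names here are illustrative, not HBase's actual API):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache region locations tagged with a sequence number
// and discard any update carrying an older number than the cached entry.
public class RegionLocationCache {
    static final class Loc {
        final String server;
        final long seqNum;
        Loc(String server, long seqNum) { this.server = server; this.seqNum = seqNum; }
    }

    private final ConcurrentHashMap<String, Loc> cache = new ConcurrentHashMap<>();

    // Returns true if the update was applied, false if it was stale and ignored.
    public boolean update(String region, String server, long seqNum) {
        Loc fresh = new Loc(server, seqNum);
        // merge atomically keeps whichever entry has the higher sequence number
        Loc winner = cache.merge(region, fresh,
                (old, candidate) -> candidate.seqNum > old.seqNum ? candidate : old);
        return winner == fresh;
    }

    public String locate(String region) {
        Loc l = cache.get(region);
        return l == null ? null : l.server;
    }

    public static void main(String[] args) {
        RegionLocationCache cache = new RegionLocationCache();
        cache.update("R", "serverC", 5);                   // fresh: master moved R to C
        boolean applied = cache.update("R", "serverB", 3); // stale report from an old server
        System.out.println(applied);           // false: stale update rejected
        System.out.println(cache.locate("R")); // serverC
    }
}
```

With this guard, the bogus "moved from C to B" report in the scenario above would lose to the newer entry instead of poisoning the cache.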
[jira] [Commented] (HBASE-6821) [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows
[ https://issues.apache.org/jira/browse/HBASE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561085#comment-13561085 ] Hudson commented on HBASE-6821: --- Integrated in HBase-TRUNK #3785 (See [https://builds.apache.org/job/HBase-TRUNK/3785/]) HBASE-6821. [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows (Revision 1437718) Result = FAILURE enis : Files : * /hbase/trunk/hbase-server/src/test/data/TestMetaMigrationConvertToPB.README * /hbase/trunk/hbase-server/src/test/data/TestMetaMigrationConvertToPB.tgz * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaMigrationConvertingToPB.java [WINDOWS] In TestMetaMigrationConvertingToPB .META. table name causes file system problems on windows - Key: HBASE-6821 URL: https://issues.apache.org/jira/browse/HBASE-6821 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-4388-root.dir.tgz, hbase-6821_v2_0.94.patch, hbase-6821_v2-trunk.patch, TestMetaMigrationConvertToPB.tgz TestMetaMigrationRemovingHTD untars a cluster dir having a .META. subdirectory. This causes mvn clean to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7594) TestLocalHBaseCluster failing on ubuntu2
[ https://issues.apache.org/jira/browse/HBASE-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561086#comment-13561086 ] Hudson commented on HBASE-7594: --- Integrated in HBase-TRUNK #3785 (See [https://builds.apache.org/job/HBase-TRUNK/3785/]) HBASE-7594. TestLocalHBaseCluster failing on ubuntu2 (Revision 1437658) Result = FAILURE apurtell : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/FixedFileTrailer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestLocalHBaseCluster.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java TestLocalHBaseCluster failing on ubuntu2 Key: HBASE-7594 URL: https://issues.apache.org/jira/browse/HBASE-7594 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.96.0 Attachments: 7594-1.patch, 7594-2.patch, 7594-3.patch, 7594-4.patch, 7594-5.patch {noformat} java.io.IOException: java.io.IOException: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:612) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:533) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4092) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4042) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:427) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:130) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:450) at org.apache.hadoop.hbase.regionserver.HStore.init(HStore.java:215) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3060) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:585) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:583) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) ... 3 more Caused by: java.io.IOException: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:607) at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:615) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.init(HFileReaderV2.java:115) at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:564) at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:599) at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.init(StoreFile.java:1294) at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:525) at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:628) at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:426) at org.apache.hadoop.hbase.regionserver.HStore$1.call(HStore.java:422) ... 
8 more Caused by: java.lang.InstantiationException: org.apache.hadoop.io.RawComparator at java.lang.Class.newInstance0(Class.java:340) at java.lang.Class.newInstance(Class.java:308) at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:605) ... 17 more {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
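The root cause at the bottom of the trace is worth unpacking: `Class.newInstance()` cannot instantiate an interface such as `org.apache.hadoop.io.RawComparator`, because an interface has no constructor, so it throws `InstantiationException`. A minimal reproduction with a stand-in interface (`RawComparatorStandIn` is hypothetical; the actual fix, per the file list above, touched `FixedFileTrailer.createComparator`):

```java
// Calling newInstance() on a Class object that names an interface throws
// InstantiationException - the same failure as the bottom of the trace above.
public class NewInstanceDemo {
    interface RawComparatorStandIn {}  // hypothetical stand-in for RawComparator

    public static void main(String[] args) {
        try {
            RawComparatorStandIn ignored = RawComparatorStandIn.class.newInstance();
            System.out.println("instantiated");
        } catch (InstantiationException e) {
            // This is the branch we hit: interfaces cannot be instantiated.
            System.out.println("InstantiationException");
        } catch (IllegalAccessException e) {
            System.out.println("IllegalAccessException");
        }
    }
}
```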
[jira] [Commented] (HBASE-6832) [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing
[ https://issues.apache.org/jira/browse/HBASE-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561087#comment-13561087 ] Hudson commented on HBASE-6832: --- Integrated in HBase-TRUNK #3785 (See [https://builds.apache.org/job/HBase-TRUNK/3785/]) HBASE-6832. [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing (Revision 1437643) Result = FAILURE enis : Files : * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/IncrementingEnvironmentEdge.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverBypass.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestScannerSelectionUsingTTL.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestLogsCleaner.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestIncrementingEnvironmentEdge.java [WINDOWS] Tests should use explicit timestamp for Puts, and not rely on implicit RS timing Key: HBASE-6832 URL: https://issues.apache.org/jira/browse/HBASE-6832 Project: HBase Issue Type: Bug Affects Versions: 0.94.3, 0.96.0 Reporter: Enis Soztutar Assignee: Enis Soztutar Labels: windows Fix For: 0.96.0 Attachments: hbase-6832_v1-0.94.patch, hbase-6832_v1-trunk.patch, hbase-6832_v4-0.94.patch, hbase-6832_v4-trunk.patch, hbase-6832_v5-trunk.patch, hbase-6832_v6-trunk.patch TestRegionObserverBypass.testMulti() fails with {code} java.lang.AssertionError: expected:1 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at 
org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.checkRowAndDelete(TestRegionObserverBypass.java:173) at org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass.testMulti(TestRegionObserverBypass.java:166) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
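The underlying problem is clock granularity: on Windows, `System.currentTimeMillis()` can return the same value for several rapid calls, so consecutive Puts without an explicit timestamp can land on the same cell version and silently overwrite one another. The `IncrementingEnvironmentEdge` touched by the commit above addresses this in tests; a self-contained sketch of the idea (names are illustrative, not HBase's):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a clock that never returns the same value twice, so two Puts taken
// in quick succession always get distinct timestamps even on a coarse OS timer.
public class IncrementingClock {
    private final AtomicLong last = new AtomicLong(System.currentTimeMillis());

    // Strictly increasing: each call advances at least 1 ms past the previous one.
    public long currentTime() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        IncrementingClock clock = new IncrementingClock();
        long t1 = clock.currentTime();
        long t2 = clock.currentTime();
        System.out.println(t2 > t1); // true, regardless of OS timer resolution
    }
}
```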
[jira] [Commented] (HBASE-6816) [WINDOWS] line endings on checkout for .sh files
[ https://issues.apache.org/jira/browse/HBASE-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561089#comment-13561089 ] Hudson commented on HBASE-6816: --- Integrated in HBase-TRUNK #3785 (See [https://builds.apache.org/job/HBase-TRUNK/3785/]) HBASE-6816. [WINDOWS] line endings on checkout for .sh files (Revision 1437642) Result = FAILURE enis : Files : * /hbase/trunk/.gitattributes * /hbase/trunk/src/site/resources/images/hbase_logo.svg [WINDOWS] line endings on checkout for .sh files Key: HBASE-6816 URL: https://issues.apache.org/jira/browse/HBASE-6816 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: hbase-16_v1.patch, hbase-6816_v1.patch On code checkout from svn or git, we need to ensure that the line endings for .sh files are LF, so that they work with cygwin. This is important for getting src/saveVersion.sh to work. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
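The committed `.gitattributes` is not reproduced in this mail; a fragment of the kind described, forcing LF line endings on checkout for shell scripts so cygwin can execute them (the exact committed rules may differ), would look like:

```
# Treat shell scripts as text and always check them out with LF endings
*.sh text eol=lf
```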
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561099#comment-13561099 ] Ted Yu commented on HBASE-7382: --- I looked at the javadoc warnings and only saw warnings about Bytes.java Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1356#comment-1356 ] Ted Yu commented on HBASE-7268: --- Looks good to me. I ran TestMiniClusterLoadSequential and TestMiniClusterLoadParallel against hadoop 2.0 - they passed. correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-addendum-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report from RegionMoved exception processing logic R moved from C to B, even though such transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there nonwithstanding). I have a patch but not sure if it works, test still fails locally for yet unknown reason. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561131#comment-13561131 ] Himanshu Vashishtha commented on HBASE-7382: How do you dig/fix findbugs warnings Ted? Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7651) RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException.
Jonathan Hsieh created HBASE-7651: - Summary: RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException. Key: HBASE-7651 URL: https://issues.apache.org/jira/browse/HBASE-7651 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-7290 Reporter: Jonathan Hsieh Priority: Blocker I've reproduced this problem consistently on a 20 node cluster. The first run fails to take a snapshot on one node (jon-snapshots-2 in this case) due to a NotServingRegionException (this is acceptable) {code} 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: accepting received exception org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. 
is closing at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: Recieved error, notifying listeners... 2013-01-23 13:32:48,730 ERROR org.apache.hadoop.hbase.procedure.Procedure: Procedure 'pe-6' execution failed! org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. 
is closing at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84) at org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107) at
[jira] [Assigned] (HBASE-7651) RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException.
[ https://issues.apache.org/jira/browse/HBASE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-7651: - Assignee: Jonathan Hsieh RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException. - Key: HBASE-7651 URL: https://issues.apache.org/jira/browse/HBASE-7651 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-7290 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker I've reproduced this problem consistently on a 20 node cluster. The first run fails to take a snapshot on one node (jon-snapshots-2 in this case) due to a NotServingRegionException (this is acceptable) {code} 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: accepting received exception org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. 
is closing at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: Recieved error, notifying listeners... 2013-01-23 13:32:48,730 ERROR org.apache.hadoop.hbase.procedure.Procedure: Procedure 'pe-6' execution failed! org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. 
is closing at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84) at org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561137#comment-13561137 ] Ted Yu commented on HBASE-7382: --- One way is to go through hbase-server findbugs xml, looking for the files (and lines) touched by your patch. The other way is to diff https://builds.apache.org/job/PreCommit-HBASE-Build/4149/artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.xml with https://builds.apache.org/job/PreCommit-HBASE-Build/4150/artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.xml This should narrow your search. Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch HBASE-6775 adds support for ZK.multi ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
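The second approach can be mechanized with a plain diff of the two reports. A self-contained sketch using stand-in files in place of the two newPatchFindbugsWarningshbase-server.xml artifacts linked above (the real content would be the downloaded XML):

```shell
# Create two stand-in findbugs reports; in practice these would be the XML
# artifacts fetched from the two precommit builds linked in the comment above.
printf '<BugInstance type="NP_NULL_ON_SOME_PATH"/>\n' > findbugs-before.xml
printf '<BugInstance type="NP_NULL_ON_SOME_PATH"/>\n<BugInstance type="RV_RETURN_VALUE_IGNORED"/>\n' > findbugs-after.xml

# Lines prefixed with '>' appear only in the post-patch report: those are the
# warnings the patch introduced.
diff findbugs-before.xml findbugs-after.xml | grep '^>'
```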
[jira] [Commented] (HBASE-7643) HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss
[ https://issues.apache.org/jira/browse/HBASE-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561143#comment-13561143 ] Matteo Bertozzi commented on HBASE-7643: committed p4-v5 to the snapshot branch, to have more coverage (jenkins, test rig, ...) I'll commit it to trunk in a couple of days if everything is fine and there're no objections. HFileArchiver.resolveAndArchive() race condition may lead to snapshot data loss --- Key: HBASE-7643 URL: https://issues.apache.org/jira/browse/HBASE-7643 Project: HBase Issue Type: Bug Affects Versions: hbase-6055, 0.96.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 0.96.0, 0.94.5 Attachments: HBASE-7653-p4-v0.patch, HBASE-7653-p4-v1.patch, HBASE-7653-p4-v2.patch, HBASE-7653-p4-v3.patch, HBASE-7653-p4-v4.patch, HBASE-7653-p4-v5.patch * The master have an hfile cleaner thread (that is responsible for cleaning the /hbase/.archive dir) ** /hbase/.archive/table/region/family/hfile ** if the family/region/family directory is empty the cleaner removes it * The master can archive files (from another thread, e.g. DeleteTableHandler) * The region can archive files (from another server/process, e.g. compaction) The simplified file archiving code looks like this: {code} HFileArchiver.resolveAndArchive(...) { // ensure that the archive dir exists fs.mkdir(archiveDir); // move the file to the archiver success = fs.rename(originalPath/fileName, archiveDir/fileName) // if the rename is failed, delete the file without archiving if (!success) fs.delete(originalPath/fileName); } {code} Since there's no synchronization between HFileArchiver.resolveAndArchive() and the cleaner run (different process, thread, ...) you can end up in the situation where you are moving something in a directory that doesn't exists. {code} fs.mkdir(archiveDir); // HFileCleaner chore starts at this point // and the archiveDirectory that we just ensured to be present gets removed. 
// The rename at this point will fail since the parent directory is missing.
success = fs.rename(originalPath/fileName, archiveDir/fileName);
{code}
The bad thing about deleting the file without archiving is that if you have a snapshot, or a cloned table, that relies on the file being present, you're losing data.

Possible solutions:
* Create a ZooKeeper lock, to notify the master (Hey, I'm archiving something, wait a bit)
* Add an RS -> Master call to let the master remove files and avoid this kind of situation
* Avoid removing empty directories from the archive if the table exists or is not disabled
* Add a try/catch around the fs.rename

The last one, the easiest one, looks like:
{code}
for (int i = 0; i < retries; ++i) {
  // ensure the archive directory is present
  fs.mkdir(archiveDir);
  // possible race here
  // try to archive the file
  success = fs.rename(originalPath/fileName, archiveDir/fileName);
  if (success) break;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
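The retry fix can be sketched as follows, using plain java.io in place of Hadoop's FileSystem API for illustration; the method name and retry structure are assumptions, not the committed patch:

```java
import java.io.File;

public class ArchiveRetrySketch {
    // Sketch of the "retry around fs.rename" fix. A concurrent cleaner may
    // delete the (momentarily empty) archive directory between mkdirs() and
    // renameTo(); with the loop, that race only costs one retry, not the file.
    static boolean archiveWithRetries(File file, File archiveDir, int retries) {
        File target = new File(archiveDir, file.getName());
        for (int i = 0; i < retries; ++i) {
            // ensure the archive directory is present before every attempt
            archiveDir.mkdirs();
            // if the cleaner removed archiveDir just now, this attempt simply
            // fails and we loop again instead of deleting the file
            if (file.renameTo(target)) {
                return true;
            }
        }
        // retries exhausted: the caller decides (fail the snapshot, etc.)
        return false;
    }
}
```

Crucially, the file is never deleted on a failed rename; deletion only becomes an option after all retries are exhausted, which closes the data-loss window described above.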
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561144#comment-13561144 ] Hadoop QA commented on HBASE-7268: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566190/HBASE-7268-addendum-v0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4152//console This message is automatically generated. 
correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-addendum-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch

Discovered via HBASE-7250; related to HBASE-5877. The test is writing from multiple threads. Server A has region R; the client knows that. R gets moved from A to server B. B gets killed. R gets moved by the master to server C. ~15 seconds later, the client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, that R moved from C to B, even though such a transition never happened (neither in nor before the sequence described above). Not quite sure how the client learned of the transition to C; I assume it's from meta from some other thread... Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
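One way to prevent the bogus cache update, sketched below with assumed names (the "plus-masterTs" attachments suggest the actual patch uses a timestamp-style guard, but this is an illustration, not that code): tag each cached location with a freshness stamp and ignore any update that is older than what the cache already holds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocationCacheSketch {
    static final class Loc {
        final String server;
        final long seqNum;   // freshness stamp, e.g. a server start code; newer is larger
        Loc(String server, long seqNum) { this.server = server; this.seqNum = seqNum; }
    }

    final Map<String, Loc> cache = new ConcurrentHashMap<>();

    // Accept an update (including one derived from RegionMoved exception
    // processing) only if it is at least as fresh as the cached entry, so a
    // stale report from an old server can no longer clobber the correct one.
    boolean update(String region, Loc candidate) {
        Loc cur = cache.get(region);
        if (cur != null && cur.seqNum > candidate.seqNum) {
            return false;   // stale information: keep the current entry
        }
        cache.put(region, candidate);
        return true;
    }
}
```

(The get/put pair is not atomic; a production version would use a compute-style update, but that detail is orthogonal to the staleness check.)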
[jira] [Commented] (HBASE-7651) RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException.
[ https://issues.apache.org/jira/browse/HBASE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561156#comment-13561156 ] Jonathan Hsieh commented on HBASE-7651: --- NSREs are possible with this snapshotting implementation (the master gets a list of regions/regionservers to care about, regions move, and then the snapshot request is sent to the rs's). Restarting the particular node (jon-snapshots-2 from the example) fixes the problem, but when the next NSRE pops up elsewhere we get stuck again.

RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException. - Key: HBASE-7651 URL: https://issues.apache.org/jira/browse/HBASE-7651 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-7290 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker

I've reproduced this problem consistently on a 20 node cluster. The first run fails to take a snapshot on a node (jon-snapshots-2 in this case) due to a NotServingRegionException (this is acceptable):
{code}
2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: accepting received exception org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad.
is closing at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: Recieved error, notifying listeners... 2013-01-23 13:32:48,730 ERROR org.apache.hadoop.hbase.procedure.Procedure: Procedure 'pe-6' execution failed! 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84) at org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at
[jira] [Updated] (HBASE-7114) Increment does not extend Mutation but probably should
[ https://issues.apache.org/jira/browse/HBASE-7114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-7114: -- Affects Version/s: 0.96.0 Increment does not extend Mutation but probably should -- Key: HBASE-7114 URL: https://issues.apache.org/jira/browse/HBASE-7114 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.96.0 Reporter: Andrew Purtell Priority: Minor Increment is the only operation in the class of mutators that does not extend Mutation. It mostly duplicates what Mutation provides, but not quite. The signatures for setWriteToWAL and getFamilyMap are slightly different. This can be inconvenient because it requires special case code and therefore could be considered an API design nit. Unfortunately it is not a simple change: The interface is marked stable and the internals of the family map are different from other mutation types. The latter is why I suspect this was not addressed when Mutation was introduced. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
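The special-casing the issue describes can be shown with a toy model; these are simplified stand-ins, not the real HBase client classes. Because Increment does not extend Mutation and its setWriteToWAL signature differs (it returns the instance for chaining), generic code over operations needs an instanceof branch:

```java
public class MutationHierarchySketch {
    // Toy model of the hierarchy: Put and Delete share Mutation, Increment does not.
    static class Mutation {
        boolean writeToWAL = true;
        void setWriteToWAL(boolean b) { writeToWAL = b; }
    }
    static class Put extends Mutation {}
    static class Delete extends Mutation {}
    static class Increment {
        boolean writeToWAL = true;
        // slightly different signature: returns this for chaining
        Increment setWriteToWAL(boolean b) { writeToWAL = b; return this; }
    }

    // Generic code cannot treat all operations uniformly:
    static void disableWAL(Object op) {
        if (op instanceof Mutation) {
            ((Mutation) op).setWriteToWAL(false);
        } else if (op instanceof Increment) {   // the special case this issue is about
            ((Increment) op).setWriteToWAL(false);
        }
    }
}
```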
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561170#comment-13561170 ] Andrew Purtell commented on HBASE-5664: --- +1 if this is what you need Anoop.

CP hooks in Scan flow for fast forward when filter filters out a row Key: HBASE-5664 URL: https://issues.apache.org/jira/browse/HBASE-5664 Project: HBase Issue Type: Improvement Components: Coprocessors, Filters Affects Versions: 0.92.1 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.5 Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch

In HRegion.nextInternal(int limit, String metric) we have a while(true) loop to fetch the next result that satisfies the filter condition. When the filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before moving on to the next row.
{code}
if (results.isEmpty() || filterRow()) {
  // this seems like a redundant step - we already consumed the row
  // there're no left overs.
  // the reasons for calling this method are:
  // 1. reset the filters.
  // 2. provide a hook to fast forward the row (used by subclasses)
  nextRow(currentRow);
{code}
Per reason 2 above ("provide a hook to fast forward the row"), we can provide the same fast-forward support for the CP also.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
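The fast-forward idea — when a filter rejects a row, let a hook seek the scan past a whole uninteresting key range instead of stepping row by row — can be sketched with a toy scanner. The interface and names below are illustrative, not the real coprocessor API:

```java
import java.util.TreeMap;

public class FastForwardHookSketch {
    interface FilterRowHook {
        // Called when `row` was filtered out; return a key to seek to,
        // or null to just advance to the next row.
        String onRowFiltered(String row);
    }

    // Scan rows >= start, keeping rows that pass the filter and letting the
    // hook fast-forward past known-uninteresting key ranges.
    static java.util.List<String> scan(TreeMap<String, String> rows,
            String start, java.util.function.Predicate<String> filter,
            FilterRowHook hook) {
        java.util.List<String> out = new java.util.ArrayList<>();
        String key = rows.ceilingKey(start);
        while (key != null) {
            if (filter.test(key)) {
                out.add(key);
                key = rows.higherKey(key);
            } else {
                String seekTo = hook.onRowFiltered(key);   // the CP hook
                key = (seekTo != null) ? rows.ceilingKey(seekTo) : rows.higherKey(key);
            }
        }
        return out;
    }
}
```

With a hook that knows the filtered key range, the scan jumps straight from the first rejected row to the next interesting one, skipping the rows in between entirely.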
[jira] [Commented] (HBASE-7403) Online Merge
[ https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561190#comment-13561190 ] Ted Yu commented on HBASE-7403: --- @Chunhui: Looks like you have 2 +1's already.

Online Merge Key: HBASE-7403 URL: https://issues.apache.org/jira/browse/HBASE-7403 Project: HBase Issue Type: New Feature Affects Versions: 0.94.3 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.96.0, 0.94.5 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, hbase-7403-trunkv11.patch, hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, merge region.pdf

The features of this online merge:
1. Online: no need to disable the table
2. Few changes to the current code; could be applied to trunk, 0.94, 0.92 or 0.90
3. Easy to issue a merge request: no need to input a long region name, the encoded name is enough
4. No restrictions during operation: you don't need to take care of events like server death, balance, split, or disabling/enabling a table, nor worry whether you sent a wrong merge request; it is all handled for you
5. Only a little offline time for the two merging regions

Usage:
1. Tool: bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] table-name region-encodedname-1 region-encodedname-2
2. API: static void MergeManager#createMergeRequest

We need merge in the following cases:
1. A region hole or region overlap that can't be fixed by hbck
2. Regions become empty because of TTL and unreasonable rowkey design
3. Regions are always empty or very small because of presplitting at table creation
4. Too many empty or small regions reduce system performance (e.g. mslab)

The current merge tool only supports offline merge and is not able to redo if an exception is thrown in the process of merging, leaving dirty data. For an online system, we need an online merge.

The implementation logic of this patch for online merge is (for example, merging regionA and regionB into regionC):
1. Offline the two regions A and B
2. Merge the two regions in HDFS (create regionC's directory, move regionA's and regionB's files to regionC's directory, delete regionA's and regionB's directories)
3. Add the merged regionC to .META.
4. Assign the merged regionC

By the design of this patch, once we do the merge work in HDFS, we can redo it until successful if it throws an exception, aborts, or the server restarts, but it can't be rolled back. It depends on:
- Using ZooKeeper to record the transaction journal state, making redo easier
- Using ZooKeeper to send/receive merge requests
- Executing the merge transaction on the master
- Supporting merge requests through the API or the shell tool

About the merge process, please see the attachment and patch.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
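The redo-until-successful design can be sketched as a journaled step sequence: each completed step is recorded (in the real design, in ZooKeeper), so after an exception, abort, or restart the merge is simply re-run and already-done steps are skipped. All names below are illustrative, not the patch's actual classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeJournalSketch {
    // The four merge steps from the description; in the real design this
    // journal state lives in ZooKeeper, not in memory.
    enum Step { OFFLINE_REGIONS, MERGE_IN_HDFS, UPDATE_META, ASSIGN_MERGED }

    final Map<Step, Boolean> journal = new LinkedHashMap<>();
    { for (Step s : Step.values()) journal.put(s, false); }

    // Redo the merge: skip steps already journaled as done, so the whole
    // sequence can safely be re-run any number of times (redo, not rollback).
    void redo(Map<Step, Runnable> actions) {
        for (Step s : Step.values()) {
            if (journal.get(s)) continue;   // completed in a previous attempt
            actions.get(s).run();           // may throw; journal stays consistent
            journal.put(s, true);           // record "done" before moving on
        }
    }
}
```

This is why the transaction can be redone but not rolled back: once MERGE_IN_HDFS has moved the files, the only safe direction is forward.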
[jira] [Commented] (HBASE-7651) RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException.
[ https://issues.apache.org/jira/browse/HBASE-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561198#comment-13561198 ] Ted Yu commented on HBASE-7651: --- Line 343 in RegionServerSnapshotManager#waitForOutstandingTasks():
{code}
LOG.warn("cancelling region task");
f.cancel(true);
{code}
Shall we pass false to cancel()?

RegionServerSnapshotManager does not accept subsequent snapshots if previous fails with NotServingRegionException. - Key: HBASE-7651 URL: https://issues.apache.org/jira/browse/HBASE-7651 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-7290 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker

I've reproduced this problem consistently on a 20 node cluster. The first run fails to take a snapshot on a node (jon-snapshots-2 in this case) due to a NotServingRegionException (this is acceptable):
{code}
2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: accepting received exception org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad.
is closing at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:184) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs.abort(ZKProcedureCoordinatorRpcs.java:240) at org.apache.hadoop.hbase.procedure.ZKProcedureCoordinatorRpcs$1.nodeCreated(ZKProcedureCoordinatorRpcs.java:182) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:294) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:343) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:107) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:123) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-01-23 13:32:48,631 DEBUG org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher: Recieved error, notifying listeners... 2013-01-23 13:32:48,730 ERROR org.apache.hadoop.hbase.procedure.Procedure: Procedure 'pe-6' execution failed! 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via jon-snapshots-2.ent.cloudera.com,22101,1358976524369:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: org.apache.hadoop.hbase.NotServingRegionException: TestTable,0002493652,1358976652443.b858147ad87a7812ac9a73dd8fef36ad. is closing at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:84) at org.apache.hadoop.hbase.procedure.Procedure.waitForLatch(Procedure.java:357) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:203) at org.apache.hadoop.hbase.procedure.Procedure.call(Procedure.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by:
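For context on Ted's question: Future.cancel(true) interrupts the task's thread if the task is already running, while cancel(false) only prevents a not-yet-started task from running and lets a running one finish. A self-contained demonstration (class and method names here are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelSketch {
    // Returns whether the running task observed an interrupt after cancel().
    public static boolean interruptedByCancel(boolean mayInterrupt) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        Future<?> f = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(10_000);             // simulate an in-flight region task
            } catch (InterruptedException e) {
                interrupted.countDown();          // only reached for cancel(true)
            }
        });
        started.await();                          // task is definitely running now
        f.cancel(mayInterrupt);
        boolean wasInterrupted = interrupted.await(1, TimeUnit.SECONDS);
        pool.shutdownNow();
        return wasInterrupted;
    }
}
```

So the trade-off in the snapshot pool is whether an in-flight flush task should be interrupted mid-operation (cancel(true)) or allowed to run to completion (cancel(false)).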
[jira] [Commented] (HBASE-5664) CP hooks in Scan flow for fast forward when filter filters out a row
[ https://issues.apache.org/jira/browse/HBASE-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561203#comment-13561203 ] Ted Yu commented on HBASE-5664: --- Integrated to trunk. Thanks for the patch, Anoop. Thanks for the review, Andy.

CP hooks in Scan flow for fast forward when filter filters out a row Key: HBASE-5664 URL: https://issues.apache.org/jira/browse/HBASE-5664 Project: HBase Issue Type: Improvement Components: Coprocessors, Filters Affects Versions: 0.92.1 Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 0.96.0, 0.94.5 Attachments: HBASE-5664_94.patch, HBASE-5664_94_V2.patch, HBASE-5664_Trunk.patch, HBASE-5664_Trunk_V2.patch

In HRegion.nextInternal(int limit, String metric) we have a while(true) loop to fetch the next result that satisfies the filter condition. When the filter filters out the currently fetched row, we call nextRow(byte[] currentRow) before moving on to the next row.
{code}
if (results.isEmpty() || filterRow()) {
  // this seems like a redundant step - we already consumed the row
  // there're no left overs.
  // the reasons for calling this method are:
  // 1. reset the filters.
  // 2. provide a hook to fast forward the row (used by subclasses)
  nextRow(currentRow);
{code}
Per reason 2 above ("provide a hook to fast forward the row"), we can provide the same fast-forward support for the CP also.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7649) client retry timeout doesn't need to do x2 fallback when going to different server
[ https://issues.apache.org/jira/browse/HBASE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7649: Attachment: HBASE-7649-v0.patch Attaching a patch that tracks retries by server. client retry timeout doesn't need to do x2 fallback when going to different server -- Key: HBASE-7649 URL: https://issues.apache.org/jira/browse/HBASE-7649 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7649-v0.patch See HBASE-7520. When we go to server A, get a bunch of failures, and then finally learn the region is on B, it doesn't make sense to wait for 30 seconds before going to B. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
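The per-server tracking idea can be sketched as follows; the class name and backoff formula are assumptions, not the attached patch. The attempt count is keyed by server, so exponential backoff restarts from the base pause when the client switches to a newly discovered location:

```java
import java.util.HashMap;
import java.util.Map;

public class PerServerBackoffSketch {
    static final long BASE_PAUSE_MS = 100;

    // Retry attempts accumulated against each server individually.
    final Map<String, Integer> attemptsByServer = new HashMap<>();

    // Backoff is computed from the retry count against *this* server, so
    // learning the region moved to server B does not inherit the long pause
    // earned by repeated failures against server A.
    long nextPauseMs(String server) {
        int attempts = attemptsByServer.merge(server, 1, Integer::sum);
        return BASE_PAUSE_MS * (1L << (attempts - 1));   // 100, 200, 400, ...
    }
}
```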
[jira] [Updated] (HBASE-7649) client retry timeout doesn't need to do x2 fallback when going to different server
[ https://issues.apache.org/jira/browse/HBASE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7649: Status: Patch Available (was: Open) client retry timeout doesn't need to do x2 fallback when going to different server -- Key: HBASE-7649 URL: https://issues.apache.org/jira/browse/HBASE-7649 Project: HBase Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7649-v0.patch See HBASE-7520. When we go to server A, get a bunch of failures, then finally learn the region is on B it doesn't make sense to wait for 30 seconds before going to B. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten (or deleted) w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561224#comment-13561224 ] Sergey Shelukhin commented on HBASE-7268: - should this be ok to commit?

correct local region location cache information can be overwritten (or deleted) w/stale information from an old server -- Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: 7268-v6.patch, 7268-v8.patch, HBASE-7268-addendum-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v2-plus-masterTs.patch, HBASE-7268-v3.patch, HBASE-7268-v4.patch, HBASE-7268-v5.patch, HBASE-7268-v6.patch, HBASE-7268-v7.patch, HBASE-7268-v8.patch, HBASE-7268-v9.patch

Discovered via HBASE-7250; related to HBASE-5877. The test is writing from multiple threads. Server A has region R; the client knows that. R gets moved from A to server B. B gets killed. R gets moved by the master to server C. ~15 seconds later, the client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, that R moved from C to B, even though such a transition never happened (neither in nor before the sequence described above). Not quite sure how the client learned of the transition to C; I assume it's from meta from some other thread... Then, the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7382) Port ZK.multi support from HBASE-6775 to 0.96
[ https://issues.apache.org/jira/browse/HBASE-7382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13561236#comment-13561236 ] Ted Yu commented on HBASE-7382: --- The 3 additional findbugs warnings are the following:
{code}
BugInstance type=HE_EQUALS_USE_HASHCODE priority=1 abbrev=HE category=BAD_PRACTICE
  Class classname=org.apache.hadoop.hbase.zookeeper.ZKUtil$ZKUtilOp$CreateAndFailSilent ...
BugInstance type=HE_EQUALS_USE_HASHCODE priority=1 abbrev=HE category=BAD_PRACTICE
  Class classname=org.apache.hadoop.hbase.zookeeper.ZKUtil$ZKUtilOp$DeleteNodeFailSilent ...
BugInstance type=HE_EQUALS_USE_HASHCODE priority=1 abbrev=HE category=BAD_PRACTICE
  Class classname=org.apache.hadoop.hbase.zookeeper.ZKUtil$ZKUtilOp$SetData ...
{code}
Please refer to newPatchFindbugsWarningshbase-server.xml from PreCommit build 4150 for details.

Port ZK.multi support from HBASE-6775 to 0.96 - Key: HBASE-7382 URL: https://issues.apache.org/jira/browse/HBASE-7382 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Gregory Chanan Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.96.0 Attachments: HBASE-7382-trunk.patch

HBASE-6775 adds support for ZK.multi to ZKUtil and uses it for the 0.92/0.94 compatibility fix implemented in HBASE-6710. ZK.multi support is most likely useful in 0.96, but since HBASE-6710 is not relevant for 0.96, perhaps we should find another use case first before we port. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
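The HE_EQUALS_USE_HASHCODE warning means a class overrides equals(Object) without overriding hashCode(), violating the contract that equal objects must have equal hash codes (equal ops would otherwise land in different HashMap/HashSet buckets). A simplified stand-in for the flagged ZKUtilOp subclasses, illustrative rather than the actual HBase code:

```java
public class EqualsHashCodeSketch {
    // Toy version of an op that is compared by its znode path.
    static final class CreateAndFailSilent {
        final String path;
        CreateAndFailSilent(String path) { this.path = path; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CreateAndFailSilent)) return false;
            return path.equals(((CreateAndFailSilent) o).path);
        }

        // The fix findbugs asks for: keep hashCode consistent with equals.
        @Override public int hashCode() {
            return path.hashCode();
        }
    }
}
```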