[jira] [Created] (HBASE-26812) ShortCircuitingClusterConnection fails to close RegionScanners when making short-circuited calls
Lars Hofhansl created HBASE-26812: - Summary: ShortCircuitingClusterConnection fails to close RegionScanners when making short-circuited calls Key: HBASE-26812 URL: https://issues.apache.org/jira/browse/HBASE-26812 Project: HBase Issue Type: Bug Affects Versions: 2.4.9 Reporter: Lars Hofhansl Just ran into this on the Phoenix side. We retrieve a Connection via {{RegionCoprocessorEnvironment.createConnection... getTable(...)}}. And then call get on that table. The Get's key happens to be local. Now each call to table.get() leaves an open StoreScanner around forever (verified with a memory profiler). The references are held via RegionScannerImpl.storeHeap.scannersForDelayedClose. Eventually the RegionServer dies a GC death. The reason appears to be that in this case there is no currentCall context. Some time in 2.x the RPC handler/call was made responsible for closing open region scanners, but we forgot to handle {{ShortCircuitingClusterConnection}}. It's not immediately clear how to fix this. But it does make ShortCircuitingClusterConnection useless and dangerous. If you use it, you *will* create a giant memory leak. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-24742) Improve performance of SKIP vs SEEK logic
[ https://issues.apache.org/jira/browse/HBASE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-24742. --- Resolution: Fixed Also pushed to branch-2 and master. > Improve performance of SKIP vs SEEK logic > - > > Key: HBASE-24742 > URL: https://issues.apache.org/jira/browse/HBASE-24742 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl >Priority: Major > Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0 > > Attachments: 24742-master.txt, hbase-1.6-regression-flame-graph.png, > hbase-24742-branch-1.txt > > > In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% > slowdown in scanning scenarios. > We tracked it back to HBASE-17958 and HBASE-19863. > Both add comparisons to one of the tightest loops HBase has. > [~bharathv] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24742) Improve performance of SKIP vs SEEK logic
[ https://issues.apache.org/jira/browse/HBASE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-24742: --- Lemme put this into branch-2 and master as well. > Improve performance of SKIP vs SEEK logic > - > > Key: HBASE-24742 > URL: https://issues.apache.org/jira/browse/HBASE-24742 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl >Priority: Major > Fix For: 1.7.0 > > Attachments: hbase-1.6-regression-flame-graph.png, > hbase-24742-branch-1.txt > > > In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% > slowdown in scanning scenarios. > We tracked it back to HBASE-17958 and HBASE-19863. > Both add comparisons to one of the tightest loops HBase has. > [~bharathv] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24742) Improve performance of SKIP vs SEEK logic
[ https://issues.apache.org/jira/browse/HBASE-24742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-24742. --- Resolution: Fixed > Improve performance of SKIP vs SEEK logic > - > > Key: HBASE-24742 > URL: https://issues.apache.org/jira/browse/HBASE-24742 > Project: HBase > Issue Type: Bug > Components: Performance, regionserver >Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.4.0 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl >Priority: Major > Fix For: 1.7.0 > > Attachments: hbase-1.6-regression-flame-graph.png, > hbase-24742-branch-1.txt > > > In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% > slowdown in scanning scenarios. > We tracked it back to HBASE-17958 and HBASE-19863. > Both add comparisons to one of the tightest loops HBase has. > [~bharathv] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24742) Improve performance of SKIP vs SEEK logic
Lars Hofhansl created HBASE-24742: - Summary: Improve performance of SKIP vs SEEK logic Key: HBASE-24742 URL: https://issues.apache.org/jira/browse/HBASE-24742 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl In our testing of HBase 1.3 against the current tip of branch-1 we saw a 30% slowdown in scanning scenarios. We tracked it back to HBASE-17958 and HBASE-19863. Both add comparisons to one of the tightest loops HBase has. [~bharathv] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-9272) A parallel, unordered scanner
[ https://issues.apache.org/jira/browse/HBASE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-9272. -- Resolution: Won't Fix > A parallel, unordered scanner > - > > Key: HBASE-9272 > URL: https://issues.apache.org/jira/browse/HBASE-9272 > Project: HBase > Issue Type: New Feature > Reporter: Lars Hofhansl >Priority: Minor > Attachments: 9272-0.94-v2.txt, 9272-0.94-v3.txt, 9272-0.94-v4.txt, > 9272-0.94.txt, 9272-trunk-v2.txt, 9272-trunk-v3.txt, 9272-trunk-v3.txt, > 9272-trunk-v4.txt, 9272-trunk.txt, ParallelClientScanner.java, > ParallelClientScanner.java > > > The contract of ClientScanner is to return rows in sort order. That limits > the order in which regions can be scanned. > I propose a simple ParallelScanner that does not have this requirement and > queries regions in parallel, returning whatever gets returned first. > This is generally useful for scans that filter a lot of data on the server, > or in cases where the client can react very quickly to the returned data. > I have a simple prototype (doesn't do error handling right, and might be a > bit heavy on the synchronization side - it uses a BlockingQueue to hand data > between the client using the scanner and the threads doing the scanning; it > also could potentially starve some scanners long enough to time out at the > server). > On the plus side, it's only about 130 lines of code. :) -- This message was sent by Atlassian Jira (v8.3.4#803005)
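The BlockingQueue hand-off described above can be sketched with plain JDK types. This is a hedged illustration, not the attached ParallelClientScanner.java: each "region" (here just a list of rows) is scanned by its own task, rows are pushed into a shared queue as they are found, and the consumer drains them in arrival order rather than sort order. All names are invented.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelScanSketch {
    // Sentinel marking that one region scan has finished.
    private static final String DONE = "\0DONE\0";

    public static List<String> scanAll(List<List<String>> regions) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        for (List<String> region : regions) {
            pool.submit(() -> {
                for (String row : region) queue.add(row); // ship rows as they are found
                queue.add(DONE);                          // signal this region is exhausted
            });
        }
        List<String> results = new ArrayList<>();
        int finished = 0;
        while (finished < regions.size()) {
            String row = queue.take(); // first-come, first-served: no global sort order
            if (row.equals(DONE)) finished++; else results.add(row);
        }
        pool.shutdown();
        return results;
    }
}
```

The starvation risk mentioned in the issue shows up here too: if the consumer is slow to `take()`, producer-side scanners sit on results long enough that a real server lease could expire.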
[jira] [Resolved] (HBASE-6970) hbase-deamon.sh creates/updates pid file even when that start failed.
[ https://issues.apache.org/jira/browse/HBASE-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6970. -- Resolution: Won't Fix > hbase-deamon.sh creates/updates pid file even when that start failed. > - > > Key: HBASE-6970 > URL: https://issues.apache.org/jira/browse/HBASE-6970 > Project: HBase > Issue Type: Bug > Components: Usability >Reporter: Lars Hofhansl >Priority: Major > > We just ran into a strange issue where we could neither start nor stop > services with hbase-deamon.sh. > The problem is this: > {code} > nohup nice -n $HBASE_NICENESS "$HBASE_HOME"/bin/hbase \ > --config "${HBASE_CONF_DIR}" \ > $command "$@" $startStop > "$logout" 2>&1 < /dev/null & > echo $! > $pid > {code} > So the pid file is created or updated even when the start of the service > failed. The next stop command will then fail, because the pid file has the > wrong pid in it. > Edit: Spelling and more spelling errors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
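One way to avoid the stale pid file described above is to record the pid only after confirming the child survived its startup. This is a hedged sketch, not the actual hbase-daemon.sh code; the function name is invented.

```shell
#!/usr/bin/env bash
# Sketch: launch a command in the background and write its pid file only
# if the process is still alive shortly after start.
start_and_record_pid() {
  local pid_file=$1; shift
  "$@" > /dev/null 2>&1 &
  local child=$!
  sleep 1
  if kill -0 "$child" 2>/dev/null; then
    echo "$child" > "$pid_file"   # start looks good: record the pid
  else
    rm -f "$pid_file"             # start failed: leave no stale pid behind
    return 1
  fi
}
```

With this shape, a failed start leaves no pid file, so a subsequent stop cannot pick up a wrong pid.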
[jira] [Resolved] (HBASE-13751) Refactoring replication WAL reading logic as WAL Iterator
[ https://issues.apache.org/jira/browse/HBASE-13751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-13751. --- Resolution: Won't Fix > Refactoring replication WAL reading logic as WAL Iterator > - > > Key: HBASE-13751 > URL: https://issues.apache.org/jira/browse/HBASE-13751 > Project: HBase > Issue Type: Brainstorming > Reporter: Lars Hofhansl >Priority: Major > > The current replication code is all over the place. > A simple refactoring that we could consider is to factor out the part that > reads from the WALs. It could be a simple iterator interface with one > additional wrinkle: the iterator needs to be able to provide the position > (file and offset) of the last read edit. > Once we have this, we can use it as a building block for many other changes > in the replication code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
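The iterator-with-position idea above could look roughly like this. All names here are invented for illustration; this is not the HBase replication API, and the in-memory implementation is only a stand-in for a real WAL reader.

```java
import java.util.*;

// The one extra wrinkle over java.util.Iterator: expose the (file, offset)
// position of the last edit returned, so replication can checkpoint.
interface WalEntryIterator<E> {
    boolean hasNext();
    E next();
    String lastFile();
    long lastOffset();
}

public class WalIteratorSketch implements WalEntryIterator<String> {
    private final String file;
    private final List<String> edits;
    private int idx = 0;

    public WalIteratorSketch(String file, List<String> edits) {
        this.file = file;
        this.edits = edits;
    }
    public boolean hasNext() { return idx < edits.size(); }
    public String next()    { return edits.get(idx++); }
    public String lastFile() { return file; }
    // Toy "offset": number of edits consumed; a real reader would return a byte offset.
    public long lastOffset() { return idx; }
}
```

A consumer that ships edits to a sink can then persist `lastFile()`/`lastOffset()` after each successful batch and resume from there after a crash.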
[jira] [Resolved] (HBASE-14014) Explore row-by-row grouping options
[ https://issues.apache.org/jira/browse/HBASE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14014. --- Resolution: Won't Fix > Explore row-by-row grouping options > --- > > Key: HBASE-14014 > URL: https://issues.apache.org/jira/browse/HBASE-14014 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Lars Hofhansl >Priority: Major > > See discussion in parent. > We need to consider the following attributes of WALKey: > * The cluster ids > * Table Name > * write time (here we could use the latest of any batch) > * seqNum > As long as we preserve these we can rearrange the cells between WALEdits. > Since seqNum is unique this will be a challenge. Currently it is not used, > but we shouldn't design anything that prevents us from providing stronger > ordering guarantees using seqNum. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-14509) Configurable sparse indexes?
[ https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14509. --- Resolution: Won't Fix > Configurable sparse indexes? > > > Key: HBASE-14509 > URL: https://issues.apache.org/jira/browse/HBASE-14509 > Project: HBase > Issue Type: Brainstorming > Reporter: Lars Hofhansl >Priority: Major > > This idea just popped up today and I wanted to record it for discussion: > What if we kept sparse column indexes per region or HFile or per configurable > range? > I.e. for any given CQ we record the lowest and highest value for a particular > range (HFile, Region, or a custom range like the Phoenix guide post). > By tweaking the size of these ranges we can control the size of the index vs > its selectivity. > For example if we kept it by HFile we can almost instantly decide whether we > need to scan a particular HFile at all to find a particular value in a Cell. > We can also collect min/max values for each n MB of data, for example when we > scan the region the first time. Assuming ranges are large enough we can always > keep the index in memory together with the region. > Kind of a sparse local index. Might be much easier than the buddy region stuff > we've been discussing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
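The min/max pruning idea above is the same trick used by zone maps in analytic databases: keep a tiny per-file range and skip any file whose range cannot contain the target. A toy sketch, with all names invented:

```java
import java.util.*;

public class SparseIndexSketch {
    // Per-file min/max of the indexed column's value.
    public static final class Range {
        final long min, max;
        public Range(long min, long max) { this.min = min; this.max = max; }
        boolean mayContain(long v) { return v >= min && v <= max; }
    }

    // Return only the files whose range cannot rule out the target value.
    public static List<String> filesToScan(Map<String, Range> index, long target) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Range> e : index.entrySet()) {
            if (e.getValue().mayContain(target)) hits.add(e.getKey());
        }
        return hits;
    }
}
```

The index is only a few longs per file (or per n MB chunk), which is why it can stay in memory with the region; the coarser the range, the smaller the index but the weaker the pruning.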
[jira] [Resolved] (HBASE-23364) HRegionServer sometimes does not shut down.
[ https://issues.apache.org/jira/browse/HBASE-23364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-23364. --- Fix Version/s: 1.6.0 2.3.0 3.0.0 Resolution: Fixed Committed to branch-1, branch-2, and master. > HRegionServer sometimes does not shut down. > --- > > Key: HBASE-23364 > URL: https://issues.apache.org/jira/browse/HBASE-23364 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.3.0, 1.6.0 >Reporter: Lars Hofhansl > Assignee: Lars Hofhansl >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0 > > Attachments: 23364-branch-1.txt > > > Note that I initially assumed this to be a Phoenix bug. But I tracked it down > to HBase. > > I noticed this only recently. Latest build from HBase's branch-1 and latest > build from Phoenix' 4.x-HBase-1.5. I don't know, yet, whether it's a Phoenix > or an HBase issue. > Just filing it here for later reference. > jstack shows this thread as the only non-daemon thread: > {code:java} > "pool-11-thread-1" #470 prio=5 os_prio=0 tid=0x558a709a4800 nid=0x238e > waiting on condition [0x7f213ad68000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00058eafece8> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) > at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} > No other information. 
Somebody created a thread pool somewhere and forgot to > set the threads to daemon or is not shutting down the pool properly. > Edit: I looked for other references of the locked objects in the stack dump, > but didn't find any. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
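The usual fix for the shutdown hang described in HBASE-23364 is to either shut the pool down explicitly or give it a ThreadFactory that marks its threads as daemon, so they cannot keep the JVM alive. A minimal sketch with invented names:

```java
import java.util.concurrent.*;

public class DaemonPoolSketch {
    // A single-threaded pool whose worker is a daemon thread:
    // the JVM can exit even if the pool is never shut down.
    public static ExecutorService daemonPool(String name) {
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true); // daemon threads do not block JVM shutdown
            return t;
        };
        return Executors.newSingleThreadExecutor(factory);
    }
}
```

`Executors.defaultThreadFactory()` produces non-daemon threads, which is exactly why a forgotten `pool-11-thread-1` can hold a RegionServer process open.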
[jira] [Created] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1
Lars Hofhansl created HBASE-23279: - Summary: Switch default block encoding to ROW_INDEX_V1 Key: HBASE-23279 URL: https://issues.apache.org/jira/browse/HBASE-23279 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Currently we set both block encoding and compression to NONE. ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles are slightly larger, about 3% or so). I think that would be a better default than NONE. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23240) branch-1 master and regionservers do not start when compiled against Hadoop 3.2.1
Lars Hofhansl created HBASE-23240: - Summary: branch-1 master and regionservers do not start when compiled against Hadoop 3.2.1 Key: HBASE-23240 URL: https://issues.apache.org/jira/browse/HBASE-23240 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1679) at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:339) at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:572) at org.apache.hadoop.util.GenericOptionsParser.(GenericOptionsParser.java:174) at org.apache.hadoop.util.GenericOptionsParser.(GenericOptionsParser.java:156) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-22457) Harden the HBase HFile reader reference counting
Lars Hofhansl created HBASE-22457: - Summary: Harden the HBase HFile reader reference counting Key: HBASE-22457 URL: https://issues.apache.org/jira/browse/HBASE-22457 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl The problem is that any coprocessor hook that replaces a passed scanner without closing it can cause an incorrect reference count. This was bad and wrong before of course, but now it has pretty bad consequences, since an incorrect reference count will prevent HFiles from being archived indefinitely. All hooks that are passed a scanner and return a scanner are suspect, since the returned scanner may or may not close the passed scanner: * preCompact * preCompactScannerOpen * preFlush * preFlushScannerOpen * preScannerOpen * preStoreScannerOpen * preStoreFileReaderOpen...? (not sure about this one, it could mess with the reader) I sampled the Phoenix and also Tephra code, and found a few instances where this is happening. For those I filed issues: TEPHRA-300, PHOENIX-5291. (We're not using Tephra.) The Phoenix ones should be rare. In our case we are seeing readers with refCount > 1000. Perhaps there are other issues - a path where not all exceptions are caught and a scanner is left open that way. (Generally I am not a fan of reference counting in complex systems - it's too easy to miss something. But that's a different discussion. :) ). Let's brainstorm some ways in which we can harden this. [~ram_krish], [~anoop.hbase], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
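The hazard described above can be modeled in a few lines: a scanner pins a reference on a shared reader, and a coprocessor that wraps the scanner must close its delegate or the count never returns to zero. This is a toy model with invented names, not the HBase classes:

```java
public class RefCountSketch {
    // Stand-in for an HFile reader shared between scanners.
    static class Reader { int refCount = 0; }

    static class Scanner {
        final Reader reader;
        Scanner(Reader r) { reader = r; reader.refCount++; } // pin the reader
        void close() { reader.refCount--; }                  // unpin on close
    }

    // A well-behaved replacement scanner closes the scanner it wraps.
    static class Wrapper extends Scanner {
        final Scanner delegate;
        Wrapper(Scanner d) { super(d.reader); delegate = d; }
        @Override void close() { delegate.close(); super.close(); }
    }
}
```

If `Wrapper.close()` omitted `delegate.close()`, the reader's count would stay at 1 forever - the in-miniature version of the refCount > 1000 readers seen in production.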
[jira] [Created] (HBASE-22385) Consider "programmatic" HFiles
Lars Hofhansl created HBASE-22385: - Summary: Consider "programmatic" HFiles Key: HBASE-22385 URL: https://issues.apache.org/jira/browse/HBASE-22385 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl For various use cases (among others, mass deletes) it would be great if HBase had a mechanism for programmatic HFiles. I.e. HFiles (with HFileScanner and Reader) that produce KeyValues just like any other old HFile, but where the KeyValues produced are generated by some other means rather than being physically read from some storage medium. In fact this could be a generalization for the various HFiles we have: (normal) HFiles, HFileLinks, HalfStoreFiles, etc. A simple way could be to allow storing a classname into the HFile. Upon reading the HFile, HBase would instantiate an instance of that class, and that instance is responsible for all further interaction with that HFile. For normal HFiles it would just be the normal HFileReader. (Remember, this is Brainstorming.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-22235) OperationStatus.{SUCCESS|FAILURE|NOT_RUN} are not visible to 3rd party coprocessors
Lars Hofhansl created HBASE-22235: - Summary: OperationStatus.{SUCCESS|FAILURE|NOT_RUN} are not visible to 3rd party coprocessors Key: HBASE-22235 URL: https://issues.apache.org/jira/browse/HBASE-22235 Project: HBase Issue Type: Bug Components: Coprocessors Reporter: Lars Hofhansl This looks like an oversight. preBatchMutate is useless for some operations due to this. See also TEPHRA-299. MiniBatchOperationInProgress has limited visibility for coprocessors; OperationStatus and OperationStatusCode should have the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21856) Consider Causal Replication
Lars Hofhansl created HBASE-21856: - Summary: Consider Causal Replication Key: HBASE-21856 URL: https://issues.apache.org/jira/browse/HBASE-21856 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl We've had various efforts to improve the ordering guarantees for HBase, most notably Serial Replication. I think in many cases guaranteeing a Total Replication Order is not required; a simpler Causal Replication Order is sufficient. Specifically we would guarantee causal ordering for a single Rowkey. Any changes to a Row (Puts, Deletes, etc.) would be replicated in the exact order in which they occurred in the source system. Unlike total ordering this can be accomplished with only local region server control. I don't have a full design in mind, let's discuss here. It should be sufficient to do the following: # RegionServers only adopt the replication queues from other RegionServers for regions they (now) own. This requires log splitting for replication. # RegionServers ship all edits for queues adopted from other servers before any of their "own" edits are shipped. It's probably a bit more involved, but should be much cheaper than the total ordering provided by serial replication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21590) Optimize trySkipToNextColumn in StoreScanner a bit
Lars Hofhansl created HBASE-21590: - Summary: Optimize trySkipToNextColumn in StoreScanner a bit Key: HBASE-21590 URL: https://issues.apache.org/jira/browse/HBASE-21590 Project: HBase Issue Type: Task Reporter: Lars Hofhansl See latest comment on HBASE-17958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19034) Implement "optimize SEEK to SKIP" in storefile scanner
[ https://issues.apache.org/jira/browse/HBASE-19034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-19034. --- Resolution: Won't Fix Closing as "Won't Fix" as it turns out that doing this optimization at the StoreFile (or HFile) Scanner level misses the most important opportunity for optimization - it's too far down the stack. > Implement "optimize SEEK to SKIP" in storefile scanner > -- > > Key: HBASE-19034 > URL: https://issues.apache.org/jira/browse/HBASE-19034 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Priority: Major > > {code} > protected boolean trySkipToNextRow(Cell cell) throws IOException { > Cell nextCell = null; > do { > Cell nextIndexedKey = getNextIndexedKey(); > if (nextIndexedKey != null && nextIndexedKey != > KeyValueScanner.NO_NEXT_INDEXED_KEY > && matcher.compareKeyForNextRow(nextIndexedKey, cell) >= 0) { > this.heap.next(); > ++kvsScanned; > } else { > return false; > } > } while ((nextCell = this.heap.peek()) != null && > CellUtil.matchingRows(cell, nextCell)); > return true; > } > {code} > When SQM returns a SEEK_NEXT_ROW, the store scanner will seek to the cell from > the next row. HBASE-13109 optimized the SEEK to SKIP when we can read the cell > in the current loaded block. So it will skip by calling heap.next to the cell > from the next row. But the problem is it compares too many times with the > nextIndexedKey in the while loop. We plan to move the compare outside the loop > to reduce compare times. One problem is the nextIndexedKey may be changed when > we call heap.peek, because the current storefile scanner was changed. So my > proposal is to move the "optimize SEEK to SKIP" to the storefile scanner. When > we call seek for a storefile scanner, it may do a real seek or implement the > seek as several skips. > Any suggestions are welcome. Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work
[ https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl reopened HBASE-20993: --- > [Auth] IPC client fallback to simple auth allowed doesn't work > -- > > Key: HBASE-20993 > URL: https://issues.apache.org/jira/browse/HBASE-20993 > Project: HBase > Issue Type: Bug > Components: Client, IPC/RPC, security >Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7 >Reporter: Reid Chan >Assignee: Jack Bearden >Priority: Critical > Fix For: 1.5.0, 1.4.8 > > Attachments: HBASE-20993.001.patch, > HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, > HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, > HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, > HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, > HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, > HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, > HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt > > > It is easily reproducible. > client's hbase-site.xml: hadoop.security.authentication:kerberos, > hbase.security.authentication:kerberos, > hbase.ipc.client.fallback-to-simple-auth-allowed:true, keytab and principal > are right set > A simple auth hbase cluster, a kerberized hbase client application. 
> application trying to r/w/c/d table will have following exception: > {code} > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873) > at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592) > at > 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738) > at > org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289) > at > org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753) > at > org.apache.hadoop.hbase.client.HBaseA
[jira] [Created] (HBASE-21166) Creating a CoprocessorHConnection re-retrieves the cluster id from ZK
Lars Hofhansl created HBASE-21166: - Summary: Creating a CoprocessorHConnection re-retrieves the cluster id from ZK Key: HBASE-21166 URL: https://issues.apache.org/jira/browse/HBASE-21166 Project: HBase Issue Type: Bug Affects Versions: 1.5.0 Reporter: Lars Hofhansl CoprocessorHConnections are created for example during a call of CoprocessorHost$Environment.getTable(...). The region server already knows the cluster id; yet, we're resolving it over and over again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20446) Allow building HBase 1.x against Hadoop 3.1.x
[ https://issues.apache.org/jira/browse/HBASE-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-20446. --- Resolution: Fixed Release Note: Finally committed this. > Allow building HBase 1.x against Hadoop 3.1.x > - > > Key: HBASE-20446 > URL: https://issues.apache.org/jira/browse/HBASE-20446 > Project: HBase > Issue Type: Improvement > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 1.5.0 > > Attachments: 20446.txt > > > Simple change, just leaving it here in case somebody needs this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21137) After HBASE-20940 any local index query will open all HFiles of every Region involved in the query
Lars Hofhansl created HBASE-21137: - Summary: After HBASE-20940 any local index query will open all HFiles of every Region involved in the query Key: HBASE-21137 URL: https://issues.apache.org/jira/browse/HBASE-21137 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl See HBASE-20940. [~vishk], [~apurtell] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21033) Separate StoreHeap from StoreFileHeap
Lars Hofhansl created HBASE-21033: - Summary: Separate StoreHeap from StoreFileHeap Key: HBASE-21033 URL: https://issues.apache.org/jira/browse/HBASE-21033 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Currently KeyValueHeap is used both for heaps of StoreScanners at the Region level and for heaps of StoreFileScanners (and a MemstoreScanner) at the Store level. This causes various problems: # Some incorrect method usage can only be detected at runtime via a runtime exception. # In profiling sessions it's hard to distinguish the two. # It's just not clean :) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-6562) Fake KVs are sometimes passed to filters
[ https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6562. -- Resolution: Fixed In 1.4+ this should be fixed. > Fake KVs are sometimes passed to filters > > > Key: HBASE-6562 > URL: https://issues.apache.org/jira/browse/HBASE-6562 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Priority: Minor > Attachments: 6562-0.94-v1.txt, 6562-0.96-v1.txt, 6562-v2.txt, > 6562-v3.txt, 6562-v4.txt, 6562-v5.txt, 6562.txt, minimalTest.java > > > In internal tests at Salesforce we found that fake row keys sometimes are > passed to filters (Filter.filterRowKey(...) specifically). > The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the > row key is passed to filterRowKey in RegionScannImpl *before* that happens. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20459) Majority of scan time in HBase-1 spent in size estimation
Lars Hofhansl created HBASE-20459: - Summary: Majority of scan time in HBase-1 spent in size estimation Key: HBASE-20459 URL: https://issues.apache.org/jira/browse/HBASE-20459 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Attachments: Screenshot_20180419_162559.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20446) Allow building HBase 1.x against Hadoop 3.1.0
Lars Hofhansl created HBASE-20446: - Summary: Allow building HBase 1.x against Hadoop 3.1.0 Key: HBASE-20446 URL: https://issues.apache.org/jira/browse/HBASE-20446 Project: HBase Issue Type: Improvement Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 20446.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-19631) Allow building HBase 1.5.x against Hadoop 3.0.0
[ https://issues.apache.org/jira/browse/HBASE-19631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-19631. --- Resolution: Fixed Committed to hbase-1 branch. > Allow building HBase 1.5.x against Hadoop 3.0.0 > --- > > Key: HBASE-19631 > URL: https://issues.apache.org/jira/browse/HBASE-19631 > Project: HBase > Issue Type: Sub-task > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Major > Fix For: 1.5.0 > > Attachments: 19631.txt > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-19631) Allow building HBase 1.5.x against Hadoop 3.0.0
Lars Hofhansl created HBASE-19631: - Summary: Allow building HBase 1.5.x against Hadoop 3.0.0 Key: HBASE-19631 URL: https://issues.apache.org/jira/browse/HBASE-19631 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-15453) [Performance] Considering reverting HBASE-10015 - reinstate synchronized in StoreScanner
[ https://issues.apache.org/jira/browse/HBASE-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15453. --- Resolution: Won't Fix Lemme just close. In 1.3+ it's not an issue anyway (the need to synchronize is gone there) > [Performance] Considering reverting HBASE-10015 - reinstate synchronized in > StoreScanner > > > Key: HBASE-15453 > URL: https://issues.apache.org/jira/browse/HBASE-15453 > Project: HBase > Issue Type: Improvement > Components: Performance > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl >Priority: Critical > Attachments: 15453-0.98.txt > > > In HBASE-10015 back then I found that intrinsic locks (synchronized) in > StoreScanner are slower than explicit locks. > I was surprised by this. To make sure I added a simple perf test and many > folks ran it on their machines. All found that explicit locks were faster. > Now... I just ran that test again. On the latest JDK8 I find that now the > intrinsic locks are significantly faster: > (OpenJDK Runtime Environment (build 1.8.0_72-b15)) > Explicit locks: > 10 runs mean:2223.6 sigma:72.29412147609237 > Intrinsic locks: > 10 runs mean:1865.3 sigma:32.63755505548784 > I confirmed the same with timing some Phoenix scans. We can save a bunch of > time by changing this back. > Arrghhh... So maybe it's time to revert this now...? > (Note that in trunk due to [~ram_krish]'s work, we do not lock in > StoreScanner anymore) > I'll attach the perf test and a patch that changes lock to synchronized; if > some folks could run this on 0.98, that'd be great. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
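The comparison described above can be reproduced with a minimal standalone sketch. This is a hypothetical harness, not the attached 15453-0.98.txt test; absolute numbers depend heavily on JDK version and hardware, which is the whole point of the issue.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of the intrinsic-vs-explicit lock comparison discussed
// above. Single-threaded, uncontended -- the StoreScanner fast path.
public class LockBench {
    static final int ITERATIONS = 10_000_000;
    static long counter;
    static final ReentrantLock lock = new ReentrantLock();

    // Time ITERATIONS increments guarded by a synchronized block (ms).
    static long timeIntrinsic() {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            synchronized (LockBench.class) { counter++; }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Time the same work guarded by an explicit ReentrantLock (ms).
    static long timeExplicit() {
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            lock.lock();
            try { counter++; } finally { lock.unlock(); }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Warm up first so the JIT compiles both paths, then measure.
        timeIntrinsic(); timeExplicit();
        System.out.println("intrinsic ms: " + timeIntrinsic());
        System.out.println("explicit  ms: " + timeExplicit());
    }
}
```

As the resolution notes, this became moot once StoreScanner stopped locking altogether.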
[jira] [Resolved] (HBASE-13094) Consider Filters that are evaluated before deletes and see delete markers
[ https://issues.apache.org/jira/browse/HBASE-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-13094. --- Resolution: Won't Fix No interest. Closing. > Consider Filters that are evaluated before deletes and see delete markers > - > > Key: HBASE-13094 > URL: https://issues.apache.org/jira/browse/HBASE-13094 > Project: HBase > Issue Type: Brainstorming > Components: regionserver, Scanners >Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Attachments: 13094-0.98.txt > > > That would be good for full control filtering of all cells, such as needed > for some transaction implementations. > [~ghelmling] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19534) Document risks of RegionObserver.preStoreScannerOpen
Lars Hofhansl created HBASE-19534: - Summary: Document risks of RegionObserver.preStoreScannerOpen Key: HBASE-19534 URL: https://issues.apache.org/jira/browse/HBASE-19534 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl We just had an outage because we used preStoreScannerOpen, which, in our case, created a new StoreScanner. In HBase versions before 1.3 this caused a definite memory leak: a reference to the old StoreScanner (if not null) would be held by the store until the region is closed. In 1.3 and later there's no such leak, but the old scanner is still not properly closed. This should be added to the Javadoc and the ZooKeeperScanPolicyObserver example should be fixed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19458) Allow building HBase 1.3.x against Hadoop 2.8.2
Lars Hofhansl created HBASE-19458: - Summary: Allow building HBase 1.3.x against Hadoop 2.8.2 Key: HBASE-19458 URL: https://issues.apache.org/jira/browse/HBASE-19458 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18228) HBCK improvements
Lars Hofhansl created HBASE-18228: - Summary: HBCK improvements Key: HBASE-18228 URL: https://issues.apache.org/jira/browse/HBASE-18228 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl We just had a prod issue, and running HBCK the way we did actually caused more problems. In part HBCK did stuff we did not expect, in part we had little visibility into what HBCK was doing, and in part the logging was confusing. I'm proposing 2 improvements: 1. A dry-run mode. Run, and just list what would have been done. 2. An interactive mode. Run, and for each action request Y/N user input. So that a user can opt out of stuff. [~jmhsieh], FYI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
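The two proposed modes amount to gating every repair action behind a single check. A sketch of what that gate could look like (all names here are hypothetical illustrations, not actual HBCK code):

```java
import java.util.Scanner;

// Hypothetical gate for the two proposed HBCK modes: DRY_RUN lists each
// action without executing it; INTERACTIVE asks Y/N per action so the
// operator can opt out of individual repairs.
public class RepairGate {
    enum Mode { NORMAL, DRY_RUN, INTERACTIVE }

    private final Mode mode;
    private final Scanner in = new Scanner(System.in);

    RepairGate(Mode mode) { this.mode = mode; }

    /** Returns true if the named repair action should actually run. */
    boolean approve(String action) {
        switch (mode) {
            case DRY_RUN:
                System.out.println("[dry-run] would: " + action);
                return false; // report only, never execute
            case INTERACTIVE:
                System.out.print("Run '" + action + "'? [y/N] ");
                return in.hasNextLine()
                    && in.nextLine().trim().equalsIgnoreCase("y");
            default:
                return true; // today's behavior: just do it
        }
    }
}
```

Every existing repair path would then call `approve(...)` before mutating anything, which also gives the visibility into HBCK's actions that the report asks for.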
[jira] [Created] (HBASE-18165) Predicate based deletion during major compactions
Lars Hofhansl created HBASE-18165: - Summary: Predicate based deletion during major compactions Key: HBASE-18165 URL: https://issues.apache.org/jira/browse/HBASE-18165 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl In many cases it is expensive to place a delete per version, column, or family. HBase should have a way to specify a predicate and remove all Cells matching the predicate during the next compactions (major and minor). Nothing more concrete. The tricky part would be to know when it is safe to remove the predicate, i.e. when we can be sure that all Cells matching the predicate actually have been removed. Could potentially use HBASE-12859 for that. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
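The core of the proposal is simple: a compaction rewrites cells anyway, so cells matching a registered predicate are just not rewritten. A toy illustration, with `Cell` as a simplified stand-in (not the HBase `Cell` interface) and the hard part, predicate retirement, left out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of predicate-based deletion during compaction: one pass over
// the input cells, dropping any cell the registered predicate matches.
public class PredicateDrop {
    static class Cell {
        final String row;
        final long ts;
        Cell(String row, long ts) { this.row = row; this.ts = ts; }
    }

    /** One compaction pass: keep only the cells the predicate does not match. */
    static List<Cell> compact(List<Cell> input, Predicate<Cell> dropIf) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : input) {
            if (!dropIf.test(c)) out.add(c); // matching cells are never rewritten
        }
        return out;
    }
}
```

A predicate like `c -> c.ts < cutoff` would implement an application-level TTL without a delete marker per cell; knowing when every store has compacted (and the predicate can be retired) is exactly the open question the issue points at.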
[jira] [Created] (HBASE-18000) Make sure we always return the scanner id with ScanResponse
Lars Hofhansl created HBASE-18000: - Summary: Make sure we always return the scanner id with ScanResponse Key: HBASE-18000 URL: https://issues.apache.org/jira/browse/HBASE-18000 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Some external tooling (like OpenTSDB) relies on the scanner id to tie asynchronous responses back to their requests. (see comments on HBASE-17489) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
Re: [ANNOUNCE] Apache HBase 1.3.1 is now available for download
Hi Mikhail, I don't see a 1.3.1 tag, yet. Thanks. -- Lars From: Mikhail Antonov To: "u...@hbase.apache.org" ; "dev@hbase.apache.org" Sent: Friday, April 21, 2017 1:02 AM Subject: [ANNOUNCE] Apache HBase 1.3.1 is now available for download The HBase team is happy to announce the immediate availability of HBase 1.3.1! Apache HBase is an open-source, distributed, versioned, non-relational database. Apache HBase gives you low latency random access to billions of rows with millions of columns atop non-specialized hardware. To learn more about HBase, see https://hbase.apache.org/. HBase 1.3.1 is the first maintenance release in the HBase 1.3.z release line, continuing on the theme of bringing a stable, reliable database to the Hadoop and NoSQL communities. This release includes 68 bugfixes and improvements since the initial 1.3.0 release. Notable fixes include: [HBASE-16630] - Fragmentation in long running Bucket Cache [HBASE-17059] - SimpleLoadBalancer schedules large amount of invalid region moves [HBASE-17060] - Compute region locality in parallel at startup [HBASE-17227] - FSHLog may roll a new writer successfully with unflushed entries [HBASE-17265, HBASE-17275] - Region assignment fixes. And other important fixes, including the areas of load balancing, region assignment, replication and write-ahead log.
The full list of resolved issues is available at https://s.apache.org/hbase-1.3.1-jira-releasenotes Download through an ASF mirror near you: http://www.apache.org/dyn/closer.lua/hbase/1.3.1 The relevant checksums files are available at: https://www.apache.org/dist/hbase/1.3.1/hbase-1.3.1-src.tar.gz.mds https://www.apache.org/dist/hbase/1.3.1/hbase-1.3.1-bin.tar.gz.mds Project members signature keys can be found at https://www.apache.org/dist/hbase/KEYS PGP signatures are available at: https://www.apache.org/dist/hbase/1.3.1/hbase-1.3.1-src.tar.gz.asc https://www.apache.org/dist/hbase/1.3.1/hbase-1.3.1-bin.tar.gz.asc For instructions on verifying ASF release downloads, please see https://www.apache.org/dyn/closer.cgi#verify Questions, comments and problems are always welcome at: dev@hbase.apache.org . Thank you! The HBase Dev Team
[jira] [Created] (HBASE-17893) Allow HBase to build against Hadoop 2.8.0
Lars Hofhansl created HBASE-17893: - Summary: Allow HBase to build against Hadoop 2.8.0 Key: HBASE-17893 URL: https://issues.apache.org/jira/browse/HBASE-17893 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1671, column 8]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1] {code} Then in the generated LICENSE: {code} This product includes Nimbus JOSE+JWT licensed under the The Apache Software License, Version 2.0. ${dep.licenses[0].comments} Please check this License for acceptability here: https://www.apache.org/legal/resolved If it is okay, then update the list named 'non_aggregate_fine' in the LICENSE.vm file. If it isn't okay, then revert the change that added the dependency. More info on the dependency: com.nimbusds nimbus-jose-jwt 3.9 maven central search g:com.nimbusds AND a:nimbus-jose-jwt AND v:3.9 project website https://bitbucket.org/connect2id/nimbus-jose-jwt project source https://bitbucket.org/connect2id/nimbus-jose-jwt {code} Maybe the problem is just that it says: Apache _Software_ License -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-9739) HBaseClient does not behave nicely when the called thread is interrupted
[ https://issues.apache.org/jira/browse/HBASE-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-9739. -- Resolution: Won't Fix Old issue. No update. Closing. > HBaseClient does not behave nicely when the called thread is interrupted > > > Key: HBASE-9739 > URL: https://issues.apache.org/jira/browse/HBASE-9739 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl > > Just ran into a scenario where HBaseClient became permanently useless after > we interrupted the thread using it. > The problem is here: > {code} > } catch(IOException e) { > markClosed(e); > {code} > In sendParam(...). > If the IOException is caused by an interrupt we should not close the > connection. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
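The suggested fix for the quoted catch block is to mark the connection closed only for genuine transport failures, not for interrupt-induced ones. A sketch with a simplified stand-in class (this is not the actual HBaseClient code):

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch of the proposed behavior: an interrupt of one calling thread
// should not poison the shared connection for every other thread.
public class ConnectionSketch {
    private boolean closed;

    boolean isClosed() { return closed; }

    private void markClosed(IOException e) { closed = true; }

    void handleSendError(IOException e) {
        // Thread.interrupted() also clears the flag; real code would
        // re-assert it (Thread.currentThread().interrupt()) before
        // propagating the failure to the caller.
        if (e instanceof InterruptedIOException || Thread.interrupted()) {
            // interrupt-induced: surface the error to this caller only,
            // keep the connection open for other threads
            return;
        }
        markClosed(e); // genuine transport failure: close as before
    }
}
```

Note that an interrupt during blocking I/O can surface as other IOException subclasses too (e.g. `ClosedByInterruptException`), so a real fix would need to classify those as well.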
[jira] [Resolved] (HBASE-10145) Table creation should proceed in the presence of a stale znode
[ https://issues.apache.org/jira/browse/HBASE-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10145. --- Resolution: Won't Fix Closing old issue. > Table creation should proceed in the presence of a stale znode > -- > > Key: HBASE-10145 > URL: https://issues.apache.org/jira/browse/HBASE-10145 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Priority: Minor > > HBASE-7600 fixed a race condition where concurrent attempts to create the > same table could succeed. > An unfortunate side effect is that it is now impossible to create a table as > long as the table's znode is around, which is an issue when a cluster was > wiped at the HDFS level. > Minor issue as we have discussed this many times before, but it ought to be > possible to check whether the table directory exists and if not either create > it or remove the corresponding znode. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-10028) Cleanup metrics documentation
[ https://issues.apache.org/jira/browse/HBASE-10028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-10028. --- Resolution: Won't Fix Closing old issue. > Cleanup metrics documentation > - > > Key: HBASE-10028 > URL: https://issues.apache.org/jira/browse/HBASE-10028 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl > > The current documentation of the metrics is incomplete and at points incorrect > (HDFS latencies are in ns rather than ms for example). > We should clean this up and add other related metrics as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-6492) Remove Reflection based Hadoop abstractions
[ https://issues.apache.org/jira/browse/HBASE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-6492. -- Resolution: Won't Fix Closing old issue. > Remove Reflection based Hadoop abstractions > --- > > Key: HBASE-6492 > URL: https://issues.apache.org/jira/browse/HBASE-6492 > Project: HBase > Issue Type: Improvement > Reporter: Lars Hofhansl > > In 0.96 we now have the Hadoop1-compat and Hadoop2-compat projects. > The reflection we're using to deal with different versions of Hadoop should > be removed in favour of using the compat projects. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-5475) Allow importtsv and Import to work truly offline when using bulk import option
[ https://issues.apache.org/jira/browse/HBASE-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-5475. -- Resolution: Won't Fix Closing old issue. > Allow importtsv and Import to work truly offline when using bulk import option > -- > > Key: HBASE-5475 > URL: https://issues.apache.org/jira/browse/HBASE-5475 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Lars Hofhansl >Priority: Minor > > Currently importtsv (and now also Import with HBASE-5440) support using > HFileOutputFormat for later bulk loading. > However, currently that cannot be without having access to the table we're > going to import to, because both importtsv and Import need to lookup the > split points, and find the compression setting. > It would be nice if there would be an offline way to provide the split point > and compression setting. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (HBASE-5311) Allow inmemory Memstore compactions
[ https://issues.apache.org/jira/browse/HBASE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-5311. -- Resolution: Won't Fix Closing old issue. > Allow inmemory Memstore compactions > --- > > Key: HBASE-5311 > URL: https://issues.apache.org/jira/browse/HBASE-5311 > Project: HBase > Issue Type: Improvement > Reporter: Lars Hofhansl > Attachments: InternallyLayeredMap.java > > > Just like we periodically compact the StoreFiles we should also periodically > compact the MemStore. > During these compactions we eliminate deleted cells, expired cells, cells to > be removed because of version count, etc, before we even do a memstore flush. > Besides the optimization that we could get from this, it should also allow us > to remove the special handling of ICV, Increment, and Append (all of which > use upsert logic to avoid accumulating excessive cells in the Memstore). > Not targeting this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17877) Replace/improve HBase's byte[] comparator
Lars Hofhansl created HBASE-17877: - Summary: Replace/improve HBase's byte[] comparator Key: HBASE-17877 URL: https://issues.apache.org/jira/browse/HBASE-17877 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl [~vik.karma] did some extensive tests and found that Hadoop's version is faster - dramatically faster in some cases. Patch forthcoming. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
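Both the existing and the replacement comparator must implement the same contract: lexicographic order over *unsigned* bytes. The faster Hadoop-style version gets its speed from comparing 8 bytes at a time as longs; the plain loop below is a reference for the semantics, not either implementation:

```java
// Reference (plain-loop) unsigned lexicographic byte[] comparison --
// the contract any optimized replacement must preserve.
public class ByteCompare {
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // mask to 0..255 so that e.g. (byte) 0x80 sorts after 0x7f;
            // comparing raw (signed) bytes would get this wrong
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length; // shared prefix: shorter array sorts first
    }
}
```

The unsigned masking is the subtle part: it is what makes the byte order match HBase's row-key sort order.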
[jira] [Resolved] (HBASE-17440) [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is closed
[ https://issues.apache.org/jira/browse/HBASE-17440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-17440. --- Resolution: Fixed Fix Version/s: 0.98.25 Done. > [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is > closed > > > Key: HBASE-17440 > URL: https://issues.apache.org/jira/browse/HBASE-17440 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.24 > Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Fix For: 0.98.25 > > Attachments: 17440.txt > > > We're seeing many issues with run-away ZK client connections in long-running > app servers. 10k or more send or event threads are happening frequently. > While I looked around in the code I noticed that the DelayedClosing chore is > not immediately ended when an HConnection is closed; when there's an issue > with HBase or ZK and clients reconnect in a tight loop, this can lead > temporarily to very many threads running. These will all get cleaned out > after at most 60s, but during that time a lot of threads can be created. > The fix is a one-liner. We'll likely file other issues soon. > Interestingly branch-1 and beyond do not have this chore anymore, although - > at least in branch-1 and later - I still see the ZooKeeperAliveConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17440) [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is closed
Lars Hofhansl created HBASE-17440: - Summary: [0.98] Make sure DelayedClosing chore is stopped as soon as an HConnection is closed Key: HBASE-17440 URL: https://issues.apache.org/jira/browse/HBASE-17440 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl We're seeing many issues with run-away ZK client connections in long-running app servers. 10k or more send or event threads are happening frequently. While I looked around in the code I noticed that the DelayedClosing chore is not immediately ended when an HConnection is closed; when there's an issue with HBase or ZK and clients reconnect in a tight loop, this can lead temporarily to very many threads running. These will all get cleaned out after at most 60s, but during that time a lot of threads can be created. The fix is a one-liner. We'll likely file other issues soon. Interestingly branch-1 and beyond do not have this chore anymore, although - at least in branch-1 and later - I still see the ZooKeeperAliveConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16115) Missing security context in RegionObserver coprocessor when a compaction/split is triggered manually
[ https://issues.apache.org/jira/browse/HBASE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-16115. --- Resolution: Won't Fix > Missing security context in RegionObserver coprocessor when a > compaction/split is triggered manually > > > Key: HBASE-16115 > URL: https://issues.apache.org/jira/browse/HBASE-16115 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.20 > Reporter: Lars Hofhansl > > We ran into an interesting phenomenon which can easily render a cluster > unusable. > We loaded some test data into a test table and forced a manual compaction > through the UI. We have some compaction hooks implemented in a region > observer, which writes back to another HBase table when the compaction > finishes. We noticed that this coprocessor is not set up correctly; it seems > the security context is missing. > The interesting part is that this _only_ happens when the compaction is > triggered through the UI. Automatic compactions (major or minor) or those > triggered via the HBase shell (following a kinit) work fine. Only the > UI-triggered compactions cause these issues and lead to essentially > neverending compactions, immovable regions, etc. > Not sure what exactly the issue is, but I wanted to make sure I capture this. > [~apurtell], [~ghelmling], FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12433) Coprocessors not dynamically reordered when reset priority
[ https://issues.apache.org/jira/browse/HBASE-12433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12433. --- Resolution: Not A Bug > Coprocessors not dynamically reordered when reset priority > -- > > Key: HBASE-12433 > URL: https://issues.apache.org/jira/browse/HBASE-12433 > Project: HBase > Issue Type: Bug > Components: Coprocessors >Affects Versions: 0.98.7 >Reporter: James Taylor > > When modifying the coprocessor priority through the HBase shell, the order of > the firing of the coprocessors wasn't changing. It probably would have with a > cluster bounce, but if we can make it dynamic easily, that would be > preferable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12570) Improve table configuration sanity checking
[ https://issues.apache.org/jira/browse/HBASE-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12570. --- Resolution: Duplicate > Improve table configuration sanity checking > --- > > Key: HBASE-12570 > URL: https://issues.apache.org/jira/browse/HBASE-12570 > Project: HBase > Issue Type: Umbrella >Reporter: James Taylor > > See PHOENIX-1473. If a split policy class cannot be resolved, then your HBase > cluster will be brought down as each region server that successively attempts > to open the region will not find the class and will bring itself down. > One idea to prevent this would be to fail the CREATE TABLE or ALTER TABLE > admin call if the split policy class cannot be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-16765) New SteppingRegionSplitPolicy, avoid too aggressive spread of regions for small tables.
[ https://issues.apache.org/jira/browse/HBASE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-16765. --- Resolution: Fixed Assignee: Lars Hofhansl Fix Version/s: 1.1.8 0.98.24 1.2.4 1.4.0 1.3.0 2.0.0 > New SteppingRegionSplitPolicy, avoid too aggressive spread of regions for > small tables. > --- > > Key: HBASE-16765 > URL: https://issues.apache.org/jira/browse/HBASE-16765 > Project: HBase > Issue Type: Bug >Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.4, 0.98.24, 1.1.8 > > Attachments: 16765-0.98.txt > > > We just did some experiments on some larger clusters and found that while > using IncreasingToUpperBoundRegionSplitPolicy generally works well and is > very convenient, it does tend to produce too many regions. > Since the logic is - by design - local, checking the number of regions of the > table in question on the local server only, we end up with more regions than > necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
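The difference between the two policies comes down to how the split-trigger size is computed from the local region count. A sketch of both rules (formulas per my reading of the 1.x split policies; treat the exact constants as assumptions, not verified source):

```java
// Sketch of the two split-size rules discussed above. flushSize and
// maxFileSize are in bytes; regionsOnServer counts this table's regions
// on the local server only -- which is exactly why the cubic rule
// over-splits when every server sees a small local count.
public class SplitSizes {
    // IncreasingToUpperBoundRegionSplitPolicy: trigger size grows
    // cubically with the local region count, capped at maxFileSize.
    static long increasingToUpperBound(long flushSize, long maxFileSize, int regionsOnServer) {
        if (regionsOnServer <= 0) return maxFileSize;
        long r = regionsOnServer;
        return Math.min(maxFileSize, 2L * flushSize * r * r * r);
    }

    // Stepping policy (the fix): split the first region early, then
    // behave like a constant-size policy for every region after that.
    static long stepping(long flushSize, long maxFileSize, int regionsOnServer) {
        return regionsOnServer <= 1 ? 2L * flushSize : maxFileSize;
    }
}
```

With a 128 MB flush size and a 10 GB max file size, the cubic rule already reaches the cap at four local regions; the stepping rule jumps straight there after the first split.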
[jira] [Resolved] (HBASE-14613) Remove MemStoreChunkPool?
[ https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14613. --- Resolution: Won't Fix > Remove MemStoreChunkPool? > - > > Key: HBASE-14613 > URL: https://issues.apache.org/jira/browse/HBASE-14613 > Project: HBase > Issue Type: Sub-task > Reporter: Lars Hofhansl >Priority: Minor > Attachments: 14613-0.98.txt, gc.png, writes.png > > > I just stumbled across MemStoreChunkPool. The idea behind it is to reuse chunks > of allocations rather than letting the GC handle this. > Now, it's off by default, and it seems to me to be of dubious value. I'd > recommend just removing it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16765) Improve IncreasingToUpperBoundRegionSplitPolicy
Lars Hofhansl created HBASE-16765: - Summary: Improve IncreasingToUpperBoundRegionSplitPolicy Key: HBASE-16765 URL: https://issues.apache.org/jira/browse/HBASE-16765 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl We just did some experiments on some larger clusters and found that while using IncreasingToUpperBoundRegionSplitPolicy generally works well and is very convenient, it does tend to produce too many regions. Since the logic is - by design - local, checking the number of regions of the table in question on the local server only, we end up with more regions than necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15059) Allow 0.94 to compile against Hadoop 2.7.x
[ https://issues.apache.org/jira/browse/HBASE-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15059. --- Resolution: Won't Fix Fix Version/s: (was: 0.94.28) > Allow 0.94 to compile against Hadoop 2.7.x > -- > > Key: HBASE-15059 > URL: https://issues.apache.org/jira/browse/HBASE-15059 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Attachments: 15059-addendum.txt, 15059-v2.txt, 15059.txt > > > Currently HBase 0.94 cannot be compiled against Hadoop 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15431) A bunch of methods are hot and too big to be inlined
[ https://issues.apache.org/jira/browse/HBASE-15431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15431. --- Resolution: Invalid Giving up on this. > A bunch of methods are hot and too big to be inlined > > > Key: HBASE-15431 > URL: https://issues.apache.org/jira/browse/HBASE-15431 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl > Attachments: hotMethods.txt > > > I ran HBase with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions > -XX:+PrintInlining" and then looked for "hot method too big" log lines. > I'll attach a log of those messages. > I tried to increase -XX:FreqInlineSize to 1010 to inline all these methods > (as long as they're hot), but actually didn't see any improvement. > In all cases I primed the JVM to make sure the JVM gets a chance to profile > the methods and decide whether they're hot or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16115) Missing security context in RegionObserver coprocessor when a compaction is triggered through the UI
Lars Hofhansl created HBASE-16115: - Summary: Missing security context in RegionObserver coprocessor when a compaction is triggered through the UI Key: HBASE-16115 URL: https://issues.apache.org/jira/browse/HBASE-16115 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl We ran into an interesting phenomenon which can easily render a cluster unusable. We loaded some test data into a test table and forced a manual compaction through the UI. We have some compaction hooks implemented in a region observer, which writes back to another HBase table when the compaction finishes. We noticed that this coprocessor is not set up correctly; it seems the security context is missing. The interesting part is that this _only_ happens when the compaction is triggered through the UI. Automatic compactions (major or minor) or those triggered via the HBase shell (following a kinit) work fine. Only the UI-triggered compactions cause these issues and lead to essentially neverending compactions, immovable regions, etc. Not sure what exactly the issue is, but I wanted to make sure I capture this. [~apurtell], [~ghelmling], FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15881) Allow BZIP2 compression
Lars Hofhansl created HBASE-15881: - Summary: Allow BZIP2 compression Key: HBASE-15881 URL: https://issues.apache.org/jira/browse/HBASE-15881 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl BZIP2 is a very efficient compressor in terms of compression rate. Compression speed is very slow; decompression is equivalent to or faster than GZIP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15452) Consider removing checkScanOrder from StoreScanner.next
[ https://issues.apache.org/jira/browse/HBASE-15452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15452. --- Resolution: Invalid NM... I'm full of @#%^ > Consider removing checkScanOrder from StoreScanner.next > --- > > Key: HBASE-15452 > URL: https://issues.apache.org/jira/browse/HBASE-15452 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl > Attachments: 15452-0.98.txt > > > In looking at why we spent so much time in StoreScanner.next when doing a simple > Phoenix count\(*) query I came across checkScanOrder. Not only is this a > function dispatch (that the JIT would eventually inline), it also requires > setting the prevKV member for every Cell encountered. > Removing that logic yields a measurable end-to-end improvement of 5-20% (in > 0.98). > I will repeat this test on my work machine tomorrow. > I think we're stable enough to remove that check anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15453) Considering reverting HBASE-10015 - reinstate synchronized in StoreScanner
Lars Hofhansl created HBASE-15453: - Summary: Considering reverting HBASE-10015 - reinstate synchronized in StoreScanner Key: HBASE-15453 URL: https://issues.apache.org/jira/browse/HBASE-15453 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl In HBASE-10015 back then I found that intrinsic locks (synchronized) in StoreScanner are slower than explicit locks. I was surprised by this. To make sure I added a simple perf test and many folks ran it on their machines. All found that explicit locks were faster. Now... I just ran that test again. On the latest JDK8 I find that now the intrinsic locks are significantly faster: Explicit locks: 10 runs mean:2223.6 sigma:72.29412147609237 Intrinsic locks: 10 runs mean:1865.3 sigma:32.63755505548784 I confirmed the same with timing some Phoenix scans. We can save a bunch of time by changing this back. Arrghhh... So maybe it's time to revert this now...? (Note that in trunk due to [~ram_krish]'s work, we do not lock in StoreScanner anymore) I'll attach the perf test and a patch that changes lock to synchronized; if some folks could run this on 0.98, that'd be great. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15452) Consider removing checkScanOrder from StoreScanner.next
Lars Hofhansl created HBASE-15452: - Summary: Consider removing checkScanOrder from StoreScanner.next Key: HBASE-15452 URL: https://issues.apache.org/jira/browse/HBASE-15452 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl In looking at why we spent so much time in StoreScanner.next when doing a simple Phoenix count\(*) query I came across checkScanOrder. Not only is this a function dispatch (that the JIT would eventually inline), it also requires setting the prevKV member for every Cell encountered. Removing that logic yields a measurable end-to-end improvement of 5-20% (in 0.98). I will repeat this test on my work machine tomorrow. I think we're stable enough to remove that check anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15431) A bunch of methods are hot and too big to be inlined
Lars Hofhansl created HBASE-15431: - Summary: A bunch of methods are hot and too big to be inlined Key: HBASE-15431 URL: https://issues.apache.org/jira/browse/HBASE-15431 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl I ran HBase with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining" and then looked for "hot method too big" log lines. I'll attach a log of those messages. I tried to increase -XX:FreqInlineSize to 1010 to inline all these methods (as long as they're hot), but actually didn't see any improvement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13068) Unexpected client exception with slow scan
[ https://issues.apache.org/jira/browse/HBASE-13068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-13068. --- Resolution: Cannot Reproduce Closing this old issue. > Unexpected client exception with slow scan > -- > > Key: HBASE-13068 > URL: https://issues.apache.org/jira/browse/HBASE-13068 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.10.1 >Reporter: Lars Hofhansl > > I just came across an interesting exception: > {code} > Caused by: java.io.IOException: Call 10 not added as the connection > newbunny/127.0.0.1:60020/ClientService/lars (auth:SIMPLE)/6 is closing > at > org.apache.hadoop.hbase.ipc.RpcClient$Connection.addCall(RpcClient.java:495) > at > org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1534) > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442) > ... 13 more > {code} > Called from here: > {code} > at > org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:291) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:160) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:115) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:91) > at > org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:247) > {code} > This happened when I scanned with multiple clients against a single region > server when all data is filtered at the server by a filter. > I had 10 clients, the region server has 30 handlers. > This means the scanners are not getting closed and their lease has to expire. > The workaround is to increase hbase.ipc.client.connection.maxidletime. > But it's strange that this *only* happens at close time. And since I am not > using up all handlers there shouldn't be any starvation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15084) Remove references to repository.codehaus.org
Lars Hofhansl created HBASE-15084: - Summary: Remove references to repository.codehaus.org Key: HBASE-15084 URL: https://issues.apache.org/jira/browse/HBASE-15084 Project: HBase Issue Type: Bug Affects Versions: 0.94.27 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.28 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15084) Remove references to repository.codehaus.org
[ https://issues.apache.org/jira/browse/HBASE-15084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15084. --- Resolution: Fixed Committed to 0.94. > Remove references to repository.codehaus.org > > > Key: HBASE-15084 > URL: https://issues.apache.org/jira/browse/HBASE-15084 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.27 >Reporter: Lars Hofhansl > Assignee: Lars Hofhansl > Fix For: 0.94.28 > > > repository.codehaus.org is no longer active. > A dns-lookup reveals an alias to stop-looking-at.repository-codehaus-org :) > All repositories have been moved to Maven Central, so it can just be removed > from the pom. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14213) Ensure ASF policy compliant headers and correct LICENSE and NOTICE files in artifacts for 0.94
[ https://issues.apache.org/jira/browse/HBASE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14213. --- Resolution: Fixed Committed to 0.94. Thanks again. [~busbey] > Ensure ASF policy compliant headers and correct LICENSE and NOTICE files in > artifacts for 0.94 > -- > > Key: HBASE-14213 > URL: https://issues.apache.org/jira/browse/HBASE-14213 > Project: HBase > Issue Type: Task > Components: build >Reporter: Nick Dimiduk >Assignee: Sean Busbey >Priority: Blocker > Fix For: 0.94.28 > > Attachments: 14213-LICENSE.txt, 14213-combined.txt, 14213-part1.txt, > 14213-part2.txt, 14213-part3.sh, 14213-part4.sh, 14213-part5.sh, > HBASE-14213.1.0.94.patch > > > From tail of thread on HBASE-14085, opening a backport ticket for 0.94. Took > the liberty of assigning to [~busbey]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14213) Ensure ASF policy compliant headers and correct LICENSE and NOTICE files in artifacts for 0.94
[ https://issues.apache.org/jira/browse/HBASE-14213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14213. --- Resolution: Won't Fix 0.94 is EOL'd. We're doing final release, no more versions after that. > Ensure ASF policy compliant headers and correct LICENSE and NOTICE files in > artifacts for 0.94 > -- > > Key: HBASE-14213 > URL: https://issues.apache.org/jira/browse/HBASE-14213 > Project: HBase > Issue Type: Task > Components: build >Reporter: Nick Dimiduk >Assignee: Sean Busbey >Priority: Blocker > > From tail of thread on HBASE-14085, opening a backport ticket for 0.94. Took > the liberty of assigning to [~busbey]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15054) Allow 0.94 to compile with JDK8
[ https://issues.apache.org/jira/browse/HBASE-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15054. --- Resolution: Fixed Hadoop Flags: Reviewed > Allow 0.94 to compile with JDK8 > --- > > Key: HBASE-15054 > URL: https://issues.apache.org/jira/browse/HBASE-15054 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.28 > > Attachments: 15054.txt > > > Fix the following two problems: > # PoolMap > # InputSampler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15059) Allow 0.94 to compile against Hadoop 2.7.x
[ https://issues.apache.org/jira/browse/HBASE-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-15059. --- Resolution: Fixed Hadoop Flags: Reviewed Pushed to 0.94 only (added a comment into the pom about the extra build steps) > Allow 0.94 to compile against Hadoop 2.7.x > -- > > Key: HBASE-15059 > URL: https://issues.apache.org/jira/browse/HBASE-15059 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.28 > > Attachments: 15059-v2.txt, 15059.txt > > > Currently HBase 0.94 cannot be compiled against Hadoop 2.7. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15054) Allow 0.94 to compile with JDK8
Lars Hofhansl created HBASE-15054: - Summary: Allow 0.94 to compile with JDK8 Key: HBASE-15054 URL: https://issues.apache.org/jira/browse/HBASE-15054 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix the following two problems: # PoolMap # InputSampler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14940) Make our unsafe based ops more safe
[ https://issues.apache.org/jira/browse/HBASE-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14940. --- Resolution: Fixed Release Note: Pushed to 0.98 only. > Make our unsafe based ops more safe > --- > > Key: HBASE-14940 > URL: https://issues.apache.org/jira/browse/HBASE-14940 > Project: HBase > Issue Type: Bug >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4 > > Attachments: HBASE-14940.patch, HBASE-14940_addendum_0.98.patch, > HBASE-14940_branch-1.patch, HBASE-14940_branch-1.patch, > HBASE-14940_branch-1.patch, HBASE-14940_branch-1.patch, HBASE-14940_v2.patch > > > Thanks for the nice findings [~ikeda] > This jira solves 3 issues with Unsafe operations and ByteBufferUtils > 1. We can do sun unsafe based reads and writes iff the unsafe package is > available and the underlying platform has unaligned-access capability. But > we were missing the second check > 2. Java NIO is doing a chunk based copy while doing Unsafe copyMemory. The > max chunk size is 1 MB. This is done because "A limit is imposed to allow for > safepoint polling during a large copy" as mentioned in comments in Bits.java. > We are going to do it the same way > 3. In ByteBufferUtils, when Unsafe is not available and ByteBuffers are off > heap, we were doing a byte-by-byte operation (read/copy). We can avoid this and > do it a better way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
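Point 2 above (limiting each copy to a 1 MB chunk so the JVM can reach a safepoint during large copies) can be sketched in plain Java. This is a hypothetical, self-contained illustration using System.arraycopy; the actual patch operates on Unsafe.copyMemory.

```java
// Sketch of chunk-limited copying as described in point 2 above.
// Real HBase uses Unsafe.copyMemory; System.arraycopy stands in here so the
// example is self-contained. The 1 MB limit exists so safepoint polling can
// happen during a large copy (see the comments in java.nio.Bits).
public class ChunkedCopy {
    static final int CHUNK = 1024 * 1024; // max 1 MB per chunk

    static void copy(byte[] src, int srcOff, byte[] dst, int dstOff, int len) {
        while (len > 0) {
            int n = Math.min(len, CHUNK);       // copy at most one chunk
            System.arraycopy(src, srcOff, dst, dstOff, n);
            srcOff += n;
            dstOff += n;
            len -= n;
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[3 * 1024 * 1024 + 17]; // forces several chunks
        for (int i = 0; i < src.length; i++) src[i] = (byte) i;
        byte[] dst = new byte[src.length];
        copy(src, 0, dst, 0, src.length);
        System.out.println(java.util.Arrays.equals(src, dst)); // true
    }
}
```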
[jira] [Resolved] (HBASE-14777) Fix Inter Cluster Replication Future ordering issues
[ https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14777. --- Resolution: Fixed Pushed to all branches. Thanks for bearing with me. > Fix Inter Cluster Replication Future ordering issues > > > Key: HBASE-14777 > URL: https://issues.apache.org/jira/browse/HBASE-14777 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Bhupendra Kumar Jain >Assignee: Ashu Pachauri >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14777-alternative.txt, HBASE-14777-1.patch, > HBASE-14777-2.patch, HBASE-14777-3.patch, HBASE-14777-4.patch, > HBASE-14777-5.patch, HBASE-14777-6.patch, HBASE-14777-addendum.patch, > HBASE-14777.patch > > > Replication fails with IndexOutOfBoundsException > {code} > regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint > threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: > 1 > at java.util.ArrayList.rangeCheck(Unknown Source) > at java.util.ArrayList.remove(Unknown Source) > at > org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222) > {code} > It's happening due to incorrect removal of entries from the replication > entries list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
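The class of bug behind that IndexOutOfBoundsException — removing several entries from an ArrayList using indices computed before any removal, so later indices go stale — can be shown in plain Java. This is a hypothetical sketch of the pattern, not the actual HBaseInterClusterReplicationEndpoint code; removing in descending index order keeps the remaining indices valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates the bug class: removing positions from a list with indices
// computed up front. Ascending removal shifts later elements and can throw
// IndexOutOfBoundsException; descending removal never invalidates an index.
public class SafeBatchRemove {
    static List<String> removeAll(List<String> entries, int[] positions) {
        int[] sorted = positions.clone();
        Arrays.sort(sorted);
        for (int i = sorted.length - 1; i >= 0; i--) {
            entries.remove(sorted[i]); // descending: earlier indices unaffected
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        System.out.println(removeAll(l, new int[] { 1, 3 })); // [a, c]
    }
}
```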
[jira] [Created] (HBASE-14869) Better request latency histograms
Lars Hofhansl created HBASE-14869: - Summary: Better request latency histograms Key: HBASE-14869 URL: https://issues.apache.org/jira/browse/HBASE-14869 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl I just discussed this with a colleague. The get, put, etc, histograms that each region server keeps are somewhat useless (depending on what you want to achieve of course), as they are aggregated and calculated by each region server. It would be better to record the number of requests in certain latency bands in addition to what we do now. For example the number of gets that took 0-5ms, 6-10ms, 10-20ms, 20-50ms, 50-100ms, 100-1000ms, > 1000ms, etc. (just as an example, should be configurable). That way we can do further calculations after the fact, and answer questions like: How often did we miss our SLA? Percentage of requests that missed an SLA, etc. Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
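A minimal sketch of the proposed band counters. All names are hypothetical; the band boundaries are the example values from the message, and a real implementation would make them configurable and export the counts as metrics.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Sketch of the proposal: count requests per configurable latency band so
// SLA questions ("how many gets took longer than X ms?") can be answered
// after the fact, instead of relying on per-server aggregated histograms.
public class LatencyBands {
    private final long[] upperBoundsMs;   // inclusive upper bound of each band
    private final AtomicLongArray counts; // one extra slot for the overflow band

    LatencyBands(long... upperBoundsMs) {
        this.upperBoundsMs = upperBoundsMs;
        this.counts = new AtomicLongArray(upperBoundsMs.length + 1);
    }

    void record(long latencyMs) {
        int i = 0;
        while (i < upperBoundsMs.length && latencyMs > upperBoundsMs[i]) i++;
        counts.incrementAndGet(i); // i == length means "> last bound"
    }

    long count(int band) {
        return counts.get(band);
    }

    public static void main(String[] args) {
        LatencyBands bands = new LatencyBands(5, 10, 20, 50, 100, 1000);
        bands.record(3);    // lands in the 0-5ms band (index 0)
        bands.record(42);   // lands in the 20-50ms band (index 3)
        bands.record(2000); // lands in the >1000ms overflow band (index 6)
        System.out.println(bands.count(0) + " " + bands.count(3) + " " + bands.count(6));
        // prints: 1 1 1
    }
}
```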
[jira] [Created] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers
Lars Hofhansl created HBASE-14791: - Summary: [0.98] CopyTable is extremely slow when moving delete markers Key: HBASE-14791 URL: https://issues.apache.org/jira/browse/HBASE-14791 Project: HBase Issue Type: Bug Affects Versions: 0.98.16 Reporter: Lars Hofhansl We found that some of our CopyTable jobs run for many hours, even when there isn't that much data to copy. [~vik.karma] did his magic and found that the issue is with copying delete markers (we use raw mode to also move deletes across). Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) are not batched and hence sent to the other side one by one, causing a network RTT for each delete marker. Looks like in trunk it's doing the right thing (using BufferedMutators for all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14657) Remove unneeded API from EncodedSeeker
Lars Hofhansl created HBASE-14657: - Summary: Remove unneeded API from EncodedSeeker Key: HBASE-14657 URL: https://issues.apache.org/jira/browse/HBASE-14657 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3 See parent. We do not need getKeyValueBuffer. It's only used for tests, and parent patch fixes all tests to use getKeyValue instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14628) Save object creation for scanning with FAST_DIFF encoding
Lars Hofhansl created HBASE-14628: - Summary: Save object creation for scanning with FAST_DIFF encoding Key: HBASE-14628 URL: https://issues.apache.org/jira/browse/HBASE-14628 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl I noticed that (at least in 0.98 - master is entirely different) we create a ByteBuffer just to create a byte[], which is then used to create a KeyValue. We can save the creation of the ByteBuffer and hence save allocating an extra object for each KV we find by creating the byte[] directly. In a Phoenix count\(*) query that saved about 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
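The allocation pattern being removed can be shown in a few lines. This is a hypothetical illustration of the before/after shape, not the actual FAST_DIFF decoder code: the saving is simply that the heap-backed ByteBuffer wrapper object is never created.

```java
import java.nio.ByteBuffer;

// Sketch of the saving described above: instead of allocating a ByteBuffer
// just to obtain its backing byte[], allocate the byte[] directly. One less
// object per decoded KeyValue in a very hot path.
public class DirectByteArray {
    // Before: an extra ByteBuffer object per decoded KV.
    static byte[] viaByteBuffer(int len) {
        ByteBuffer bb = ByteBuffer.allocate(len);
        // ... decoder would write the key/value bytes into bb here ...
        return bb.array(); // only the backing array is actually needed
    }

    // After: the byte[] is created directly.
    static byte[] direct(int len) {
        byte[] b = new byte[len];
        // ... decoder would write the key/value bytes into b here ...
        return b;
    }

    public static void main(String[] args) {
        System.out.println(viaByteBuffer(16).length == direct(16).length); // true
    }
}
```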
[jira] [Created] (HBASE-14613) Remove MemStoreChunkPool?
Lars Hofhansl created HBASE-14613: - Summary: Remove MemStoreChunkPool? Key: HBASE-14613 URL: https://issues.apache.org/jira/browse/HBASE-14613 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Priority: Minor I just stumbled across MemStoreChunkPool. The idea behind it is to reuse chunks of allocations rather than letting the GC handle this. Now, it's off by default, and it seems to me to be of dubious value. I'd recommend just removing it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14549) Simplify scanner stack reset logic
Lars Hofhansl created HBASE-14549: - Summary: Simplify scanner stack reset logic Key: HBASE-14549 URL: https://issues.apache.org/jira/browse/HBASE-14549 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Looking at the code, I find that the logic is unnecessarily complex. We indicate in updateReaders that the scanner stack needs to be reset. Then almost all StoreScanner (and derived classes) methods need to check and actually reset the scanner stack. Compactions are rare; we should reset the scanner stack in updateReaders, and hence avoid needing to check in all methods. Patch forthcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14539) Slight improvement of StoreScanner.optimize
Lars Hofhansl created HBASE-14539: - Summary: Slight improvement of StoreScanner.optimize Key: HBASE-14539 URL: https://issues.apache.org/jira/browse/HBASE-14539 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor While looking at the code I noticed that StoreScanner.optimize does some unnecessary work. This is a very tight loop and even just looking up a reference can throw off the CPU's cache lines. This saves a few percent of performance (not a lot, though). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14539) Slight improvement of StoreScanner.optimize
[ https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14539. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to 0.98.x, 1.0.x, 1.1.x, 1.2.x, 1.3, and 2.0. > Slight improvement of StoreScanner.optimize > --- > > Key: HBASE-14539 > URL: https://issues.apache.org/jira/browse/HBASE-14539 > Project: HBase > Issue Type: Sub-task > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.15 > > Attachments: 14539-0.98.txt > > > While looking at the code I noticed that StoreScanner.optimize does some > unnecessary work. This is a very tight loop and even just looking up a > reference can throw off the CPU's cache lines. This saves a few percent of > performance (not a lot, though). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
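Avoiding repeated reference lookups in a tight loop is the classic hoist-to-local pattern. The sketch below is a hypothetical stand-in (not the actual StoreScanner.optimize change) showing the shape of the optimization: values that cannot change during the loop are read once into locals.

```java
// Hypothetical sketch of the tight-loop pattern behind this change: hoist
// field/reference reads out of the hot loop so each iteration touches only
// locals. Behavior is identical; only the per-iteration work shrinks.
public class HoistedLoop {
    long[] data = new long[1_000_000];
    long threshold = 500;

    long countAboveNaive() {
        long n = 0;
        for (int i = 0; i < data.length; i++) { // re-reads the fields each iteration
            if (data[i] > threshold) n++;
        }
        return n;
    }

    long countAboveHoisted() {
        final long[] d = data;     // hoisted once before the loop
        final long t = threshold;
        long n = 0;
        for (int i = 0; i < d.length; i++) {    // loop body touches locals only
            if (d[i] > t) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        HoistedLoop h = new HoistedLoop();
        for (int i = 0; i < h.data.length; i++) h.data[i] = i % 1000;
        System.out.println(h.countAboveNaive() == h.countAboveHoisted()); // true
    }
}
```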
[jira] [Created] (HBASE-14509) Configurable sparse indexes?
Lars Hofhansl created HBASE-14509: - Summary: Configurable sparse indexes? Key: HBASE-14509 URL: https://issues.apache.org/jira/browse/HBASE-14509 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl This idea just popped up today and I wanted to record it for discussion: What if we kept sparse column indexes per region or HFile or per configurable range? I.e. For any given CQ we record the lowest and highest value for a particular range (HFile, Region, or a custom range like the Phoenix guide post). By tweaking the size of these ranges we can control the size of the index, vs its selectivity. For example if we kept it by HFile we can almost instantly decide whether we need to scan a particular HFile at all to find a particular value in a Cell. We can also collect min/max values for each n MB of data, for example when we scan the region the first time. Assuming ranges are large enough we can always keep the index in memory together with the region. Kind of a sparse local index. Might be much easier than the buddy region stuff we've been discussing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
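The pruning decision the sparse index enables is simple range arithmetic. This hypothetical sketch (all names assumed) shows how a per-HFile min/max lets a scan skip files whose [min, max] cannot contain the requested value:

```java
// Sketch of the sparse-index idea: keep a min/max of a column's values per
// range (HFile, region, or every n MB), and skip any range whose [min, max]
// interval cannot contain the value being looked for.
public class SparseMinMaxIndex {
    static class Range {
        final long min, max; // lowest/highest indexed value within this range
        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    // True if the range may contain the value and therefore must be scanned.
    static boolean mustScan(Range r, long value) {
        return value >= r.min && value <= r.max;
    }

    public static void main(String[] args) {
        // Three hypothetical HFiles with their recorded min/max for one CQ.
        Range[] hfiles = { new Range(0, 99), new Range(100, 499), new Range(500, 999) };
        int scanned = 0;
        for (Range r : hfiles) {
            if (mustScan(r, 250)) scanned++; // only the middle HFile qualifies
        }
        System.out.println(scanned); // prints: 1
    }
}
```

Coarser ranges shrink the index but prune less; that is the size-vs-selectivity trade-off the message describes.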
[jira] [Resolved] (HBASE-14489) postScannerFilterRow consumes a lot of CPU
[ https://issues.apache.org/jira/browse/HBASE-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14489. --- Resolution: Fixed Hadoop Flags: Reviewed > postScannerFilterRow consumes a lot of CPU > -- > > Key: HBASE-14489 > URL: https://issues.apache.org/jira/browse/HBASE-14489 > Project: HBase > Issue Type: Bug > Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Labels: performance > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 14489-0.98.txt, 14489-master.txt > > > During an unrelated test I found that when scanning a tall table with CQ only > and filtering most results at the server, 50%(!) of time is spent in > postScannerFilterRow, even though the coprocessor does nothing in that hook. > We need to find a way not to call this hook when not needed, or to question > why we have this hook at all. > I think [~ram_krish] added the hook (or maybe [~anoop.hbase]). I am also not > sure whether Phoenix uses this hook ([~giacomotaylor]?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14489) postScannerFilterRow consumes a lot of CPU
Lars Hofhansl created HBASE-14489: - Summary: postScannerFilterRow consumes a lot of CPU Key: HBASE-14489 URL: https://issues.apache.org/jira/browse/HBASE-14489 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl During an unrelated test I found that when scanning a tall table with CQ only and filtering most results at the server, 50%(!) of time is spent in postScannerFilterRow, even though the coprocessor does nothing in that hook. We need to find a way not to call this hook when not needed, or to question why we have this hook at all. I think [~ram_krish] added the hook (or maybe [~anoop.hbase]). I am also not sure whether Phoenix uses this hook ([~giacomotaylor]?) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14418) Make better SEEK vs SKIP decision with seek hints.
[ https://issues.apache.org/jira/browse/HBASE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14418. --- Resolution: Invalid Fix Version/s: (was: 0.98.12) (was: 1.1.0) (was: 1.0.1) (was: 2.0.0) Yeah. Creating the Cell, just to return the hint is what's taking up more time than the actual seek. Closing for now. I might revisit this. [~giacomotaylor], FYI. > Make better SEEK vs SKIP decision with seek hints. > -- > > Key: HBASE-14418 > URL: https://issues.apache.org/jira/browse/HBASE-14418 > Project: HBase > Issue Type: Sub-task > Reporter: Lars Hofhansl > Attachments: 13109.txt, 14418-0.98.txt > > > Continuation of parent. > We can also do this optimization for seek hints. This would allow filters and > coprocessors to be more liberal with seek hints, as seeking when unnecessary > is less of a perf detriment. > It's not quite as clear cut, since in order to check, we actually do need to > create the seek hint Cell. Then when we actually seek, we need to create it > again. Need to test carefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14418) Make better SEEK vs SKIP decision with seek hints.
Lars Hofhansl created HBASE-14418: - Summary: Make better SEEK vs SKIP decision with seek hints. Key: HBASE-14418 URL: https://issues.apache.org/jira/browse/HBASE-14418 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Continuation of parent. We can also do this optimization for seek hints. This would allow filters and coprocessors to be more liberal with seek hints, as seeking when unnecessary is less of a perf detriment. It's not quite as clear cut, since in order to check, we actually do need to create the seek hint Cell. Then when we actually seek, we need to create it again. Need to test carefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14364) hlog_roll and compact_rs broken in shell
Lars Hofhansl created HBASE-14364: - Summary: hlog_roll and compact_rs broken in shell Key: HBASE-14364 URL: https://issues.apache.org/jira/browse/HBASE-14364 Project: HBase Issue Type: Bug Affects Versions: 0.98.14 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Just noticed that both hlog_roll and compact_rs are broken in shell (at least in 0.98). hlog_roll is broken in three ways: (1) it calls admin.rollWALWriter, which no longer exists; (2) it tries to pass a ServerName, but the method takes a string; and (3) it uses an unqualified ServerName to get a server name, which leads to an uninitialized constant error. compact_rs only has the latter problem. Patch upcoming. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-14315. --- Resolution: Fixed Committed to every branch on the planet. Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro-optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
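The shape of this micro-optimization — carry the element you already pulled from the heap forward instead of peeking again on the next call — can be sketched in plain Java. This is a hypothetical stand-in using PriorityQueue, not the actual StoreScanner/KeyValueHeap code:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Sketch of "save one peek per row": the scanner caches the heap's head in a
// field, so each next() does a single poll() to advance instead of a peek()
// followed by a poll(). Same results, one less heap lookup per element.
public class CachedPeekScanner {
    private final PriorityQueue<Long> heap;
    private Long current; // cached head; next() needs no extra peek()

    CachedPeekScanner(PriorityQueue<Long> heap) {
        this.heap = heap;
        this.current = heap.poll(); // the only up-front heap access
    }

    Long next() {
        Long result = current;
        current = heap.poll(); // advance and cache; no separate peek()
        return result;         // null once the heap is exhausted
    }

    public static void main(String[] args) {
        PriorityQueue<Long> pq = new PriorityQueue<>(Arrays.asList(3L, 1L, 2L));
        CachedPeekScanner s = new CachedPeekScanner(pq);
        System.out.println(s.next() + " " + s.next() + " " + s.next()); // 1 2 3
    }
}
```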
[jira] [Created] (HBASE-14315) Save one call to KeyValueHeap.peek per row
Lars Hofhansl created HBASE-14315: - Summary: Save one call to KeyValueHeap.peek per row Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Another one of my micro-optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12853) distributed write pattern to replace ad hoc 'salting'
[ https://issues.apache.org/jira/browse/HBASE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12853. --- Resolution: Invalid Fix Version/s: (was: 2.0.0) The discussion has been off topic. We can open a new topic if we have something concrete. distributed write pattern to replace ad hoc 'salting' - Key: HBASE-12853 URL: https://issues.apache.org/jira/browse/HBASE-12853 Project: HBase Issue Type: New Feature Reporter: Michael Segel In reviewing HBASE-11682 (Description of Hot Spotting), one of the issues is that while 'salting' alleviated regional hot spotting, it increased the complexity required to utilize the data. Through the use of coprocessors, it should be possible to offer a method which distributes the data on write across the cluster and then manages reading the data returning a sort ordered result set, abstracting the underlying process. On table creation, a flag is set to indicate that this is a parallel table. On insert into the table, if the flag is set to true then a prefix is added to the key. e.g. region server#- or region server #|| where the region server # is an integer between 1 and the number of region servers defined. On read (scan) for each region server defined, a separate scan is created adding the prefix. Since each scan will be in sort order, it's possible to strip the prefix and return the lowest value key from each of the subsets. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
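The write and read sides of the proposed pattern can be sketched together. This is a hypothetical illustration (the "|" separator, bucket count, and all names are assumptions, not anything specified in the issue): salt each key with a bucket prefix on write, then on read merge the per-bucket sorted scans after stripping the prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of the proposal: prefix ("salt") keys with a bucket number on write
// so writes spread across region servers; on read, run one scan per bucket,
// strip the prefix, and merge the sorted per-bucket streams into one sorted
// result. A real implementation would stream this as a k-way merge, always
// taking the lowest key across the open scanners, rather than draining
// everything into a heap first.
public class DistributedKeys {
    static final int BUCKETS = 3; // e.g. the number of region servers

    // Write side: derive a stable bucket from the key and prepend it.
    static String salt(String key) {
        int bucket = Math.abs(key.hashCode() % BUCKETS);
        return bucket + "|" + key;
    }

    // Read side: merge sorted per-bucket scans back into one sorted list.
    static List<String> mergedScan(List<List<String>> perBucketSorted) {
        PriorityQueue<String> heap = new PriorityQueue<>();
        for (List<String> scan : perBucketSorted) {
            for (String salted : scan) {
                heap.add(salted.substring(salted.indexOf('|') + 1)); // strip prefix
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(heap.poll());
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical per-bucket scans, each already in sort order.
        List<List<String>> scans = new ArrayList<>();
        scans.add(List.of("0|banana", "0|date"));
        scans.add(List.of("1|apple", "1|cherry"));
        System.out.println(mergedScan(scans)); // [apple, banana, cherry, date]
    }
}
```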
Re: Support for upgrades from 0.94
Following our guidelines as stated here: https://hbase.apache.org/book.html#hbase.versioning we can remove the upgrade path from 2.0. Considering that 0.98 is a major version step over 0.94 we could in theory remove it from 1.x, but we only established semantic versioning with 1.0.0. So, yes, it seems we can (and I would argue, should) remove the upgrade code in question from 2.x. -- Lars From: Lars Francke lars.fran...@gmail.com To: dev@hbase.apache.org Sent: Friday, July 31, 2015 3:59 AM Subject: Support for upgrades from 0.94 Hi, this is referring to these two issues: https://issues.apache.org/jira/browse/HBASE-8778 https://issues.apache.org/jira/browse/HBASE-11611 I'm still looking for deprecated stuff that can be cleaned up. We have a file in the code (FSTableDescriptorMigrationToSubdir) that is used to migrate pre-0.96 table format to the new version. It has an annotation that marks it as to-delete for the next major version after 0.96. HBASE-11611 removed the class UpgradeTo96 without removing the hbase shell command upgrade which referred to that class. So that means that as of 2.0.0 upgrades from 0.94 are not supported anymore (If I understand that correctly). Are we okay with that? If so I'd like to create a JIRA removing the remnants and documenting this. Otherwise we need to partially revert HBASE-11611 and probably adapt the Upgrade class. Cheers, Lars
Re: [DISCUSSION] Switching from RTC to CTR
I don't really agree that RTC means we do not trust you. It just means that changes should be peer reviewed, with which I heartily agree. CTR can work with a small group (for example in a branch). For a big project I think it will lead to lower quality (and we already have issues with constantly failing tests - partly due to the infrastructure, but partly because they are flaky). That said, I like the idea to leave this at the discretion of the committer. In that case we do not need the specific week time-line. For a small fix I think a committer can just commit without review at all (null checks, etc). For larger changes or features the committer should naturally request some review. Not a fan of codifying too many details, it's better to trust the judgment of the committer and state some general guidelines. So what am I saying? I think we can state that review is not _required_. Period. Then we could state that committers should use good judgment as to when to request feedback. -- Lars From: Andrew Purtell andrew.purt...@gmail.com To: dev@hbase.apache.org dev@hbase.apache.org Cc: priv...@hbase.apache.org priv...@hbase.apache.org Sent: Thursday, July 30, 2015 6:15 PM Subject: Re: [DISCUSSION] Switching from RTC to CTR I appreciate very much the earlier feedback about switching from RTC to CTR. It helped me think about the essential thing I was after. I'm thinking of making a formal proposal to adopt this, with a VOTE: After posting a patch to JIRA, after one week if there is no review or veto, a committer can commit their own work. It's important we discuss this and have a vote because the default Foundation decision making process ( http://www.apache.org/foundation/voting.html) does not allow what would amount to lazy consensus when RTC is in effect. Should my proposal pass, we would arrive at a hybrid policy that is identical to the default Foundation one *until* one week has elapsed after a code change is proposed. 
Then, for a committer, for that one code change, they would be able to operate using CTR. I think the HBase PMC is empowered to set this kind of policy for our own project at our option. If you feel I am mistaken about that, please speak up. Should the vote pass I will run it by board@ for review to be sure. We'd document this in the book: https://hbase.apache.org/book.html#_decisions Also, looking at https://hbase.apache.org/book.html#_decisions, I don't think the patch +1 policy should remain because the trial OWNERS concept hasn't worked out, IMHO. The OWNERS concept requires a set of constantly present and engaged owners, a resource demand that's hard to square with the volunteer nature of our community. The amount of time any committer or PMC member has on this project is highly variable day to day and week to week. I'm also thinking of calling a VOTE to significantly revise or strike this section. Both of these things have a common root: Volunteer time is a very precious commodity. Our community's supply of volunteer time fluctuates. I would like to see committers be able to make progress with their own work even in periods when volunteer time is in very short supply, or when they are working on niche concerns that simply do not draw sufficient interest from other committers. (This is different from work that people think isn't appropriate - in that case ignoring it so it will go away would no longer be an option, a veto would be required if you want to stop something.) On Wed, Jul 29, 2015 at 3:56 PM, Andrew Purtell andrew.purt...@gmail.com wrote: Had this thought after getting back on the road. As an alternative to any sweeping change we could do one incremental but very significant thing that acknowledges our status as trusted and busy peers: After posting a patch to JIRA, after one week if there is no review or veto, a committer can commit their own work. 
On Jul 29, 2015, at 2:20 PM, Mikhail Antonov olorinb...@gmail.com wrote: Just curious, I assume if this change is made, would it only apply to master branch? -Mikhail On Wed, Jul 29, 2015 at 2:09 PM, Andrew Purtell andrew.purt...@gmail.com wrote: @dev is now CCed I didn't want to over structure the discussion with too much detail up front. I do think CTR without supporting process or boundaries can be more problematic than helpful. That could take the form of customarily waiting for reviews before commit even under a CTR regime. I think this discussion has been great so far. I would also add that CTR moves 'R' from a gating requirement per commit (which can be hard to overcome for niche areas or when volunteer resources are less available) more to RMs. will be back later with more. On Jul 29, 2015, at 1:36 PM, Sean Busbey sean.bus...@gmail.com wrote: I'd also favor having this discussion on dev@. On Wed, Jul 29, 2015 at 2:29 PM, Gary Helmling ghelml...@gmail.com wrote: This is already a really interesting and meaningful discussion, and is
Re: DISCUSSION: lets do a developer workshop on near-term work
Personally, I think that is a reasonable way to test the internal friction of the server. I've been doing a lot of tests like that and found a lot of inefficiencies in HBase that way. For cases where we return all Cells back to a (remote) client improving the server by 10 or 20% would mostly go unnoticed. Analytics (aggregates via Phoenix or direct coprocessors) will be more important going forward, so improving that part is important. I completely agree that end-to-end (by which I mean data shipped to the client) testing is important, it's just that I'd expect us to work on different areas (put Protobufs on a diet, have a streaming protocol, etc). -- Lars From: Andrew Purtell andrew.purt...@gmail.com To: dev@hbase.apache.org dev@hbase.apache.org Sent: Saturday, July 18, 2015 11:24 AM Subject: Re: DISCUSSION: lets do a developer workshop on near-term work That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server. On Jul 18, 2015, at 11:02 AM, Anoop John anoop.hb...@gmail.com wrote: No Andy. 11425 having doc attached to it. At the end of it, we have added perf numbers in a cluster testing. This was done using PE get and scan tests with filtering all cells at server (to not consider n/w bandwidth constraints) -Anoop- On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell andrew.purt...@gmail.com wrote: We have some microbenchmarks, not evidence of differences seen from a client application. I'm not saying that microbenchmarks are not totally necessary and a great start - they are - but that they don't measure an end goal. Furthermore unless I've missed one somewhere we don't have a JIRA or design doc that states a clear end goal metric like the strawman I threw together in my previous mail. A measurable system level goal and some data from full cluster testing would go a lot further toward letting all of us evaluate the potential and payoff of the work. 
In the meantime we should probably be assembling these changes on a branch instead of in trunk, for as long as the goal is not clearly defined and the payoff and potential for perf regressions is untested and unknown. On Jul 18, 2015, at 8:05 AM, Anoop John anoop.hb...@gmail.com wrote: Thanks Andy and Lars. The parent jira has doc attached which contains some perf gain numbers.. We will be doing more tests in next 2 weeks (before end of this month) and will publish them. Yes it will be great if it is more IST friendly time :-) -Anoop- On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell andrew.purt...@gmail.com wrote: I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy). I can vouch for this (smile) I also can offer support for off heaping there. At the same time we do have a gap where we can't point to a timeline of improvements (yet, anyway) with benchmarks showing gains where your goals need them. For example, stock HBase in one JVM can address max N GB for response time distribution D; dev version of HBase in off heap branch can address max N' GB for distribution D', where N' > N and D > D' (distribution D' statistically shows better/lower response times). On Jul 17, 2015, at 6:56 AM, lars hofhansl la...@apache.org wrote: I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java). Never said I don't like it, it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons. I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he'll yell at me where needed :) Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. 
I know that some of our folks on IST would love to participate in the backup discussion). Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks. -- Lars From: ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com To: dev@hbase.apache.org dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Wednesday, July 15, 2015 10:10 AM Subject: Re: DISCUSSION: lets do a developer workshop on near-term work Hi What time will it be on August 26th? @Lars Ya. I know that you are not generally in favour of this offheaping stuff. Maybe if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work. Regards Ram On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl la...@apache.org wrote: Works for me. I'll be back in the Bay Area the week of August 9th. We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is grouped by tenant.
[jira] [Resolved] (HBASE-12945) Port: New master API to track major compaction completion to 0.98
[ https://issues.apache.org/jira/browse/HBASE-12945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12945. --- Resolution: Won't Fix Fix Version/s: (was: 0.98.14) Looks like there's no interest. Closing. Port: New master API to track major compaction completion to 0.98 - Key: HBASE-12945 URL: https://issues.apache.org/jira/browse/HBASE-12945 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-5210) HFiles are missing from an incremental load
[ https://issues.apache.org/jira/browse/HBASE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-5210. -- Resolution: Cannot Reproduce Closing for now HFiles are missing from an incremental load --- Key: HBASE-5210 URL: https://issues.apache.org/jira/browse/HBASE-5210 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.2 Environment: HBase 0.90.2 with Hadoop-0.20.2 (with durable sync). RHEL 2.6.18-164.15.1.el5. 4 node cluster (1 master, 3 slaves) Reporter: Lawrence Simpson Attachments: HBASE-5210-crazy-new-getRandomFilename.patch
We run an overnight map/reduce job that loads data from an external source and adds that data to an existing HBase table. The input files have been loaded into hdfs. The map/reduce job uses the HFileOutputFormat (and the TotalOrderPartitioner) to create HFiles which are subsequently added to the HBase table. On at least two separate occasions (that we know of), a range of output would be missing for a given day. The range of keys for the missing values corresponded to those of a particular region. This implied that a complete HFile somehow went missing from the job. Further investigation revealed the following: two different reducers (running in separate JVMs and thus separate class loaders) in the same server can end up using the same file names for their HFiles. The scenario is as follows:
1. Both reducers start near the same time.
2. The first reducer reaches the point where it wants to write its first file.
3. It uses the StoreFile class, which contains a static Random object which is initialized by default using a timestamp.
4. The file name is generated using the random number generator.
5. The file name is checked against other existing files.
6. The file is written into temporary files in a directory named after the reducer attempt.
7. The second reduce task reaches the same point, but its StoreFile class (which is now in the file system's cache) gets loaded within the time resolution of the OS and thus initializes its Random object with the same seed as the first task.
8. The second task also checks for an existing file with the name generated by the random number generator and finds no conflict, because each task is writing files in its own temporary folder.
9. The first task finishes and gets its temporary files committed to the real folder specified for output of the HFiles.
10. The second task then reaches its own conclusion and commits its files (moveTaskOutputs). The released Hadoop code just overwrites any files with the same name. No warning messages or anything. The first task's HFiles just go missing.
Note: The reducers here are NOT different attempts at the same reduce task. They are different reduce tasks, so data is really lost.
I am currently testing a fix in which I have added code to the Hadoop FileOutputCommitter.moveTaskOutputs method to check for a conflict with an existing file in the final output folder and to rename the HFile if needed. This may not be appropriate for all uses of FileOutputFormat. So I have put this into a new class which is then used by a subclass of HFileOutputFormat. Subclassing of FileOutputCommitter itself was a bit more of a problem due to private declarations. I don't know if my approach is the best fix for the problem. If someone more knowledgeable than myself deems that it is, I will be happy to share what I have done, and by that time I may have some information on the results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
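The failure mode described above (two JVMs seeding a timestamp-based `Random` within the OS clock resolution) is easy to reproduce in plain Java. The sketch below is illustrative only: `randomFileName` is a hypothetical stand-in for the StoreFile name generation, not actual HBase code.

```java
import java.util.Random;

public class SeedCollisionDemo {
    // Hypothetical stand-in for a random HFile name derived from a
    // static Random that was seeded with the current time at class load.
    static String randomFileName(Random rng) {
        return Long.toHexString(rng.nextLong());
    }

    public static void main(String[] args) {
        // Two reducer JVMs whose StoreFile classes load within the same
        // clock tick effectively both execute new Random(sameTimestamp).
        long seed = System.currentTimeMillis();
        Random reducer1 = new Random(seed);
        Random reducer2 = new Random(seed);

        // Identical seeds yield identical pseudo-random sequences,
        // hence identical "unique" file names in both tasks.
        String name1 = randomFileName(reducer1);
        String name2 = randomFileName(reducer2);
        System.out.println(name1.equals(name2)); // prints "true"
    }
}
```

Because each task checks for collisions only in its own temporary folder, the duplicate names go unnoticed until commit time, when the second task's files silently overwrite the first's.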
Re: 0.98 patch acceptance criteria discussion
Thanks Andy. I think the gist of the discussion boils down to this: We generally have two goals: (1) follow semver from 1.0.0 onward and (2) avoid losing features/improvements when upgrading from an older version to a newer one. Turns out these two are conflicting unless we follow certain additional policies. The issue at hand was a performance improvement that we added to 0.98, 1.3.0, and 2.0.0, but not 1.0.x, 1.1.x, and 1.2.x (x >= 1 in all cases). So when somebody would upgrade from 0.98 to (say) 1.1.7 (if/when that's out) that improvement would silently be lost. I think the extra statement we have to make is that only the latest minor version of the next major branch is guaranteed to have all the improvements of the previous major branch. Or phrased in other words: Improvements that are not bug fixes will only go into the x.y.0 minor version, but not (by default anyway, the RM should use good judgment) into any existing minor version (and thus not in a patch version > 0). If that's OK with everybody we can just state that and move on (and I'll shut up :) ). -- Lars From: Andrew Purtell apurt...@apache.org To: dev@hbase.apache.org dev@hbase.apache.org Sent: Thursday, July 16, 2015 8:58 AM Subject: 0.98 patch acceptance criteria discussion Hi devs, I'd like to call your attention to an interesting and important discussion taking place on the tail of HBASE-12596. It starts from here: https://issues.apache.org/jira/browse/HBASE-12596?focusedCommentId=14628295&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14628295 -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: DISCUSSION: lets do a developer workshop on near-term work
I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java). Never said I don't like it, it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons. I can represent your side Ram (and Anoop). I've been known to always argue both sides of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he'll yell at me where needed :) Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. I know that some of our folks on IST would love to participate in the backup discussion). Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks. -- Lars From: ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com To: dev@hbase.apache.org dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Wednesday, July 15, 2015 10:10 AM Subject: Re: DISCUSSION: lets do a developer workshop on near-term work Hi What time will it be on August 26th? @Lars Ya. I know that you are not generally in favour of this offheaping stuff. Maybe if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work. Regards Ram On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl la...@apache.org wrote: Works for me. I'll be back in the Bay Area the week of August 9th. We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is grouped by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check). Also interested in the Time and offheap parts (although you folks usually do not like what I think about the offheap efforts :) ). Would like to add the following topics: - Timestamp Resolution. 
Or making space for more bits in the timestamps (happy to cover that, unless it's part of the Time topic) - Replication. We found that replication cannot keep up with high write loads, due to the fact that replication is strictly single threaded per regionserver (even though we have multiple region servers on the sink side) - Spark integration (Ted Malaska?) OK... Out now to make a bullshit hat. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Tuesday, July 14, 2015 7:11 PM Subject: Re: DISCUSSION: lets do a developer workshop on near-term work I'm planning to be in the Bay area the week of the 24th of August. -- Sean On Jul 14, 2015 7:53 PM, Andrew Purtell apurt...@apache.org wrote: I can be up in your area in August. On Tue, Jul 14, 2015 at 5:31 PM, Stack st...@duboce.net wrote: On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar enis@gmail.com wrote: Sounds good. It has been a while since we did the talk-athon. I'll be off starting the 25th of July, so I prefer something next week if possible. You ever coming back? If so, when? I'm back on the 10th of August (Mikhail on the 20th). St.Ack Enis On Tue, Jul 14, 2015 at 3:18 PM, Stack st...@duboce.net wrote: Matteo and I were thinking it was time devs got together for a pow-wow. There is a bunch of stuff in flight at the moment (see below list) and it would be good to meet and whiteboard, surface good ideas that have gone dormant in JIRA, or revisit designs/proposals out in JIRA-attached google docs that need socializing. You can only come if you are wearing your bullshit hat. 
Topics we'd go over could include:
+ Our filesystem layout will not work if 1M regions (Matteo/Stack)
+ Current state of the offheaping of the read path and alternate KeyValue implementation (Anoop/Ram)
+ Append rejigger (Elliott)
+ A Pv2-based Assign (Matteo/Steven)
+ Splitting meta/1M regions
+ The revived Backup (Vladimir)
+ Time (Enis)
+ The overloaded SequenceId (Stack)
+ Upstreaming IT testing (Dima/Sean)
+ hbase-2.0.0
I put names by folks I know could talk to the topic. If you want to take over a topic or put your name by one, just say. Suggest that discussion lead off with a 5-10 minute summary of the current state of thought/design/implementation. What do others think? What date would suit folks? Anyone want to host? Thanks, Matteo and St.Ack -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: DISCUSSION: lets do a developer workshop on near-term work
Works for me. I'll be back in the Bay Area the week of August 9th. We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is grouped by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check). Also interested in the Time and offheap parts (although you folks usually do not like what I think about the offheap efforts :) ). Would like to add the following topics: - Timestamp Resolution. Or making space for more bits in the timestamps (happy to cover that, unless it's part of the Time topic) - Replication. We found that replication cannot keep up with high write loads, due to the fact that replication is strictly single threaded per regionserver (even though we have multiple region servers on the sink side) - Spark integration (Ted Malaska?) OK... Out now to make a bullshit hat. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Tuesday, July 14, 2015 7:11 PM Subject: Re: DISCUSSION: lets do a developer workshop on near-term work I'm planning to be in the Bay area the week of the 24th of August. -- Sean On Jul 14, 2015 7:53 PM, Andrew Purtell apurt...@apache.org wrote: I can be up in your area in August. On Tue, Jul 14, 2015 at 5:31 PM, Stack st...@duboce.net wrote: On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar enis@gmail.com wrote: Sounds good. It has been a while since we did the talk-athon. I'll be off starting the 25th of July, so I prefer something next week if possible. You ever coming back? If so, when? I'm back on the 10th of August (Mikhail on the 20th). St.Ack Enis On Tue, Jul 14, 2015 at 3:18 PM, Stack st...@duboce.net wrote: Matteo and I were thinking it was time devs got together for a pow-wow. 
There is a bunch of stuff in flight at the moment (see below list) and it would be good to meet and whiteboard, surface good ideas that have gone dormant in JIRA, or revisit designs/proposals out in JIRA-attached google docs that need socializing. You can only come if you are wearing your bullshit hat. Topics we'd go over could include:
+ Our filesystem layout will not work if 1M regions (Matteo/Stack)
+ Current state of the offheaping of the read path and alternate KeyValue implementation (Anoop/Ram)
+ Append rejigger (Elliott)
+ A Pv2-based Assign (Matteo/Steven)
+ Splitting meta/1M regions
+ The revived Backup (Vladimir)
+ Time (Enis)
+ The overloaded SequenceId (Stack)
+ Upstreaming IT testing (Dima/Sean)
+ hbase-2.0.0
I put names by folks I know could talk to the topic. If you want to take over a topic or put your name by one, just say. Suggest that discussion lead off with a 5-10 minute summary of the current state of thought/design/implementation. What do others think? What date would suit folks? Anyone want to host? Thanks, Matteo and St.Ack -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Resolved] (HBASE-11482) Optimize HBase TableInput/OutputFormats for exposing tables and snapshots as Spark RDDs
[ https://issues.apache.org/jira/browse/HBASE-11482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-11482. --- Resolution: Duplicate Fix Version/s: (was: 2.0.0) Closing as dupe of HBASE-13992. Optimize HBase TableInput/OutputFormats for exposing tables and snapshots as Spark RDDs --- Key: HBASE-11482 URL: https://issues.apache.org/jira/browse/HBASE-11482 Project: HBase Issue Type: New Feature Components: mapreduce, spark Reporter: Andrew Purtell Assignee: Ted Malaska A core concept of Apache Spark is the resilient distributed dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel. One can create RDDs referencing a dataset in any external storage system offering a Hadoop InputFormat, like HBase's TableInputFormat and TableSnapshotInputFormat. Ensure the integration is reasonable and provides good performance. Add the ability to save RDDs back to HBase with a {{saveAsHBaseTable}} action, implicitly creating necessary schema on demand. Add support for {{filter}} transformations that push predicates down to the server as HBase filters. Consider supporting conversions between Scala and Java types and HBase data using the HBase types library. Consider an option to lazily and automatically produce a snapshot only when needed, in a coordinated way. (Concurrently executing workers may want to materialize a table snapshot RDD at the same time.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13329) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray
[ https://issues.apache.org/jira/browse/HBASE-13329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-13329. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to 2.0, 1.3, 1.2, 1.1, and 1.0. (0.98 does not have this issue) ArrayIndexOutOfBoundsException in CellComparator#getMinimumMidpointArray Key: HBASE-13329 URL: https://issues.apache.org/jira/browse/HBASE-13329 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.0.1 Environment: linux-debian-jessie ec2 - t2.micro instances Reporter: Ruben Aguiar Assignee: Lars Hofhansl Priority: Critical Attachments: 13329-asserts.patch, 13329-v1.patch, 13329.txt, HBASE-13329.test.00.branch-1.1.patch While trying to benchmark my opentsdb cluster, I created a script that sends to hbase always the same value (in this case 1). After a few minutes, the whole region server crashes and the region itself becomes impossible to open again (cannot assign or unassign). After some investigation, what I saw in the logs is that when a Memstore flush is called on a large region (128mb) the process errors out, killing the regionserver. On restart, replaying the edits generates the same error, making the region unavailable. Tried to manually unassign, assign or close_region. That didn't work because the code that reads/replays it crashes. From my investigation this seems to be an overflow issue. The logs show that the function getMinimumMidpointArray tried to access index -32743 of an array, extremely close to the minimum short value in Java. Upon investigation of the source code, it seems a short index is used, being incremented as long as the two vectors are the same, probably making it overflow on large vectors with equal data. Changing it to int should solve the problem. Here follow the hadoop logs of when the regionserver went down. Any help is appreciated. 
Any other information you need please do tell me:
2015-03-24 18:00:56,187 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220018516 with entries=143, filesize=134.70 MB; new WAL /hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427220056140
2015-03-24 18:00:56,188 INFO [regionserver//10.2.0.73:16020.logRoller] wal.FSHLog: Archiving hdfs://10.2.0.74:8020/hbase/WALs/10.2.0.73,16020,1427216382590/10.2.0.73%2C16020%2C1427216382590.default.1427219987709 to hdfs://10.2.0.74:8020/hbase/oldWALs/10.2.0.73%2C16020%2C1427216382590.default.1427219987709
2015-03-24 18:04:35,722 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2., current region memstore size 128.04 MB
2015-03-24 18:04:36,154 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server 10.2.0.73,16020,1427216382590: Replay of WAL required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1427133969325.52bc1994da0fea97563a4a656a58bec2.
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1999)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1770)
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1702)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -32743
    at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:478)
    at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932)
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121
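The overflow the reporter suspects is easy to demonstrate in isolation. The sketch below only mirrors the described pattern (a `short` loop index over two byte arrays with equal data); names are illustrative, and this is not the actual CellComparator code:

```java
public class ShortOverflowDemo {
    // Returns the index of the first differing byte. Bug: the index is
    // declared short, so on arrays longer than Short.MAX_VALUE (32767)
    // with equal contents, i++ wraps around to -32768.
    static int firstDiffIndex(byte[] left, byte[] right) {
        short i = 0; // the fix is simply: int i = 0;
        while (i < left.length && i < right.length && left[i] == right[i]) {
            i++; // implicit narrowing: 32767 + 1 becomes -32768
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] a = new byte[40000];
        byte[] b = new byte[40000]; // identical contents (all zeros)
        try {
            firstDiffIndex(a, b);
        } catch (ArrayIndexOutOfBoundsException e) {
            // left[-32768] blows up, just like the -32743 seen in the logs
            System.out.println("wrapped to a negative index");
        }
    }
}
```

With short inputs the helper behaves fine, which is why the bug only surfaced once identical data produced HFile blocks with very long common prefixes; widening the index to `int` removes the wrap-around.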
Re: Backup/Restore (HBASE-7192) design doc
Lemme have a look. Very interested in this. Did you see the snapshot bug I just recently fixed, where each snapshot would leave a Zookeeper watcher around for each region server? Pretty bad, and nobody noticed. -- Lars From: Vladimir Rodionov vladrodio...@gmail.com To: hbase-...@hadoop.apache.org Sent: Thursday, July 2, 2015 1:05 PM Subject: Backup/Restore (HBASE-7192) design doc Hi, folks Kindly soliciting feedback on the latest design doc: https://issues.apache.org/jira/browse/HBASE-7912 -Vlad
[jira] [Resolved] (HBASE-12765) SplitTransaction creates too many threads (potentially)
[ https://issues.apache.org/jira/browse/HBASE-12765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl resolved HBASE-12765. --- Resolution: Invalid SplitTransaction creates too many threads (potentially) --- Key: HBASE-12765 URL: https://issues.apache.org/jira/browse/HBASE-12765 Project: HBase Issue Type: Brainstorming Reporter: Lars Hofhansl Attachments: 12765.txt In splitStoreFiles(...) we create a new thread pool with as many threads as there are files to split. We should be able to do better. During times of very heavy write loads there might be a lot of files to split and multiple splits might be going on at the same time on the same region server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
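The pattern criticized in HBASE-12765 (a fresh pool with one thread per file, created inside every split) is commonly replaced by a bounded, shared executor. This is a hedged sketch of that alternative, not HBase's actual SplitTransaction code; the names and the pool size of 4 are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitPoolDemo {
    // One shared, bounded pool for the whole server, instead of
    // Executors.newFixedThreadPool(files.size()) per split. The cap (4)
    // is arbitrary here; a real server would size it from config.
    static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(4);

    // Hypothetical splitStoreFiles: submit one task per file, but let the
    // bounded pool decide how many run concurrently.
    static List<String> splitStoreFiles(List<String> files) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (String f : files) {
            futures.add(SHARED_POOL.submit(() -> "split:" + f)); // stand-in work
        }
        List<String> results = new ArrayList<>();
        for (Future<String> fut : futures) {
            results.add(fut.get()); // preserves submission order
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // 16 files to split, but never more than 4 splitter threads alive.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 16; i++) files.add("hfile-" + i);
        System.out.println(splitStoreFiles(files).size()); // prints 16
        SHARED_POOL.shutdown();
    }
}
```

Under heavy write load with several concurrent splits, the per-call-pool version can spawn an unbounded number of threads, which is exactly the concern the issue raises; a shared pool trades a little latency for a hard cap.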