[jira] Commented: (HBASE-3382) Make HBase client work better under concurrent clients
[ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984571#action_12984571 ] ryan rawson commented on HBASE-3382:

I did some work to minimally rewrite the HBase client to use nio instead of oio: no actual architectural changes, just changes to see if we can't improve the performance of straight-line reading of bytes off the network socket. I used the aforementioned tracing system to get the time for receiving the response, then used the data log to come up with a few different timings. Here is the semi-raw data:

NIO:

hadoop@sv4borg235:/homes/hadoop/borg221/hbase$ grep receiveResponse prof_1.txt | perl -n -e ' /split: (\d+?) .*size: (\d+)/ ; if ($2 > 100){ print ((int((10/$1)*$2))/(1024*1024)); print " $2 \n"; }' | sort -n | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
124.836689844131

hadoop@sv4borg235:/homes/hadoop/borg221/hbase$ grep receiveResponse prof_10.txt | perl -n -e ' /split: (\d+?) .*size: (\d+)/ ; if ($2 > 100){ print ((int((10/$1)*$2))/(1024*1024)); print " $2 \n"; }' | sort -n | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
112.391825942993

OIO:

hadoop@sv4borg235:/homes/hadoop/borg221/hbase$ grep receiveResponse new_1thr.txt | perl -n -e ' /split: (\d+?) .*size: (\d+)/ ; if ($2 > 100){ print ((int((10/$1)*$2))/(1024*1024)); print " $2 \n"; }' | sort -n | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
135.158706989288

hadoop@sv4borg235:/homes/hadoop/borg221/hbase$ grep receiveResponse new_10thr.txt | perl -n -e ' /split: (\d+?) .*size: (\d+)/ ; if ($2 > 100){ print ((int((10/$1)*$2))/(1024*1024)); print " $2 \n"; }' | sort -n | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
120.16641916275

As you can see, the OIO client actually performed a bit better than the NIO client under the same YCSB config (which is listed above), cached workload, etc. This is probably due to needing to cycle through a select() call to wait for more data, rather than just letting the OS handle it.

> Make HBase client work better under concurrent clients
> ------------------------------------------------------
>
> Key: HBASE-3382
> URL: https://issues.apache.org/jira/browse/HBASE-3382
> Project: HBase
> Issue Type: Bug
> Components: performance
> Reporter: ryan rawson
> Attachments: HBASE-3382.txt
>
> The HBase client uses 1 socket per regionserver for communication. This is
> good for socket control but potentially bad for latency. How bad? I did a
> simple YCSB test that had this config:
> readproportion=0
> updateproportion=0
> scanproportion=1
> insertproportion=0
> fieldlength=10
> fieldcount=100
> requestdistribution=zipfian
> scanlength=300
> scanlengthdistribution=zipfian
> I ran this with 1 and 10 threads. The summary is as so:
> 1 thread:
> [SCAN] Operations 1000
> [SCAN] AverageLatency(ms) 35.871
> 10 threads:
> [SCAN] Operations 1000
> [SCAN] AverageLatency(ms) 228.576
> We are taking a 6.5x latency hit in our client. But why?
> The first step was to move the deserialization out of the Connection thread; this
> seemed like it could be a big win, since an analogous change on the server side got
> a 20% performance improvement (already committed as HBASE-2941). I did this
> and got about a 20% improvement again, with that 228ms number going to about
> 190 ms.
> So I then wrote a high performance nanosecond resolution tracing utility.
> Clients can flag an API call, and we get tracing and numbers through the
> client pipeline. What I found is that a lot of time is being spent in
> receiving the response from the network. The code block is like so:
> NanoProfiler.split(id, "receiveResponse");
> if (LOG.isDebugEnabled())
>   LOG.debug(getName() + " got value #" + id);
> Call call = calls.get(id);
> size -= 4; // subtract 4 bytes for the id because we already read it.
> ByteBuffer buf = ByteBuffer.allocate(size);
> IOUtils.readFully(in, buf.array(), buf.arrayOffset(), size);
> buf.limit(size);
> buf.rewind();
> NanoProfiler.split(id, "setResponse", "Data size: " + size);
> I came up with some numbers:
> 11726 (receiveResponse) split: 64991689 overall: 133562895 Data size: 4288937
> 12163 (receiveResponse) split: 32743954 overall: 103787420 Data size: 1606273
> 12561 (receiveResponse) split: 3517940 overall: 83346740 Data size: 4
> 12136 (receiveResponse) split: 64448701 overall: 203872573 Data size: 3570569
> The first number is the internal counter for keeping requests unique from
> HTable on down. The times are in ns; the data size is in bytes.
> Doing some simple calculations, we see for the first line we were reading at
> about 31 MB/sec. The second one is even worse. Other calls are like:
> 26 (receiveResponse) split: 7985400 overall: 21546226 Data size: 850429
> which is 107 MB/sec, which is pretty close to the maximum of gige. In my
> setup, the ycsb client ran on the master node and HAD to use the network to
> talk to regionservers.
> Even at full line rate, we could still see unacceptable holdups of unrelated
> calls that just happen to need to talk to the same regionserver.
> This issue is about these findings, what to do, how to improve.
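The MB/sec figures in the description can be reproduced directly from the trace lines. A minimal standalone sketch (illustrative Java, not part of the client; which field, split or overall, is the right denominator depends on what the tracer measured for that line):

```java
// Illustrative only: derive throughput from a NanoProfiler trace line
// ("... split: <ns> overall: <ns> Data size: <bytes>").
public class TraceThroughput {
    /** Bytes over nanoseconds, expressed in MiB per second. */
    public static double mbPerSec(long sizeBytes, long ns) {
        return (sizeBytes / (1024.0 * 1024.0)) / (ns / 1e9);
    }

    public static void main(String[] args) {
        // "11726 (receiveResponse) split: 64991689 overall: 133562895 Data size: 4288937"
        // Using the overall time gives roughly the quoted ~31 MB/sec.
        System.out.printf("%.1f MiB/sec%n", mbPerSec(4288937L, 133562895L));
        // "26 (receiveResponse) split: 7985400 overall: 21546226 Data size: 850429"
        // Using the split time gives ~102 MiB/sec (about the quoted 107 MB/sec
        // if megabytes are counted as 10^6 bytes).
        System.out.printf("%.1f MiB/sec%n", mbPerSec(850429L, 7985400L));
    }
}
```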
[jira] Updated: (HBASE-3382) Make HBase client work better under concurrent clients
[ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3382: --- Attachment: HBASE-3382-nio.txt, a non-async but nio port of the hbase client.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HBASE-3382) Make HBase client work better under concurrent clients
[ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson reassigned HBASE-3382: -- Assignee: ryan rawson
[jira] Commented: (HBASE-3453) How about RowPaginationFilter
[ https://issues.apache.org/jira/browse/HBASE-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984575#action_12984575 ] ryan rawson commented on HBASE-3453:

hey, a common tactic is for the http layer/client to store the last row retrieved, so that when people hit 'next' the code can start the scan from there; you retrieve PAGE_SIZE+1 rows so you know the row id the 'next' page starts at. good luck!

> How about RowPaginationFilter
> -----------------------------
>
> Key: HBASE-3453
> URL: https://issues.apache.org/jira/browse/HBASE-3453
> Project: HBase
> Issue Type: Wish
> Components: client
> Affects Versions: 0.90.1
> Environment: windows 7
> Reporter: ncanis
> Attachments: RowPaginationFilter.java
>
> I know hbase already has PageFilter.
> But sometimes we need to get row data starting from a specified position.
> * only for newbies:
> If you want to write a custom Filter, you must also add the filter class to the
> hbase server classpath.
> {code:title=RowPaginationFilter|borderStyle=solid}
> /**
>  * Constructor that takes a maximum page size.
>  *
>  * get rows from offset to offset+limit ( offset <= row < offset+limit )
>  * @param offset start position
>  * @param limit count from offset position
>  */
> public RowPaginationFilter(final int offset, final int limit) {
>   this.offset = offset;
>   this.limit = limit;
> }
>
> // true to exclude row, false to include row.
> @Override
> public boolean filterRow() {
>   boolean isExclude = this.rowsAccepted < this.offset ||
>       this.rowsAccepted >= this.limit + this.offset;
>   rowsAccepted++;
>   return isExclude;
> }
> {code}
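The tactic in the comment above (remember the last row, fetch PAGE_SIZE+1 rows, use the extra row as the start of the next page) can be sketched independently of the HBase API. A hypothetical illustration, with a sorted map standing in for a scan over sorted row keys:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of client-side pagination: scan from the last-seen row,
// take PAGE_SIZE rows, and remember the (PAGE_SIZE+1)th row key as the start
// of the next page.
public class PageScan {
    public static final int PAGE_SIZE = 3;

    /** Returns up to PAGE_SIZE rows starting at startRow (inclusive).
     *  nextOut[0] is set to the row the next page starts at, or null if
     *  this is the last page. */
    public static List<String> page(SortedMap<String, String> rows,
                                    String startRow, String[] nextOut) {
        List<String> result = new ArrayList<>();
        nextOut[0] = null;
        for (String row : rows.tailMap(startRow).keySet()) {
            if (result.size() == PAGE_SIZE) {
                nextOut[0] = row;  // the +1th row: where 'next' resumes
                break;
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, String> rows = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) rows.put(k, "v");
        String[] next = new String[1];
        System.out.println(page(rows, "a", next) + " next=" + next[0]);     // [a, b, c] next=d
        System.out.println(page(rows, next[0], next) + " next=" + next[0]); // [d, e] next=null
    }
}
```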
[jira] Commented: (HBASE-3382) Make HBase client work better under concurrent clients
[ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984576#action_12984576 ] ryan rawson commented on HBASE-3382:

So it's pretty clear that to improve performance under load, we should be using multiple sockets. Here is a rough block diagram of how the client works:

HTable -- calls --> HConnectionImplementation -- calls --> HBaseRPC.waitForProxy()

In waitForProxy, an HBaseClient object is grabbed and associated with the proxy via the embedded Invoker object. Let's call this 'client' (as does the code):

HCI -- calls --> ProxyObject (anonymous) --> client.call()

Now a few notes:
- The HCI will reuse the same proxy object a few times, if not a LOT of times.
- The proxy object has 1 reference to 1 HBaseClient object.
- The HBaseClient object has 1 socket/connection per regionserver. Multiple threads will interleave their requests & replies (in any order; out-of-order replies are ok) on the 1 socket.

So there are a few different approaches. In HBASE-2939, a patch allows every new call to grab a different connection off a pool, with different pool types. This has the disadvantage of needing 1 thread per extra socket to a RS. Another solution is to change the Connection object & thread to do async I/O on multiple sockets, allowing 1 thread per regionserver but multiple sockets under it all. Yet another solution is to use a nio framework to implement this instead of doing raw nio programming.
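The first approach above (HBASE-2939: each new call grabs a different connection off a pool) can be sketched abstractly. Hypothetical class names throughout; the real patch manages actual sockets and reader threads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin connection pool: each call() picks the next
// connection, so concurrent callers spread across several sockets instead of
// interleaving on one. Stands in for one of the HBASE-2939 pool types.
public class RoundRobinPool<C> {
    private final List<C> connections = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinPool(List<C> conns) { connections.addAll(conns); }

    /** Pick a connection for one RPC; successive calls rotate through the pool. */
    public C get() {
        int i = Math.floorMod(next.getAndIncrement(), connections.size());
        return connections.get(i);
    }

    public static void main(String[] args) {
        RoundRobinPool<String> pool =
            new RoundRobinPool<>(java.util.Arrays.asList("sock-0", "sock-1", "sock-2"));
        for (int i = 0; i < 4; i++) System.out.println(pool.get());
        // sock-0, sock-1, sock-2, sock-0
    }
}
```

The trade-off named in the comment is visible here: each extra pooled socket is an extra connection (and, in the oio client, an extra reader thread) per regionserver.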
[jira] Commented: (HBASE-3374) Our jruby jar has *GPL jars in it; fix
[ https://issues.apache.org/jira/browse/HBASE-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985426#action_12985426 ] ryan rawson commented on HBASE-3374:

hi Charles,

The patch has already been submitted; the deed is done. In our survey of the code, there were about 4 different libraries embedded and included in JRuby, some LGPL, some GPLv3. Some of them look pretty fundamental, such as the dependency on http://code.google.com/p/jvm-language-runtime

As you probably know, the ASF has pretty strict rules regarding licensing, so undoing the downgrade is not possible until a JRuby with ASF-compatible licensing is available.

> Our jruby jar has *GPL jars in it; fix
> --------------------------------------
>
> Key: HBASE-3374
> URL: https://issues.apache.org/jira/browse/HBASE-3374
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.0
> Attachments: jruby.txt
>
> The latest JRuby's complete jar bundles *GPL jars (JNA and JFFI among
> others). It looks like the functionality we depend on, the shell in
> particular, makes use of these dirty jars, so they are hard to strip. They
> came in because we (I!) just updated our JRuby w/o checking in on what the
> updates contained. JRuby has been doing this for a while now (1.1.x added
> the first LGPL). You have to go all the way back to the original HBase
> checkin of JRuby, HBASE-487 (JRuby 1.0.3), to get a JRuby w/o *GPL jars.
> The plan is to try and revert our JRuby all the way down to 1.0.3 before
> shipping 0.90.0. That's what this issue is about.
> We should also look into moving off JRuby in the medium to long term. It's
> kinda awkward sticking on an old version that is no longer supported. I'll
> open an issue for that.
[jira] Commented: (HBASE-3374) Our jruby jar has *GPL jars in it; fix
[ https://issues.apache.org/jira/browse/HBASE-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985428#action_12985428 ] ryan rawson commented on HBASE-3374:

Based on COPYING from https://github.com/jruby/jruby/blob/master/COPYING the following libraries are a problem:

- build_lib/jffi*jar (http://kenai.com/projects/jffi) and build_lib/jgrapht-jdk1.5.jar (http://jgrapht.sourceforge.net) are distributed under the GPL v3.
- build_lib/jna*jar (http://jna.dev.java.net) is distributed under the LGPL-2.1+ license.
- build_lib/invokedynamic.jar and build_lib/jsr292-mock.jar (http://code.google.com/p/jvm-language-runtime) are distributed under the LGPL license.

Also there are a few broken links, such as http://projectkenai.com/projects/jaffl, which are a little concerning.
[jira] Created: (HBASE-3479) javadoc: Filters/Scan/Get should be explicit about the requirement to add columns you are filtering on
javadoc: Filters/Scan/Get should be explicit about the requirement to add columns you are filtering on
------------------------------------------------------------------------------------------------------

Key: HBASE-3479
URL: https://issues.apache.org/jira/browse/HBASE-3479
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: ryan rawson
Fix For: 0.90.1

improve our javadoc!
[jira] Commented: (HBASE-3455) Heap fragmentation in region server
[ https://issues.apache.org/jira/browse/HBASE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986853#action_12986853 ] ryan rawson commented on HBASE-3455:

for ICVs, we might be force flushing due to the maximum hlog fairly 'frequently' anyways.

if we spill to the next slab, will we let the GC free up the previous slabs? maybe once we fill up an allocation we dereference the slab from the slab manager; that way we can let the GC do the hard work for us?

> Heap fragmentation in region server
> -----------------------------------
>
> Key: HBASE-3455
> URL: https://issues.apache.org/jira/browse/HBASE-3455
> Project: HBase
> Issue Type: Brainstorming
> Components: performance, regionserver
> Reporter: Todd Lipcon
> Priority: Critical
> Attachments: collapse-arrays.patch, HBasefragmentation.pdf,
> icv-frag.png, mslab-1.txt, parse-fls-statistics.py, with-kvallocs.png
>
> Stop-the-world GC pauses have long been a problem in HBase. "Concurrent mode
> failures" can usually be tuned around by setting the initiating occupancy
> fraction low, but eventually the heap becomes fragmented and a promotion
> failure occurs.
> This JIRA is to do research/experiments about the heap fragmentation issue
> and possible solutions.
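The "dereference the full slab and let GC do the hard work" idea above can be sketched with plain ByteBuffers. A hypothetical slab allocator, not the mslab-1.txt patch:

```java
import java.nio.ByteBuffer;

// Hypothetical slab allocator sketch: copy each allocation into the current
// slab; once the slab cannot fit the next allocation, drop our reference and
// start a new slab. The retired slab becomes collectible as soon as the KVs
// pointing into it die, avoiding per-KV heap fragmentation.
public class SlabAllocator {
    static final int SLAB_SIZE = 1024;
    private ByteBuffer curSlab = ByteBuffer.allocate(SLAB_SIZE);
    int slabsRetired = 0;  // exposed for illustration

    /** Returns a slice of slab memory holding a copy of data. */
    public ByteBuffer allocate(byte[] data) {
        if (data.length > SLAB_SIZE) {
            return ByteBuffer.wrap(data.clone()); // too big for a slab: heap-allocate
        }
        if (data.length > curSlab.remaining()) {
            curSlab = ByteBuffer.allocate(SLAB_SIZE); // dereference the full slab
            slabsRetired++;
        }
        ByteBuffer slice = curSlab.slice();
        slice.limit(data.length);
        curSlab.position(curSlab.position() + data.length);
        slice.put(data);
        slice.flip();
        return slice;
    }

    public static void main(String[] args) {
        SlabAllocator a = new SlabAllocator();
        a.allocate(new byte[600]);
        a.allocate(new byte[600]); // doesn't fit: previous slab is retired
        System.out.println("slabs retired: " + a.slabsRetired);
    }
}
```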
[jira] Commented: (HBASE-3455) Heap fragmentation in region server
[ https://issues.apache.org/jira/browse/HBASE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986866#action_12986866 ] ryan rawson commented on HBASE-3455: we cannot replace the old value with the new one because there is no way to atomically do that.
[jira] Created: (HBASE-3480) Reduce the size of Result serialization
Reduce the size of Result serialization
---------------------------------------

Key: HBASE-3480
URL: https://issues.apache.org/jira/browse/HBASE-3480
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Reporter: ryan rawson

When faced with a gigabit ethernet network connection, things are pretty slow actually. For example, take a 2 MB reply: at a 120 MB/sec line rate, we are talking about 16ms to transfer that data across a gige line. This is a pretty significant amount of time.

So this JIRA is about reducing the size of the Result[] serialization. By exploiting family, qualifier and rowkey duplication, I created a simple encoding scheme that uses a dictionary instead of literal strings.

In my testing, I am seeing some success with the sizes. The average serialized size drops substantially, but the time to serialize on the regionserver side is way up, by a factor of 10x. This might be due to the simplistic first implementation, however.

Here is the post-change size:

grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
377047.1125

Here is the pre-change size:

grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}'
601078.505882353

That is a reduction to about 63% of the previous average size.

But the times are not so good. Here are some samples of the old implementation, as (size) (time in ns):

3874599 10685836
5582725 11525888

so that is about 11ms to serialize 3-5 MB of data. In the new implementation:

1898788 118504672
1630058 91133003

this is 91-118 ms for serialized sizes of 1.6-1.9 MB.
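The dictionary idea above (write each distinct rowkey/family/qualifier once, then short back-references for repeats) can be sketched in a few lines. A hypothetical encoding for illustration, not the wire format of the attached patch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary encoder: the first time a string (rowkey, family or
// qualifier) appears it is emitted literally and assigned the next index;
// repeats are emitted as that index, which is much smaller on the wire.
public class DictEncoder {
    private final Map<String, Integer> dict = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    public void write(String s) {
        Integer idx = dict.get(s);
        if (idx == null) {
            dict.put(s, dict.size());
            output.add(s);            // literal on first sight
        } else {
            output.add("#" + idx);    // short back-reference on repeats
        }
    }

    public List<String> encoded() { return output; }

    public static void main(String[] args) {
        DictEncoder e = new DictEncoder();
        // Two KVs from the same row and family:
        for (String s : new String[] {"row1", "cf", "qualA", "row1", "cf", "qualB"}) {
            e.write(s);
        }
        System.out.println(e.encoded()); // [row1, cf, qualA, #0, #1, qualB]
    }
}
```

The per-write HashMap lookup also hints at where the 10x serialization-time regression could come from in a simplistic first implementation.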
[jira] Updated: (HBASE-3480) Reduce the size of Result serialization
[ https://issues.apache.org/jira/browse/HBASE-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3480: --- Attachment: HBASE-3480.txt
[jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986899#action_12986899 ] ryan rawson commented on HBASE-3481: Your analysis sounds correct. The correct thing to do here is to provide the MAX_SEQ_ID from the memstore KVs, not from the "current" HLog seqid. good find!

> max seq id in flushed file can be larger than its correct value causing data
> loss during recovery
> ----------------------------------------------------------------------------
>
> Key: HBASE-3481
> URL: https://issues.apache.org/jira/browse/HBASE-3481
> Project: HBase
> Issue Type: Bug
> Reporter: Kannan Muthukkaruppan
> Priority: Critical
>
> [While doing some cluster kill tests, I noticed some missing data after log
> recovery. Upon investigating further, and pretty printing the contents of
> HFiles and recovered logs, this is my analysis of the situation/bug. Please
> confirm the theory and pitch in with suggestions.]
> When memstores are flushed, the max sequence id recorded in the HFile should
> be the max sequence id of all KVs in the memstore. However, we seem to
> simply obtain the current sequence id from the HRegion, and stamp the
> HFile's MAX_SEQ_ID with it.
> From HRegion.java:
> {code}
> sequenceId = (wal == null)? myseqid: wal.startCacheFlush();
> {code}
> where, startCacheFlush() is:
> {code}
> public long startCacheFlush() {
>   this.cacheFlushLock.lock();
>   return obtainSeqNum();
> }
> {code}
> where, obtainSeqNum() is simply:
> {code}
> private long obtainSeqNum() {
>   return this.logSeqNum.incrementAndGet();
> }
> {code}
> So let's say a memstore contains edits with sequence numbers 1..10.
> Meanwhile, say more Puts come along, and are going through this flow (in
> pseudo-code):
> {code}
> 1. HLog.append();
> 1.1 obtainSeqNum()
> 1.2 writeToWAL()
> 2. updateMemStore()
> {code}
> So it is possible that the sequence number has already been incremented to,
> say, 15 (if there are 5 more outstanding puts), but their writeToWAL() is
> still in progress. In this case, none of these edits (11..15) would have
> been written to the memstore yet.
> At this point, if a cache flush of the memstore happens, then we'll record
> its MAX_SEQ_ID as 16 instead of 10 (because that's what obtainSeqNum() would
> return as the next sequence number to use, right?).
> Assume that the edits 11..15 eventually complete. And so the HLogs do
> contain the data for edits 11..15.
> Now, at this point if the region server were to crash, and we run log
> recovery, the splits all go through correctly, and a correct recovered.edits
> file is generated with the edits 11..15.
> Next, when the region is opened, the HRegion notes that one of the store
> files says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file,
> it skips replaying edits 11..15. Or in other words, data loss.
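The race described in the issue can be made concrete with plain counters. A minimal illustration with hypothetical names, not the HRegion/HLog code itself:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal illustration of the HBASE-3481 bug: stamping the flushed file with
// the log's current sequence counter, instead of the max seqid actually in
// the memstore, lets recovery skip edits that were never flushed.
public class SeqIdRace {
    /** Recovery replays a logged edit only if its seqid is beyond the
     *  HFile's MAX_SEQ_ID stamp. */
    public static boolean replayed(long editSeq, long maxSeqIdStamp) {
        return editSeq > maxSeqIdStamp;
    }

    public static void main(String[] args) {
        AtomicLong logSeqNum = new AtomicLong();
        long maxSeqIdInMemstore = 0;

        // Edits 1..10 are written to the WAL and applied to the memstore.
        for (int i = 0; i < 10; i++) maxSeqIdInMemstore = logSeqNum.incrementAndGet();

        // Edits 11..15 have obtained seq numbers, but their writeToWAL() is
        // still in progress, so they have not reached the memstore yet.
        for (int i = 0; i < 5; i++) logSeqNum.incrementAndGet();

        long buggyStamp = logSeqNum.incrementAndGet(); // obtainSeqNum() at flush: 16
        long correctStamp = maxSeqIdInMemstore;        // max seqid of flushed KVs: 10

        for (long edit = 11; edit <= 15; edit++) {
            System.out.println("edit " + edit
                + " replayed(buggy)=" + replayed(edit, buggyStamp)      // false: data loss
                + " replayed(correct)=" + replayed(edit, correctStamp)); // true
        }
    }
}
```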
[jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986907#action_12986907 ] ryan rawson commented on HBASE-3481: If we avoid the skip behaviour (it isn't just an optimization), we will introduce duplicate KVs into the HFiles; for people who depend on the version count to keep a reasonable, important number of versions, that will be trouble. Why not just grab the largest seqid during flush and provide it at the end? The footer is written out at close() time, and we have time between the last KV being appended and close() being called to add a correct SEQ_ID. > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Priority: Critical > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. 
> Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in > progress for these puts. In this case, none of these edits (11..15) would > have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
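The fix suggested above — recording the largest sequence id actually present among the snapshotted memstore KVs, rather than the "next" number handed out by the HLog — can be sketched in a few lines. This is a toy illustration of the race, not the actual HBase code; the class and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: stamp the flushed file with the max seq id actually seen among the
// snapshotted KVs, not with the HLog's "next" sequence number (which may have
// raced ahead for puts that are not in the memstore yet).
class FlushSketch {
    // A toy KV carrying only the sequence id relevant to this bug.
    static final class KV {
        final long seqId;
        KV(long seqId) { this.seqId = seqId; }
    }

    // The correct MAX_SEQ_ID for the store file: max over the snapshot.
    static long maxSeqIdOf(List<KV> snapshot) {
        long max = -1;
        for (KV kv : snapshot) {
            max = Math.max(max, kv.seqId);
        }
        return max;
    }

    public static void main(String[] args) {
        AtomicLong logSeqNum = new AtomicLong(10); // HLog already handed out 1..10
        List<KV> memstore = new ArrayList<>();
        for (long i = 1; i <= 10; i++) memstore.add(new KV(i));

        // Five more puts obtained seq ids 11..15 but have not reached the memstore.
        for (int i = 0; i < 5; i++) logSeqNum.incrementAndGet();

        long buggyMaxSeqId = logSeqNum.incrementAndGet();   // 16: what obtainSeqNum() returns
        long correctMaxSeqId = maxSeqIdOf(memstore);        // 10: max over the snapshot
        System.out.println("buggy=" + buggyMaxSeqId + " correct=" + correctMaxSeqId);
    }
}
```

With the buggy value (16), recovered edits 11..15 would be skipped at replay; with the snapshot maximum (10), they would be replayed correctly.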
[jira] Commented: (HBASE-3436) HBase start scripts should not include maven repos jars if they exist in lib
[ https://issues.apache.org/jira/browse/HBASE-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986908#action_12986908 ] ryan rawson commented on HBASE-3436: we've enjoyed shipping 'build in place' tar.gz in the past, and i'd like to keep that going if at all possible, so going with the [ -x 'target' ] kind of logic seems good to me. > HBase start scripts should not include maven repos jars if they exist in lib > > > Key: HBASE-3436 > URL: https://issues.apache.org/jira/browse/HBASE-3436 > Project: HBase > Issue Type: Bug >Reporter: Bill Graham >Priority: Critical > Fix For: 0.90.1 > > > When starting the master, the jars of the users maven repos get injected into > the classpath as a convenience to developers. This can cause quite a > debugging headache when hadoop jars don't match what's on the cluster. > We should change the start scripts to not do this if the jars exist in the > lib dir. Or better yet, we would only include maven repos jars if an optional > param exists in hbase-env.sh. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986921#action_12986921 ] ryan rawson commented on HBASE-3481: In HRegion.internalFlushCache we have this logic:

{code}
this.updatesLock.writeLock().lock();
final long currentMemStoreSize = this.memstoreSize.get();
List storeFlushers = new ArrayList(stores.size());
try {
  sequenceId = (wal == null)? myseqid: wal.startCacheFlush();
  completeSequenceId = this.getCompleteCacheFlushSequenceId(sequenceId);
  for (Store s : stores.values()) {
    storeFlushers.add(s.getStoreFlusher(completeSequenceId));
  }
  // prepare flush (take a snapshot)
  for (StoreFlusher flusher : storeFlushers) {
    flusher.prepare();
  }
} finally {
  this.updatesLock.writeLock().unlock();
}
{code}

We take a write lock; no more puts/deletes/whatever can be done to this hregion. We then grab a seqid (wal.startCacheFlush). We now snapshot everything. We then release the update lock and mutations can happen to the region again. The flush sequence id should lie exactly between the snapshot and the memstore. Given this code, I'm not sure how to explain what you are seeing... but this logic seems spot on and correct. On Wed, Jan 26, 2011 at 1:14 AM, Kannan Muthukkaruppan (JIRA) > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Priority: Critical > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. 
However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in > progress for these puts. In this case, none of these edits (11..15) would > have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
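The invariant argued for in the comment above — that the flush sequence id should lie exactly between the snapshot and the live memstore — rests on taking the snapshot while holding the region's update write lock. A stripped-down sketch of that pattern (hypothetical names, not the actual HRegion code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the snapshot-under-write-lock pattern: while the write lock is
// held, no mutation can slip in between obtaining the flush seq id and
// snapshotting, so the seq id cleanly separates snapshot from live data.
class SnapshotSketch {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private final AtomicLong logSeqNum = new AtomicLong();
    private final List<Long> memstore = new ArrayList<>();

    void put() {
        updatesLock.readLock().lock();   // mutations take the read lock
        try {
            memstore.add(logSeqNum.incrementAndGet());
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    // Fills snapshotOut and returns the flush seq id; every snapshotted edit
    // has a seq id strictly below it, every later edit strictly above.
    long startFlush(List<Long> snapshotOut) {
        updatesLock.writeLock().lock();  // flush excludes all mutations
        try {
            long flushSeqId = logSeqNum.incrementAndGet();
            snapshotOut.addAll(memstore); // take the snapshot
            memstore.clear();
            return flushSeqId;
        } finally {
            updatesLock.writeLock().unlock();
        }
    }
}
```

If the write lock were not held across both steps, a put could obtain a seq id above the flush id yet land in the snapshot, breaking the invariant.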
[jira] Updated: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3481: --- Attachment: HBASE-3481.txt here is a patch adding in those locks > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Priority: Critical > Attachments: HBASE-3481.txt > > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in > progress for these puts. In this case, none of these edits (11..15) would > have been written to memstore yet. 
> At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3480) Reduce the size of Result serialization
[ https://issues.apache.org/jira/browse/HBASE-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987418#action_12987418 ] ryan rawson commented on HBASE-3480: I changed the code to use LZF pure java compression from this library: https://github.com/ning/compress The result size is smaller now: grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns compressed: (true|false)/ ; print $1, " ", $2, , " ", $3, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}' 277735.361445783 But the times still aren't so great:

1775773 106297860 true
1620568 68043741 true
1334129 98508585 true
1408999 78860459 true
1264817 60595079 true
622714 28482354 true
511205 23480742 true

The 'true' means that this response was compressed. The first column is the response size from HRS -> client in bytes; the second column is the time it took to serialize, in nanoseconds, including both the Result.write* code and the compression. We may be better off with a C-based compression algorithm, which would be feasible via this mechanism:
- we will only compress responses we know the size for, and if the size > $THRESHOLD
- we can serialize the Result/Result[] into a DirectByteBuffer
- we can then compress the resulting DirectByteBuffer into a new one
- we can then use nio to reply directly from this DBB.
I think we could dig up some benchmarks for the java vs C implementation of LZF and figure out if this might be worthwhile or not. > Reduce the size of Result serialization > --- > > Key: HBASE-3480 > URL: https://issues.apache.org/jira/browse/HBASE-3480 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: ryan rawson > Attachments: HBASE-3480.txt > > > When faced with a gigabit ethernet network connection, things are pretty slow > actually. For example, let's take a 2 MB reply, using a 120MB/sec line rate, > we are talking about 16ms to transfer that data across a gige line. 
> This is a pretty significant amount of time. > So this JIRA is about reducing the size of the Result[] serialization. By > exploiting family and qualifier and rowkey duplication, I created a simple > encoding scheme to use a dictionary instead of literal strings. > in my testing, I am seeing some success with the sizes. Average serialized > size is about 1/2 of previous, but time to serialize on the regionserver side > is way up, by a factor of 10x. This might be due to the simplistic first > implementation however. > Here is the post change size: > grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; > print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += > $_; $count++; END {print $sum/$count, "\n"}' > 377047.1125 > Here is the pre change size: > grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; > print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += > $_; $count++; END {print $sum/$count, "\n"}' > 601078.505882353 > That is about a 60% improvement in size. > But times are not so good, here are some samples of the old, in (size) (time > in ns) > 3874599 10685836 > 5582725 11525888 > so that is about 11ms to serialize 3-5mb of data. > In the new implementation: > 1898788 118504672 > 1630058 91133003 > this is 118-91ms for serialized sizes of 1.6-1.8 MB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
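The dictionary scheme described in this issue — exploiting rowkey/family/qualifier duplication by writing each repeated value as a small index into a per-response dictionary — can be sketched as a toy encoder over strings. This is an illustration of the encoding idea only, not the actual Result serialization code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy dictionary encoder: the first occurrence of a value is written
// literally and assigned the next index; repeats are written as the index.
// Real Result serialization would do this over rowkey/family/qualifier bytes,
// turning many repeated byte strings into small integers.
class DictEncoder {
    static List<String> encode(List<String> values) {
        Map<String, Integer> dict = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String v : values) {
            Integer idx = dict.get(v);
            if (idx == null) {
                dict.put(v, dict.size());
                out.add("LIT:" + v);      // first occurrence: literal
            } else {
                out.add("REF:" + idx);    // repeat: small dictionary index
            }
        }
        return out;
    }
}
```

Since a scan result repeats the same family and qualifiers for every row, most entries become short REF tokens, which is where the roughly 50% size reduction reported above comes from.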
[jira] Updated: (HBASE-3480) Reduce the size of Result serialization
[ https://issues.apache.org/jira/browse/HBASE-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3480: --- Attachment: HBASE-3480-lzf.txt > Reduce the size of Result serialization > --- > > Key: HBASE-3480 > URL: https://issues.apache.org/jira/browse/HBASE-3480 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: ryan rawson > Attachments: HBASE-3480-lzf.txt, HBASE-3480.txt > > > When faced with a gigabit ethernet network connection, things are pretty slow > actually. For example, let's take a 2 MB reply, using a 120MB/sec line rate, > we are talking about about 16ms to transfer that data across a gige line. > This is a pretty significant amount of time. > So this JIRA is about reducing the size of the Result[] serialization. By > exploiting family and qualifier and rowkey duplication, I created a simple > encoding scheme to use a dictionary instead of literal strings. > in my testing, I am seeing some success with the sizes. Average serialized > size is about 1/2 of previous, but time to serialize on the regionserver side > is way up, by a factor of 10x. This might be due to the simplistic first > implementation however. > Here is the post change size: > grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; > print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += > $_; $count++; END {print $sum/$count, "\n"}' > 377047.1125 > Here is the pre change size: > grep 'Serialized size' * | perl -ne '/Serialized size: (\d+?) in (\d+?) ns/ ; > print $1, " ", $2, "\n" if $1 > 1;' | cut -f1 -d' ' | perl -ne '$sum += > $_; $count++; END {print $sum/$count, "\n"}' > 601078.505882353 > That is about a 60% improvement in size. > But times are not so good, here are some samples of the old, in (size) (time > in ns) > 3874599 10685836 > 5582725 11525888 > so that is about 11ms to serialize 3-5mb of data. 
> In the new implementation: > 1898788 118504672 > 1630058 91133003 > this is 118-91ms for serialized sizes of 1.6-1.8 MB. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987449#action_12987449 ] ryan rawson commented on HBASE-3481: ill commit in am > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1 > > Attachments: HBASE-3481.txt > > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in > progress for these puts. 
In this case, none of these edits (11..15) would > have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3481: --- Resolution: Fixed Fix Version/s: 0.92.0 Status: Resolved (was: Patch Available) > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1, 0.92.0 > > Attachments: HBASE-3481.txt > > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. Say the writeToWAL() is still in > progress for these puts. 
In this case, none of these edits (11..15) would > have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-2939) Allow Client-Side Connection Pooling
[ https://issues.apache.org/jira/browse/HBASE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987454#action_12987454 ] ryan rawson commented on HBASE-2939: btw the recent patch applies with no major issues, some import jazz but intellij fixes that. I'm working on perf testing this, but I'm having problems with my test env right now. > Allow Client-Side Connection Pooling > > > Key: HBASE-2939 > URL: https://issues.apache.org/jira/browse/HBASE-2939 > Project: HBase > Issue Type: Improvement > Components: client >Affects Versions: 0.89.20100621 >Reporter: Karthick Sankarachary >Assignee: ryan rawson >Priority: Critical > Fix For: 0.92.0 > > Attachments: HBASE-2939-0.20.6.patch, HBASE-2939.patch, > HBASE-2939.patch > > > By design, the HBase RPC client multiplexes calls to a given region server > (or the master for that matter) over a single socket, access to which is > managed by a connection thread defined in the HBaseClient class. While this > approach may suffice for most cases, it tends to break down in the context of > a real-time, multi-threaded server, where latencies need to be lower and > throughputs higher. > In brief, the problem is that we dedicate one thread to handle all > client-side reads and writes for a given server, which in turn forces them to > share the same socket. As load increases, this is bound to serialize calls on > the client-side. In particular, when the rate at which calls are submitted to > the connection thread is greater than that at which the server responds, then > some of those calls will inevitably end up sitting idle, just waiting their > turn to go over the wire. > In general, sharing sockets across multiple client threads is a good idea, > but limiting the number of such sockets to one may be overly restrictive for > certain cases. 
Here, we propose a way of defining multiple sockets per server > endpoint, access to which may be managed through either a load-balancing or > thread-local pool. To that end, we define the notion of a SharedMap, which > maps a key to a resource pool, and supports both of those pool types. > Specifically, we will apply that map in the HBaseClient, to associate > multiple connection threads with each server endpoint (denoted by a > connection id). > Currently, the SharedMap supports the following types of pools: > * A ThreadLocalPool, which represents a pool that builds on the > ThreadLocal class. It essentially binds the resource to the thread from which > it is accessed. > * A ReusablePool, which represents a pool that builds on the LinkedList > class. It essentially allows resources to be checked out, at which point it > is (temporarily) removed from the pool. When the resource is no longer > required, it should be returned to the pool in order to be reused. > * A RoundRobinPool, which represents a pool that stores its resources in > an ArrayList. It load-balances access to its resources by returning a > different resource every time a given key is looked up. > To control the type and size of the connection pools, we give the user a > couple of parameters (viz. "hbase.client.ipc.pool.type" and > "hbase.client.ipc.pool.size"). In case the size of the pool is set to a > non-zero positive number, that is used to cap the number of resources that a > pool may contain for any given key. A size of Integer#MAX_VALUE is > interpreted to mean an unbounded pool. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
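Of the pool types listed above, the RoundRobinPool is the simplest to picture: resources live in a list and each lookup returns the next one in rotation. A minimal sketch (hypothetical names, not the HBASE-2939 patch itself):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a RoundRobinPool as described in the issue: resources are stored
// in an ArrayList and each get() returns a different one in rotation,
// spreading callers across multiple connections to the same server endpoint.
class RoundRobinPool<R> {
    private final List<R> resources = new ArrayList<>();
    private final int maxSize;  // e.g. hbase.client.ipc.pool.size
    private int next = 0;

    RoundRobinPool(int maxSize) { this.maxSize = maxSize; }

    // Add a resource (e.g. a connection) while under the size cap.
    synchronized boolean offer(R r) {
        if (resources.size() >= maxSize) return false;
        return resources.add(r);
    }

    // Load-balance: return a different resource on each call.
    synchronized R get() {
        if (resources.isEmpty()) return null;
        R r = resources.get(next);
        next = (next + 1) % resources.size();
        return r;
    }
}
```

A ThreadLocalPool differs only in the lookup: it keys the resource off the calling thread instead of cycling an index, which is what the YCSB test in the later comment exercises (one connection per client thread).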
[jira] Created: (HBASE-3485) implement a Compressor based on LZF
implement a Compressor based on LZF --- Key: HBASE-3485 URL: https://issues.apache.org/jira/browse/HBASE-3485 Project: HBase Issue Type: Bug Reporter: ryan rawson this library: https://github.com/ning/compress implements LZF in pure Java and has an appropriate license. We could consider shipping with this support enabled, providing a ready-to-use alternative to LZO. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
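Whatever codec backs it, the compressor boils down to a byte[]-in/byte[]-out pair. The sketch below uses java.util.zip.Deflater/Inflater from the JDK purely as a stand-in for the ning/compress LZF codec, to show the plumbing such a Compressor would need; swapping in LZF would change only the method bodies:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of the block-codec shape HBASE-3485 asks for. Deflater/Inflater are
// a stand-in here for a pure-java LZF encoder/decoder; the surrounding
// streaming loop is the part an HBase Compressor wrapper would reuse.
class BlockCodecSketch {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

LZF trades compression ratio for much lower CPU cost than DEFLATE, which is the point of preferring it for RPC responses.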
[jira] Created: (HBASE-3486) investigate and fix why HBASE-3481 didn't trigger unit tests
investigate and fix why HBASE-3481 didn't trigger unit tests Key: HBASE-3486 URL: https://issues.apache.org/jira/browse/HBASE-3486 Project: HBase Issue Type: Bug Reporter: ryan rawson we have unit tests that cover durability, but they did not detect the problem outlined in HBASE-3481. We should fix those tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3481) max seq id in flushed file can be larger than its correct value causing data loss during recovery
[ https://issues.apache.org/jira/browse/HBASE-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987768#action_12987768 ] ryan rawson commented on HBASE-3481: That is a good point. I opened HBASE-3486 to do that. We already have durability unit tests; they should be fixed to fail w/o this patch. > max seq id in flushed file can be larger than its correct value causing data > loss during recovery > - > > Key: HBASE-3481 > URL: https://issues.apache.org/jira/browse/HBASE-3481 > Project: HBase > Issue Type: Bug >Reporter: Kannan Muthukkaruppan >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1, 0.92.0 > > Attachments: HBASE-3481.txt > > > [While doing some cluster kill tests, I noticed some missing data after log > recovery. Upon investigating further, and pretty printing contents of HFiles > and recovered logs, this is my analysis of the situation/bug. Please confirm > the theory and pitch in with suggestions.] > When memstores are flushed, the max sequence id recorded in the HFile should > be the max sequence id of all KVs in the memstore. However, we seem to simply > obtain the current sequence id from the HRegion, and stamp the HFile's > MAX_SEQ_ID with it. > From HRegion.java: > {code} > sequenceId = (wal == null)? myseqid: wal.startCacheFlush(); > {code} > where, startCacheFlush() is: > {code} > public long startCacheFlush() { > this.cacheFlushLock.lock(); > return obtainSeqNum(); > } > {code} > where, obtainSeqNum() is simply: > {code} > private long obtainSeqNum() { > return this.logSeqNum.incrementAndGet(); > } > {code} > So let's say a memstore contains edits with sequence number 1..10. > Meanwhile, say more Puts come along, and are going through this flow (in > pseudo-code) > {code} > 1. HLog.append(); >1.1 obtainSeqNum() >1.2 writeToWAL() > 2 updateMemStore() > {code} > So it is possible that the sequence number has already been incremented to > say 15 if there are 5 more outstanding puts. 
Say the writeToWAL() is still in > progress for these puts. In this case, none of these edits (11..15) would > have been written to memstore yet. > At this point if a cache flush of the memstore happens, then we'll record its > MAX_SEQ_ID as 16 in the store file instead of 10 (because that's what > obtainSeqNum() would return as the next sequence number to use, right?). > Assume that the edits 11..15 eventually complete. And so HLogs do contain the > data for edits 11..15. > Now, at this point if the region server were to crash, and we run log > recovery, the splits all go through correctly, and a correct recovered.edits > file is generated with the edits 11..15. > Next, when the region is opened, the HRegion notes that one of the store file > says MAX_SEQ_ID is 16. So, when it replays the recovered.edits file, it > skips replaying edits 11..15. Or in other words, data loss. > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3487) regionservers w/o a master give up after a while but do so in a silent way that leaves the process hanging in an ugly way
regionservers w/o a master give up after a while but do so in a silent way that leaves the process hanging in an ugly way -- Key: HBASE-3487 URL: https://issues.apache.org/jira/browse/HBASE-3487 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: ryan rawson while testing I was having problems with my master aborting early on, which causes trouble with the regionservers... they are SUPPOSED to wait forever for the master to come up, but they eventually 'give up' without saying anything helpful. For example this was in the log:

2011-01-27 17:27:25,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:28,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:31,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:34,912 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:27:37,913 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will retry
2011-01-27 17:28:37,593 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.26 MB, free=393.42 MB, max=396.68 MB, blocks=1, accesses=69, hits=64, hitRatio=92.75%%, cachingAccesses=65, cachingHits=64, cachingHitsRatio=98.46%%, evictions=0, evicted=0, evictedPerRun=NaN

then nothing else. It had been well over 3 minutes at this point. jstacking the process shows lots of threads running, but the process is effectively dead and only kill -9 will get rid of it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3382) Make HBase client work better under concurrent clients
[ https://issues.apache.org/jira/browse/HBASE-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987883#action_12987883 ] ryan rawson commented on HBASE-3382: Doing more testing with HBASE-2939 I ran some tests using YCSB. It was very confusing at first because I wasn't getting the performance boost I was hoping for. So with a configuration that does scan-only load, I am seeing a baseline performance of about 50-60ms for 1 thread. Upping this to 10 threads, the performance gets much worse, up to 400ms or so. Doing some custom tracing in our client code revealed that the source of the slowness was waiting for other responses to be streamed to the client. That is, thread1 asks for a big fat reply, but it takes 100ms to read off the wire; thread2, which did a little-itty-bitty request (close scanner for example), must wait for that 100ms and is thus unnecessarily slowed down. So I tried this patch with ThreadLocal, and while I see improvement I am not seeing enough improvement, with lines like this: [SCAN], AverageLatency(ms), 363.44 [SCAN], AverageLatency(ms), 448.31 [SCAN], AverageLatency(ms), 426.53 The data size is small enough and fully cached, and I added logging that verifies that we are CREATING multiple connections (1 per thread it seems). But the "call_wait" profile time (the time spent between sending the request and when the connection code starts to receive our response) is pretty high; in previous tests I saw something like this: cat new_1thr.txt | perl -ne 'if(/call_wait\) split: (\d+?) /) { print $1/100, "\n";}' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}' 3.86964071428571 cat new_10thr.txt | perl -ne 'if(/call_wait\) split: (\d+?) /) { print $1/100, "\n";}' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}' 56.1530285016722 As you can see, going from an average wait time of 3ms to an average wait time of 56ms really hurts!
But using the work to add ThreadLocal connections I did not get as much boost as I hoped for; instead I saw call_wait times like: cat 10_thr.txt| perl -ne 'if(/call_wait\) split: (\d+?) /) { print $1/100, "\n";}' | perl -ne '$sum += $_; $count++; END {print $sum/$count, "\n"}' 19.9225164798658 While 19ms < 56ms, that is still a lot of wait. At this point we might also be seeing server-side slowness. I think the next step is to extend the NanoProfiler code into the server side so we can have extensive tracing across both the server and client. This result suggests we are seeing server-side slowness under concurrency, which is reasonable but which I wasn't seeing in previous profiler runs; a lot of performance code has been committed in the meantime. > Make HBase client work better under concurrent clients > -- > > Key: HBASE-3382 > URL: https://issues.apache.org/jira/browse/HBASE-3382 > Project: HBase > Issue Type: Bug > Components: performance >Reporter: ryan rawson >Assignee: ryan rawson > Attachments: HBASE-3382-nio.txt, HBASE-3382.txt > > > The HBase client uses 1 socket per regionserver for communication. This is > good for socket control but potentially bad for latency. How bad? I did a > simple YCSB test that had this config: > readproportion=0 > updateproportion=0 > scanproportion=1 > insertproportion=0 > fieldlength=10 > fieldcount=100 > requestdistribution=zipfian > scanlength=300 > scanlengthdistribution=zipfian > I ran this with 1 and 10 threads. The summary is as so: > 1 thread: > [SCAN] Operations 1000 > [SCAN] AverageLatency(ms) 35.871 > 10 threads: > [SCAN] Operations 1000 > [SCAN] AverageLatency(ms) 228.576 > We are taking a 6.5x latency hit in our client. But why? > First step was to move the deserialization out of the Connection thread, this > seemed like it could have a big win, an analog change on the server side got > a 20% performance improvement (already committed as HBASE-2941).
I did this > and got about a 20% improvement again, with that 228ms number going to about > 190 ms. > So I then wrote a high performance nanosecond resolution tracing utility. > Clients can flag an API call, and we get tracing and numbers through the > client pipeline. What I found is that a lot of time is being spent in > receiving the response from the network. The code block is like so: > NanoProfiler.split(id, "receiveResponse"); > if (LOG.isDebugEnabled()) > LOG.debug(getName() + " got value #" + id); > Call call = calls.get(id); > size -= 4; // 4 byte off for id because we already read it. > ByteBuffer buf = ByteBuffer.allocate(size); > IOUtils.readFully(in, buf.array(), buf.arrayOffset(), size); > bu
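The per-thread-connection approach tested above can be sketched as follows. This is a minimal illustration with a hypothetical stand-in `Connection` class (the real HBase client connection is far more involved): with one shared socket, a small request queues behind another thread's large in-flight reply; with one connection per thread, it does not.

```java
// Sketch of per-thread connections via ThreadLocal. The Connection type
// here is a hypothetical stand-in, not the actual HBaseClient connection.
class PerThreadConnections {
    static final class Connection {                       // illustrative stand-in
        final long ownerThreadId = Thread.currentThread().getId();
    }

    // Each thread lazily creates, then reuses, its own connection, so one
    // thread's big fat reply never blocks another thread's small request.
    private static final ThreadLocal<Connection> CONN =
            ThreadLocal.withInitial(Connection::new);

    static Connection get() { return CONN.get(); }
}
```

The trade-off, visible in the numbers above, is more sockets per regionserver in exchange for removing head-of-line blocking on the shared connection.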
[jira] Created: (HBASE-3494) checkAndPut implementation doesn't verify row param and writable row are the same
checkAndPut implementation doesn't verify row param and writable row are the same Key: HBASE-3494 URL: https://issues.apache.org/jira/browse/HBASE-3494 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: ryan rawson the checkAndPut API, and checkAndMutate on the server side, don't enforce that the row in the API call and the row in the passed writable (the one to be executed if the check passes) are the same row! Looking at the code, if someone were to 'fool' us, we'd probably end up with rows in the wrong region in the worst case. Or we'd end up with non-locked puts/deletes to different rows, since checkAndMutate grabs the row lock and calls put/delete methods that do not grab row locks. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
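The missing guard amounts to a row-equality check before taking the row lock. The sketch below uses illustrative names, not the actual HRegionServer code:

```java
import java.util.Arrays;

// Sketch of the missing validation: reject a checkAndMutate whose
// Put/Delete targets a different row than the row the check ran against.
class CheckAndMutateGuard {
    static void validateRowsMatch(byte[] checkRow, byte[] mutationRow) {
        if (!Arrays.equals(checkRow, mutationRow)) {
            throw new IllegalArgumentException(
                "Action's row must match the row passed to checkAndMutate");
        }
    }
}
```

Rejecting the mismatch up front closes both failure modes described above: rows landing in the wrong region, and mutations to a row whose lock was never taken.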
[jira] Updated: (HBASE-3494) checkAndPut implementation doesn't verify row param and writable row are the same
[ https://issues.apache.org/jira/browse/HBASE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3494: --- Attachment: HBASE-3494.txt > checkAndPut implementation doesnt verify row param and writable row are the > same > > > Key: HBASE-3494 > URL: https://issues.apache.org/jira/browse/HBASE-3494 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: ryan rawson > Attachments: HBASE-3494.txt > > > the API checkAndPut, and on the server side checkAndMutate doesn't enforce > that the row in the API call and the row in the passed writable that should > be executed if the check passes, are the same row! Looking at the code, if > someone were to 'fool' us, we'd probably end up with rows in the wrong region > in the worst case. Or we'd end up with non-locked puts/deletes to different > rows since the checkAndMutate grabs the row lock and calls put/delete methods > that do not grab row locks. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3494) checkAndPut implementation doesn't verify row param and writable row are the same
[ https://issues.apache.org/jira/browse/HBASE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3494: --- Fix Version/s: 0.90.1 Assignee: ryan rawson Status: Patch Available (was: Open) fix w/test > checkAndPut implementation doesnt verify row param and writable row are the > same > > > Key: HBASE-3494 > URL: https://issues.apache.org/jira/browse/HBASE-3494 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.90.1 > > Attachments: HBASE-3494.txt > > > the API checkAndPut, and on the server side checkAndMutate doesn't enforce > that the row in the API call and the row in the passed writable that should > be executed if the check passes, are the same row! Looking at the code, if > someone were to 'fool' us, we'd probably end up with rows in the wrong region > in the worst case. Or we'd end up with non-locked puts/deletes to different > rows since the checkAndMutate grabs the row lock and calls put/delete methods > that do not grab row locks. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3455) Heap fragmentation in region server
[ https://issues.apache.org/jira/browse/HBASE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988958#comment-12988958 ] ryan rawson commented on HBASE-3455: just had a look at the patch, it is looking good. nice minimal touch point w/memstore, and I am also happy to note the chunks are being dereferenced during rollover. The other CAS logic seems to make sense as well. > Heap fragmentation in region server > --- > > Key: HBASE-3455 > URL: https://issues.apache.org/jira/browse/HBASE-3455 > Project: HBase > Issue Type: Brainstorming > Components: performance, regionserver >Affects Versions: 0.90.1 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.90.1 > > Attachments: HBasefragmentation.pdf, collapse-arrays.patch, > icv-frag.png, mslab-1.txt, mslab-2.txt, mslab-3.txt, mslab-4.txt, > parse-fls-statistics.py, with-kvallocs.png > > > Stop-the-world GC pauses have long been a problem in HBase. "Concurrent mode > failures" can usually be tuned around by setting the initiating occupancy > fraction low, but eventually the heap becomes fragmented and a promotion > failure occurs. > This JIRA is to do research/experiments about the heap fragmentation issue > and possible solutions. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
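The chunk rollover and CAS logic mentioned in the comment can be sketched as a bump-pointer allocator. This is a hedged illustration of the idea, not the attached patch; the chunk size is a made-up value:

```java
import java.util.concurrent.atomic.AtomicInteger;

// KV bytes are carved out of a large chunk with a CAS'd bump pointer.
// When a chunk fills, callers roll over to a fresh chunk, so a retired
// chunk's KVs die together and the whole chunk can be dereferenced as a
// unit -- which is what fights old-gen heap fragmentation.
class ChunkAllocator {
    static final int CHUNK_SIZE = 2 * 1024 * 1024;        // illustrative size
    private final AtomicInteger nextFree = new AtomicInteger(0);

    /** Offset of a len-byte allocation in this chunk, or -1 if it won't fit. */
    int allocate(int len) {
        while (true) {
            int old = nextFree.get();
            if (old + len > CHUNK_SIZE) return -1;        // caller rolls to a new chunk
            if (nextFree.compareAndSet(old, old + len)) return old;
        }
    }
}
```

The CAS retry loop lets many writer threads allocate from the same chunk without a lock, which keeps the touch point with the memstore write path minimal.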
[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989399#comment-12989399 ] ryan rawson commented on HBASE-2856: Don't we already have this? The comparator uses the max_seq_id to break ties between KVs... The primary issue is that we need to know which KVs are 'committed' and which are still being created in progress. Right now we have a problem whereby the scanner stack gets a little wonky about how it handles partial next()s. By moving the memstoreTS pruning up to the HRegion scanner level, and working on entire rows at a time, this might mitigate most of the problem actually. This might get ugly with family only flushes, since in theory you might end up with a row that is not completely written but is in memstore & hfile at the same time. Given that the scope of a RWCC "transaction" is only memstore insert, I'm not sure how that would happen. It's possible we could prevent it from becoming a problem with judicious use of the updateLock in HRegion though. For example, by grabbing the updateLock.writeLock().lock() during the switch over, or the flush, we could ensure that all the pending writes are now complete, then do the switch out, then we'd never have a situation where a half committed write is in memstore & hfile at the same time. > TestAcidGuarantee broken on trunk > -- > > Key: HBASE-2856 > URL: https://issues.apache.org/jira/browse/HBASE-2856 > Project: HBase > Issue Type: Bug >Affects Versions: 0.89.20100621 >Reporter: ryan rawson >Assignee: stack >Priority: Blocker > Fix For: 0.92.0 > > Attachments: 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, > acid.txt > > > TestAcidGuarantee has a test whereby it attempts to read a number of columns > from a row, and every so often the first column of N is different, when it > should be the same. 
This is a bug deep inside the scanner whereby the first > peek() of a row is done at time T then the rest of the read is done at T+1 > after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' > data becomes committed and flushed to disk. > One possible solution is to introduce the memstoreTS (or similarly equivalent > value) to the HFile thus allowing us to preserve read consistency past > flushes. Another solution involves fixing the scanners so that peek() is not > destructive (and thus might return different things at different times alas). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
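The tie-breaking idea mentioned in the comment can be illustrated with a toy comparator. These are stand-in types, not the real KeyValue comparator: among entries with the same key, the one with the higher memstoreTS (the more recent insert) sorts first.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Toy sketch of tie-breaking otherwise-equal keys by memstoreTS,
// newest insert first. Field names are illustrative.
class TsKv {
    final String key;
    final long memstoreTS;

    TsKv(String key, long memstoreTS) { this.key = key; this.memstoreTS = memstoreTS; }

    static final Comparator<TsKv> CMP = Comparator
            .comparing((TsKv k) -> k.key)
            .thenComparing(Comparator.comparingLong((TsKv k) -> k.memstoreTS).reversed());
}
```

Ordering alone, though, does not answer the question raised above: it distinguishes duplicate versions but says nothing about which entries are committed versus still in-flight, which is why the scanner needs a memstoreTS cutoff as well.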
[jira] Created: (HBASE-3498) Memstore scanner needs new semantics, which may require new data structure
Memstore scanner needs new semantics, which may require new data structure -- Key: HBASE-3498 URL: https://issues.apache.org/jira/browse/HBASE-3498 Project: HBase Issue Type: Sub-task Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.92.0 We may need a new memstore data structure. Much has been written about its concurrency, speed, and CPU usage, but there are new things that were brought to light with HBASE-2856. Specifically we need a memstore scanner that serves up-to-the-moment reads, with row-level completeness. Specifically, after a memstore scanner goes past the end of a row, it should return some kind of 'end of row' token which the StoreScanner should trigger on to know it's at the end of the row. The next call to memstore scanner.next() should return the _very next available row from the start of that row_ at _the time it's requested_. It should specifically NOT: - return everything but the first column - skip a row that was inserted _after_ the previous next() was completed -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3498) Memstore scanner needs new semantics, which may require new data structure
[ https://issues.apache.org/jira/browse/HBASE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989475#comment-12989475 ] ryan rawson commented on HBASE-3498: The current memstore does not exhibit the required behavior because it is just a flat sequence of KVs, with no understanding of 'row' or 'between rows'. This means that as you are iterating through, the StoreScanner will find the 'next' row and will halt scanning for that row, leaving the memstore scanner pointing to the next row. Now other writing threads have a chance to insert KVs between the current KV and the 'previous' KV which was just scanned out. This has 2 major problems: - we could miss a row, and not be as live as we want. This seems kind of academic, but can have real consequences in the META scanner in the master. - we might miss new columns, or versions of columns, on the current row. Thus when we read the next row we MISS data, returning a partial read to the StoreScanner, which can only be fixed by pruning back how 'current' we are in terms of which KV.memstoreTS to include. This has different problems for HBASE-2856. > Memstore scanner needs new semantics, which may require new data structure > -- > > Key: HBASE-3498 > URL: https://issues.apache.org/jira/browse/HBASE-3498 > Project: HBase > Issue Type: Sub-task >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.92.0 > > > We may need a new memstore datastructure. Much has been written about the > concurrency and speed and cpu usage, but there are new things that were > brought to light with HBASE-2856. > Specifically we need a memstore scanner that serves up to the moment reads, > with a row-level completeness. Specifically after a memstore scanner goes > past the end of a row, it should return some kind of 'end of row' token which > the StoreScanner should trigger on to know it's at the end of the row. 
The > next call to memstore scanner.next() should return the _very next available > row from the start of that row_ at _the time it's requested_. > It should specifically NOT: > - return everything but the first column > - skip a row that was inserted _after_ the previous next() was completed -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
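The liveness gap described in the comment can be demonstrated with a toy model, using strings in place of KeyValues and a ConcurrentSkipListSet as the flat memstore. Once the skip-list iterator has been advanced past row "a", it has already fetched its next node; with the stock JDK iterator, a row "b" inserted behind that position is simply never returned by this scanner:

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

// Toy demonstration of a parked mid-scan iterator missing a row that a
// concurrent writer inserts behind its position.
class MissedInsertDemo {
    static boolean insertIsMissed() {
        ConcurrentSkipListSet<String> memstore = new ConcurrentSkipListSet<>();
        memstore.add("a/cf:col1");
        memstore.add("c/cf:col1");

        Iterator<String> it = memstore.iterator();
        it.next();                      // scanned row "a"; iterator pre-fetched "c"
        memstore.add("b/cf:col1");      // writer inserts a whole row in between

        boolean sawB = false;
        while (it.hasNext()) if (it.next().startsWith("b/")) sawB = true;
        return !sawB;                   // the weakly consistent iterator skips row "b"
    }
}
```

This is exactly the "miss a row" problem: the scanner is not wrong by the iterator's contract (weakly consistent iterators make no promise here), but it is not as live as the StoreScanner needs.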
[jira] Commented: (HBASE-3498) Memstore scanner needs new semantics, which may require new data structure
[ https://issues.apache.org/jira/browse/HBASE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989478#comment-12989478 ] ryan rawson commented on HBASE-3498: I came up with 2 solutions to this problem: Create sentinel KV.firstOnRow and KV.lastOnRow values that the memstore scanner can 'hang out' at between rows. They will ensure that edits will end up before the scanner iterator point. It might seem only the lastOnRow is required, but the firstOnRow is also required because without it, after a reseek we will end up pointing to the 'first' real KV of a row, thus causing a problem when someone else inserts more KVs before us. Another solution is to change our data structure to be more like CSLM<row, SortedSet<KV>>. The memstore scanner would know when it got to the 'end' of a row, and return a sentinel value to the StoreScanner; then it would allow the next next() call to push us into the next row, and only iterate there WHEN we need to, thus never 'missing' a row. Also we could control the iteration to ensure we don't actually start "in" to the next row until we really need to. We might be able to use a different kind of data structure for the SortedSet, giving us benefits there. > Memstore scanner needs new semantics, which may require new data structure > -- > > Key: HBASE-3498 > URL: https://issues.apache.org/jira/browse/HBASE-3498 > Project: HBase > Issue Type: Sub-task >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.92.0 > > > We may need a new memstore datastructure. Much has been written about the > concurrency and speed and cpu usage, but there are new things that were > brought to light with HBASE-2856. > Specifically we need a memstore scanner that serves up to the moment reads, > with a row-level completeness. Specifically after a memstore scanner goes > past the end of a row, it should return some kind of 'end of row' token which > the StoreScanner should trigger on to know it's at the end of the row. 
The > next call to memstore scanner.next() should return the _very next available > row from the start of that row_ at _the time it's requested_. > It should specifically NOT: > - return everything but the first column > - skip a row that was inserted _after_ the previous next() was completed -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
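The first solution above can be sketched with strings standing in for KeyValues ("row/column" keys, with "row/" as a hypothetical firstOnRow sentinel that sorts before every real KV of that row). Re-seeking from the sentinel at next() time, instead of parking a live iterator mid-row, means KVs inserted since the last call still land at or after the seek point:

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of sentinel-based reseek. "row/" sorts before "row/<anything>",
// so seeking from it always sees the row's current first KV.
class SentinelReseek {
    static String firstOnRow(String row) { return row + "/"; }

    /** First KV at or after the start of the given row, as of right now. */
    static String seekToRow(ConcurrentSkipListSet<String> memstore, String row) {
        // Re-evaluated at call time, so KVs inserted since the previous
        // next() are visible -- unlike a live iterator parked mid-row.
        return memstore.ceiling(firstOnRow(row));
    }
}
```

Because the lookup is recomputed on every call, a column inserted into the target row between calls is returned rather than skipped, which is the liveness property the issue asks for.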
[jira] Commented: (HBASE-3416) For intra-row scanning, the update readers notification resets the query matcher and can lead to incorrect behavior
[ https://issues.apache.org/jira/browse/HBASE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989544#comment-12989544 ] ryan rawson commented on HBASE-3416: this looks good. if tests demonstrate bug then big +1 on commit > For intra-row scanning, the update readers notification resets the query > matcher and can lead to incorrect behavior > --- > > Key: HBASE-3416 > URL: https://issues.apache.org/jira/browse/HBASE-3416 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: Jonathan Gray > Fix For: 0.90.1 > > Attachments: HBASE-3416.patch > > > In {{StoreScanner.resetScannerStack()}}, which is called on the first > {{next()}} call after readers have been updated, we do a query matcher reset. > Normally this is not an issue because the query matcher does not need to > maintain state between rows. However, if doing intra-row scanning w/ the > specified limit, we could have the query matcher reset in the middle of > reading a row. This could lead to incorrect behavior (too many versions > coming back, etc). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3494) checkAndPut implementation doesn't verify row param and writable row are the same
[ https://issues.apache.org/jira/browse/HBASE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3494: --- Resolution: Fixed Status: Resolved (was: Patch Available) > checkAndPut implementation doesnt verify row param and writable row are the > same > > > Key: HBASE-3494 > URL: https://issues.apache.org/jira/browse/HBASE-3494 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.90.1 > > Attachments: HBASE-3494.txt > > > the API checkAndPut, and on the server side checkAndMutate doesn't enforce > that the row in the API call and the row in the passed writable that should > be executed if the check passes, are the same row! Looking at the code, if > someone were to 'fool' us, we'd probably end up with rows in the wrong region > in the worst case. Or we'd end up with non-locked puts/deletes to different > rows since the checkAndMutate grabs the row lock and calls put/delete methods > that do not grab row locks. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3416) For intra-row scanning, the update readers notification resets the query matcher and can lead to incorrect behavior
[ https://issues.apache.org/jira/browse/HBASE-3416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989728#comment-12989728 ] ryan rawson commented on HBASE-3416: That looks great to me. Other tests have done worse :) > For intra-row scanning, the update readers notification resets the query > matcher and can lead to incorrect behavior > --- > > Key: HBASE-3416 > URL: https://issues.apache.org/jira/browse/HBASE-3416 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: Jonathan Gray > Fix For: 0.90.1 > > Attachments: HBASE-3416.patch > > > In {{StoreScanner.resetScannerStack()}}, which is called on the first > {{next()}} call after readers have been updated, we do a query matcher reset. > Normally this is not an issue because the query matcher does not need to > maintain state between rows. However, if doing intra-row scanning w/ the > specified limit, we could have the query matcher reset in the middle of > reading a row. This could lead to incorrect behavior (too many versions > coming back, etc). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3498) Memstore scanner needs new semantics, which may require new data structure
[ https://issues.apache.org/jira/browse/HBASE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989847#comment-12989847 ] ryan rawson commented on HBASE-3498: one of the problems with the memstore scanner is that peeking is _destructive_. Meaning when you 'peek' you are returning what the internal iterator is pointing to, and when you call next() you return THAT and then move the iterator forward. Meaning the iterator pointer is always pointing to the peek()able value, rather than the 'current' value. If the memstore scanner was not destructive, we'd have a situation where, when the scanner stack called peek(), we'd be looking at the NEXT row, but the iterator would be pointing to whatever is 'now'. There is one problem with this, and that is the KeyValueHeap doesn't like it when peek() changes. Since the value of MemStoreScanner.peek() would change depending on whether someone else inserted rows, this causes major problems in KVH, in that now we get stuff out of order. > Memstore scanner needs new semantics, which may require new data structure > -- > > Key: HBASE-3498 > URL: https://issues.apache.org/jira/browse/HBASE-3498 > Project: HBase > Issue Type: Sub-task >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.92.0 > > > We may need a new memstore datastructure. Much has been written about the > concurrency and speed and cpu usage, but there are new things that were > brought to light with HBASE-2856. > Specifically we need a memstore scanner that serves up to the moment reads, > with a row-level completeness. Specifically after a memstore scanner goes > past the end of a row, it should return some kind of 'end of row' token which > the StoreScanner should trigger on to know it's at the end of the row. The > next call to memstore scanner.next() should return the _very next available > row from the start of that row_ at _the time it's requested_. 
> It should specifically NOT: > - return everything but the first column > - skip a row that was inserted _after_ the previous next() was completed -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
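The destructive-peek shape described in the comment can be illustrated with a small wrapper (not the actual MemStoreScanner): peek() returns an element the underlying iterator has ALREADY moved past, which is exactly why anything inserted behind the iterator's true position can no longer be surfaced.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative destructive-peek wrapper: the element handed back by
// peek() was eagerly consumed from the iterator, so the iterator always
// sits one element ahead of what the caller thinks is "current".
class DestructivePeek<T> {
    private final Iterator<T> it;
    private T peeked;                   // iterator is already positioned past this

    DestructivePeek(Iterator<T> it) {
        this.it = it;
        peeked = it.hasNext() ? it.next() : null;
    }

    T peek() { return peeked; }

    T next() {
        T result = peeked;
        peeked = it.hasNext() ? it.next() : null;   // eagerly advance again
        return result;
    }
}
```

A non-destructive design would compute peek() lazily against the live structure, but as the comment notes, that makes peek() unstable, which the KeyValueHeap's ordering assumptions do not tolerate.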
[jira] Commented: (HBASE-3498) Memstore scanner needs new semantics, which may require new data structure
[ https://issues.apache.org/jira/browse/HBASE-3498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989859#comment-12989859 ] ryan rawson commented on HBASE-3498: After talking to stack about the unstable peek(), this might be ok. Consider, we have a scanner at the 'current' position that has a peek()ed value of 'C'. Now later on when we peek() again we get 'B' or some other value < 'C' (but larger than the previously "gotten" values). This is still the 'current' heap in the KeyValueHeap, and only during the 'next' would we recheck the priority queue. At this point a different scanner might become the 'current' (aka top) scanner, but it could NOT have previously been the current scanner, because it would have had to have a value < 'C', which it did not. This only works if only 1 KeyValueScanner has an unstable 'peek' and only if it is unstable in 1 direction (gets smaller than) only. > Memstore scanner needs new semantics, which may require new data structure > -- > > Key: HBASE-3498 > URL: https://issues.apache.org/jira/browse/HBASE-3498 > Project: HBase > Issue Type: Sub-task >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.92.0 > > > We may need a new memstore datastructure. Much has been written about the > concurrency and speed and cpu usage, but there are new things that were > brought to light with HBASE-2856. > Specifically we need a memstore scanner that serves up to the moment reads, > with a row-level completeness. Specifically after a memstore scanner goes > past the end of a row, it should return some kind of 'end of row' token which > the StoreScanner should trigger on to know it's at the end of the row. The > next call to memstore scanner.next() should return the _very next available > row from the start of that row_ at _the time it's requested_. 
> It should specifically NOT: > - return everything but the first column > - skip a row that was inserted _after_ the previous next() was completed -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-2856) TestAcidGuarantee broken on trunk
[ https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990261#comment-12990261 ] ryan rawson commented on HBASE-2856: I don't think bulk load is an issue: since we only call update readers between rows (once we move the update readers to the HRegion.Scanner level), it will be an atomic 'appearance' of data. Does that sound right? > TestAcidGuarantee broken on trunk > -- > > Key: HBASE-2856 > URL: https://issues.apache.org/jira/browse/HBASE-2856 > Project: HBase > Issue Type: Bug >Affects Versions: 0.89.20100621 >Reporter: ryan rawson >Assignee: stack >Priority: Blocker > Fix For: 0.92.0 > > Attachments: 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 2856-v5.txt, > acid.txt > > > TestAcidGuarantee has a test whereby it attempts to read a number of columns > from a row, and every so often the first column of N is different, when it > should be the same. This is a bug deep inside the scanner whereby the first > peek() of a row is done at time T then the rest of the read is done at T+1 > after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' > data becomes committed and flushed to disk. > One possible solution is to introduce the memstoreTS (or similarly equivalent > value) to the HFile thus allowing us to preserve read consistency past > flushes. Another solution involves fixing the scanners so that peek() is not > destructive (and thus might return different things at different times alas). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3504) HLog performance improvement
[ https://issues.apache.org/jira/browse/HBASE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990472#comment-12990472 ] ryan rawson commented on HBASE-3504: But doesn't each appending thread need to atomically do some work? Would a read/write lock maintain those atomic semantics? Quoting the earlier comment ( https://issues.apache.org/jira/browse/HBASE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990470#comment-12990470 ): the log-roller could acquire the lock in Write mode while the HLog.append() calls could acquire it in Read mode; otherwise the logSyncer thread invokes HDFS.sync on a writer that has already been closed by the log-roller. This could happen because the logSyncer thread does not acquire the updateLock while invoking HDFS sync on the writer. > HLog performance improvement > > > Key: HBASE-3504 > URL: https://issues.apache.org/jira/browse/HBASE-3504 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: dhruba borthakur >Assignee: dhruba borthakur > > The HLog.updateLock protects the rolling of logs with concurrent writes to > the HDFS log file. This is a scalability bottleneck for a workload that > comprises mostly of counter-increments. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
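The read/write-lock proposal under discussion can be sketched as follows. This is an illustrative shape, not the actual HLog code (which at this point used a plain updateLock): appenders and the logSyncer take the read side, so they run concurrently with each other, while the roller takes the write side, so it cannot close the writer while a sync is in flight.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of guarding log rolls with the write side of a RW lock so the
// writer can never be closed out from under an in-flight HDFS sync.
class HLogRollLock {
    private final ReentrantReadWriteLock rollLock = new ReentrantReadWriteLock();
    private boolean writerOpen = true;   // guarded by rollLock

    void sync(Runnable hdfsSync) {
        rollLock.readLock().lock();
        try {
            if (writerOpen) hdfsSync.run();   // writer can't be closed mid-sync
        } finally {
            rollLock.readLock().unlock();
        }
    }

    void rollWriter(Runnable closeAndReopen) {
        rollLock.writeLock().lock();          // waits out in-flight appends/syncs
        try {
            writerOpen = false;
            closeAndReopen.run();
            writerOpen = true;
        } finally {
            rollLock.writeLock().unlock();
        }
    }
}
```

Note the comment's open question still stands: the read lock makes appends concurrent with each other, so any per-append work that must be atomic (such as sequence-number assignment paired with the write) needs its own synchronization.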
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991217#comment-12991217 ] ryan rawson commented on HBASE-3431: I'll have a look monday > Regionserver is not using the name given it by the master; double entry in > master listing of servers > > > Key: HBASE-3431 > URL: https://issues.apache.org/jira/browse/HBASE-3431 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: stack >Assignee: stack >Priority: Blocker > Fix For: 0.90.1 > > Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, > 3431.txt > > > Our man Ted Dunning found the following where RS checks in with one name, the > master tells it use another name but we seem to go ahead and continue with > our original name. > In RS logs I see: > {code} > 2011-01-07 15:45:50,757 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: > Master passed us address to use. Was=perfnode11:60020, Now=10.10.30.11:60020 > {code} > On master I see > {code} > 2011-01-07 15:45:38,613 INFO org.apache.hadoop.hbase.master.ServerManager > [IPC Server handler 0 on 6]: Registering > server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false > {code} > > then later > {code} > 2011-01-07 15:45:44,247 INFO org.apache.hadoop.hbase.master.ServerManager > [IPC Server handler 2 on 6]: Registering > server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true > {code} > This might be since we started letting servers register in other than with > the reportStartup. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3499) Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE
[ https://issues.apache.org/jira/browse/HBASE-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991216#comment-12991216 ] ryan rawson commented on HBASE-3499: If we reset the size of that every time we start, wouldn't that prevent people from customizing that size then? > Users upgrading to 0.90.0 need to have their .META. table updated with the > right MEMSTORE_SIZE > -- > > Key: HBASE-3499 > URL: https://issues.apache.org/jira/browse/HBASE-3499 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.90.1 > > Attachments: set_meta_memstore_size.rb > > > With Jack Levin, we were able to figure that users that are upgrading from a > 0.20.x era cluster have their .META. schema set with a 16KB MEMSTORE_SIZE. > This was done in order to minimize lost meta rows when append wasn't > available but even if we changed it in HTD, we also have to make sure all > users upgrading to 0.90 have it changed too. > In Jack's case, he ended up with 2143 storefiles in .META. during a cold > start, slowing everything down. He reported a few times in the past that his > .META. was always extremely busy. > We should be able to do it as a one-off thing in HMaster when opening .META. > (an update in place). -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3507) requests count per HRegion and rebalance command
[ https://issues.apache.org/jira/browse/HBASE-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991609#comment-12991609 ] ryan rawson commented on HBASE-3507: how frequently would you move regions? Moving a highly loaded region under load might be more disruptive than leaving it where it is... A more sophisticated algorithm that uses a decaying moving average over some period of time might have a better impact. For example, do you move a region because it gets hot for 30 seconds? 1 minute? 5 minutes? 10 minutes even? I'm not sure where the line is, but it seems like the goal should be to move regions for persistent, high long-term load, not transient spikes. Thoughts? > requests count per HRegion and rebalance command > > > Key: HBASE-3507 > URL: https://issues.apache.org/jira/browse/HBASE-3507 > Project: HBase > Issue Type: Improvement > Components: performance, regionserver >Reporter: Sebastian Bauer >Priority: Trivial > Attachments: hbase-requestsCount-2.patch, hbase-requestsCount.patch > > > Patch 1 adds another metric to HRegion to count requests made to the region. > Patch 2 adds another command to the hbase shell to grab all regions, sort them by > requests from Patch 1, and move them to servers with a round-robin algorithm -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
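A decaying moving average like the one suggested could look like the following sketch; `DecayingLoad` and its parameters are illustrative, not part of HBase:

```java
// Exponentially decaying moving average of per-region request counts.
// A region only becomes a move candidate when the *smoothed* load stays
// high, which filters out 30-second spikes. All names are hypothetical.
class DecayingLoad {
    private final double alpha;     // smoothing factor in (0, 1]
    private double average;
    private boolean primed;

    DecayingLoad(double alpha) { this.alpha = alpha; }

    // Fold one sampling interval's request count into the average.
    void record(long requests) {
        average = primed ? alpha * requests + (1 - alpha) * average : requests;
        primed = true;
    }

    double get() { return average; }

    // Move only for persistent long-term load, not a transient spike.
    boolean persistentlyHot(double threshold) { return average > threshold; }
}
```

A smaller alpha gives more weight to history, answering "how long must it stay hot" with a tunable knob rather than a fixed window.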
[jira] Commented: (HBASE-3431) Regionserver is not using the name given it by the master; double entry in master listing of servers
[ https://issues.apache.org/jira/browse/HBASE-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991612#comment-12991612 ] ryan rawson commented on HBASE-3431: one thing to consider: a lot of the network code attempts to figure out the 'primary IP' and then bind to just that IP. Would it make sense to bind to * instead (i.e., 0.0.0.0)? Why not accept RPCs on all interfaces? If security is a concern, I think SASL and host-level firewall controls are a better way to address that, rather than baking it into HBase. That way it won't really "matter" what our IP is; whatever IP the master 'sees' us as could be used as what we stuff into .META. Then we could use the registration name to identify dead hosts, etc. > Regionserver is not using the name given it by the master; double entry in > master listing of servers > > > Key: HBASE-3431 > URL: https://issues.apache.org/jira/browse/HBASE-3431 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: stack >Assignee: stack >Priority: Blocker > Fix For: 0.90.1 > > Attachments: 3431-v2.txt, 3431-v3.txt, 3431-v3.txt, 3431-v4.txt, > 3431.txt > > > Our man Ted Dunning found the following where RS checks in with one name, the > master tells it use another name but we seem to go ahead and continue with > our original name. > In RS logs I see: > {code} > 2011-01-07 15:45:50,757 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer [regionserver60020]: > Master passed us address to use. 
Was=perfnode11:60020, Now=10.10.30.11:60020 > {code} > On master I see > {code} > 2011-01-07 15:45:38,613 INFO org.apache.hadoop.hbase.master.ServerManager > [IPC Server handler 0 on 6]: Registering > server=10.10.30.11,60020,1294443935414, regionCount=0, userLoad=false > {code} > > then later > {code} > 2011-01-07 15:45:44,247 INFO org.apache.hadoop.hbase.master.ServerManager > [IPC Server handler 2 on 6]: Registering > server=perfnode11,60020,1294443935414, regionCount=0, userLoad=true > {code} > This might be since we started letting servers register in other than with > the reportStartup. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HBASE-3513) upgrade thrift to 0.5.0 and use mvn version
upgrade thrift to 0.5.0 and use mvn version --- Key: HBASE-3513 URL: https://issues.apache.org/jira/browse/HBASE-3513 Project: HBase Issue Type: Bug Components: thrift Affects Versions: 0.90.0 Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.90.1 We should upgrade our Thrift to 0.5.0; it is the latest and greatest and is in the Apache Maven repo. Testing with a Thrift 0.5.0 server and an older pre-release PHP client shows the two are wire-compatible. Given that the upgrade is entirely on the server side and has no wire impact, this should be a relatively low-impact change. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3513) upgrade thrift to 0.5.0 and use mvn version
[ https://issues.apache.org/jira/browse/HBASE-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3513: --- Attachment: HBASE-3515.txt this patch is a little WIP, it's based on 0.90 with this commit cherry picked in: https://github.com/stumbleupon/hbase/commit/67a5b46d069f0e7660d2de998c7b311103708fb4 > upgrade thrift to 0.5.0 and use mvn version > --- > > Key: HBASE-3513 > URL: https://issues.apache.org/jira/browse/HBASE-3513 > Project: HBase > Issue Type: Bug > Components: thrift >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.90.1 > > Attachments: HBASE-3515.txt > > > We should upgrade our thrift to 0.5.0, it is the latest and greatest and is > in apache maven repo. > Doing some testing with a thrift 0.5.0 server, and an older pre-release php > client shows the two are on-wire compatible. > Given that the upgrade is entirely on the server side, and has no wire-impact > this should be a relatively low-impact change. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3514) Speedup HFile.Writer append
[ https://issues.apache.org/jira/browse/HBASE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991836#comment-12991836 ] ryan rawson commented on HBASE-3514: how do the performance numbers look on an HDFS cluster? I'm looking at the code; I need to look at it more to see where it's going exactly. As for the meta binary search, I'm wondering where you see the performance improvement. We typically have a maximum of 2-3 meta blocks, and more often 0. I'd prefer not to add another (untested) binary search algorithm to our code base. > Speedup HFile.Writer append > --- > > Key: HBASE-3514 > URL: https://issues.apache.org/jira/browse/HBASE-3514 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.90.0 >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-3514-append.patch, > HBASE-3514-metaBlock-bsearch.patch > > > Remove double writes when block cache is specified, by using, only, the > ByteArrayDataStream. > baos is flushed with the compress stream on finishBlock. > On my machines HFilePerformanceEvaluation SequentialWriteBenchmark passes > from 4000ms to 2500ms. > Running SequentialWriteBenchmark for 100 rows took 4247ms. > Running SequentialWriteBenchmark for 100 rows took 4512ms. > Running SequentialWriteBenchmark for 100 rows took 4498ms. > Running SequentialWriteBenchmark for 100 rows took 2697ms. > Running SequentialWriteBenchmark for 100 rows took 2770ms. > Running SequentialWriteBenchmark for 100 rows took 2721ms. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3465) Hbase should use a HADOOP_HOME environment variable if available.
[ https://issues.apache.org/jira/browse/HBASE-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992248#comment-12992248 ] ryan rawson commented on HBASE-3465: we should consider using jarjar: http://code.google.com/p/jarjar/ That way we would not have conflicting jar issues with our classpath. > Hbase should use a HADOOP_HOME environment variable if available. > - > > Key: HBASE-3465 > URL: https://issues.apache.org/jira/browse/HBASE-3465 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 >Reporter: Ted Dunning > Fix For: 0.90.1, 0.92.0 > > > I have been burned a few times lately while developing code by having the > make sure that the hadoop jar in hbase/lib is exactly correct. In my own > deployment, there are actually 3 jars and a native library to keep in sync > that hbase shouldn't have to know about explicitly. A similar problem arises > when using stock hbase with CDH3 because of the security patches changing the > wire protocol. > All of these problems could be avoided by not assuming that the hadoop > library is in the local directory. Moreover, I think it might be possible to > assemble the distribution such that the compile time hadoop dependency is in > a cognate directory to lib and is referenced using a default value for > HADOOP_HOME. > Does anybody have any violent antipathies to such a change? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3514) Speedup HFile.Writer append
[ https://issues.apache.org/jira/browse/HBASE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992252#comment-12992252 ] ryan rawson commented on HBASE-3514: the hfile writer append code does:
{code}
private void checkBlockBoundary() throws IOException {
  if (this.out != null && this.out.size() < blocksize) return;
  finishBlock();
  newBlock();
}
{code}
If we instead stopped at the first key that would push past the block size, we'd also have to check that there is at least 1 KV in the block, otherwise we'd end up stuck :-) Perhaps it might be beneficial to undersize the blocks, stopping at the last KV that fits under the block size, rather than the first KV over the block size? > Speedup HFile.Writer append > --- > > Key: HBASE-3514 > URL: https://issues.apache.org/jira/browse/HBASE-3514 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.90.0 >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-3514-append.patch, > HBASE-3514-metaBlock-bsearch.patch > > > Remove double writes when block cache is specified, by using, only, the > ByteArrayDataStream. > baos is flushed with the compress stream on finishBlock. > On my machines HFilePerformanceEvaluation SequentialWriteBenchmark passes > from 4000ms to 2500ms. > Running SequentialWriteBenchmark for 100 rows took 4247ms. > Running SequentialWriteBenchmark for 100 rows took 4512ms. > Running SequentialWriteBenchmark for 100 rows took 4498ms. > Running SequentialWriteBenchmark for 100 rows took 2697ms. > Running SequentialWriteBenchmark for 100 rows took 2770ms. > Running SequentialWriteBenchmark for 100 rows took 2721ms. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
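The undersizing idea could be sketched like this; `shouldFinishBlockBefore` is a hypothetical helper, not the actual HFile writer API:

```java
// Decide whether to close the current block *before* appending the next
// KV, so blocks land just under the target size instead of just over it.
class BlockBoundary {
    // kvsInBlock > 0 guards the "stuck" case from the comment above: a KV
    // larger than blocksize must still go into a block of its own.
    static boolean shouldFinishBlockBefore(long blockBytes, long nextKvLen,
                                           long blocksize, int kvsInBlock) {
        return kvsInBlock > 0 && blockBytes + nextKvLen > blocksize;
    }
}
```

With this policy, an oversized single KV still gets written (the empty-block check short-circuits), while normal blocks never exceed the target.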
[jira] Commented: (HBASE-3514) Speedup HFile.Writer append
[ https://issues.apache.org/jira/browse/HBASE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992278#comment-12992278 ] ryan rawson commented on HBASE-3514: If you file a jira and post a patch I'll review it. I have some additional thoughts that belong there, not here. As for this one, I need to apply the patch and grovel over it in my IDE; I haven't had time today so far. I'm thinking part 1 is ok, and the binary search for the meta index might not work because I'm not sure the meta block names are actually sorted. > Speedup HFile.Writer append > --- > > Key: HBASE-3514 > URL: https://issues.apache.org/jira/browse/HBASE-3514 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.90.0 >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-3514-append.patch, > HBASE-3514-metaBlock-bsearch.patch > > > Remove double writes when block cache is specified, by using, only, the > ByteArrayDataStream. > baos is flushed with the compress stream on finishBlock. > On my machines HFilePerformanceEvaluation SequentialWriteBenchmark passes > from 4000ms to 2500ms. > Running SequentialWriteBenchmark for 100 rows took 4247ms. > Running SequentialWriteBenchmark for 100 rows took 4512ms. > Running SequentialWriteBenchmark for 100 rows took 4498ms. > Running SequentialWriteBenchmark for 100 rows took 2697ms. > Running SequentialWriteBenchmark for 100 rows took 2770ms. > Running SequentialWriteBenchmark for 100 rows took 2721ms. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
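The sortedness concern is easy to demonstrate: java.util.Arrays.binarySearch silently reports a miss on unsorted input even when the key is present. This is a toy example, unrelated to the actual meta block index:

```java
import java.util.Arrays;

class UnsortedSearchDemo {
    // On the unsorted names used in the test below, binarySearch probes
    // the middle element first, moves left, and never sees "a": a key
    // that is present gets reported as missing.
    static boolean found(String[] names, String key) {
        return Arrays.binarySearch(names, key) >= 0;
    }
}
```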
[jira] Created: (HBASE-3518) shade some of our dependencies to make some client's life easier
shade some of our dependencies to make some client's life easier Key: HBASE-3518 URL: https://issues.apache.org/jira/browse/HBASE-3518 Project: HBase Issue Type: Improvement Components: build Affects Versions: 0.90.0 Reporter: ryan rawson Assignee: ryan rawson Fix For: 0.92.0 Attachments: HBASE-shade-jar.txt Clients who wish to use thrift, protobuf, or avro and who include our classpath on their classpath run into incompatibilities. For example, my client might depend on protobuf 2.1 while we ship 2.3.0; if there are any incompatible APIs then I won't be able to run my stuff by including HBase's classpath, nor will I be able to use bin/hbase to run my stuff. We can help by using maven shade to include and then rename some dependencies into the hbase*.jar itself, thus ensuring that they won't leak out. We could also build an all-inclusive JAR that includes ALL our core dependencies, although we probably would want to skip including Hadoop since that is frequently switched out. Then a user would be able to include hbase*.jar and run. This might not play well with the maven build and transitive export thing; we should probably think about it a bit more. My initial list was: - avro - protobuf - thrift -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3518) shade some of our dependencies to make some client's life easier
[ https://issues.apache.org/jira/browse/HBASE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3518: --- Attachment: HBASE-shade-jar.txt initial stab, it seems to work, but needs to have the assembly fixed so we don't copy in these dependencies. Maybe someone with some serious mvn foo can help? > shade some of our dependencies to make some client's life easier > > > Key: HBASE-3518 > URL: https://issues.apache.org/jira/browse/HBASE-3518 > Project: HBase > Issue Type: Improvement > Components: build >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.92.0 > > Attachments: HBASE-shade-jar.txt > > > Clients who wish to use thrift, protobuf, avro and who include our classpath > on their classpath run into incompatibilities, for example my client might > depend on protobuf 2.1 but we ship 2.3.0, if there are any incompatible APIs > then I won't be able to run my stuff by including HBase's classpath, nor will > I be able to use bin/hbase to run my stuff. > We can help by using maven shade to include then rename some dependencies > into the hbase*.jar itself, thus ensuring that they won't leak out. We could > also build an all inclusive JAR that includes ALL our core dependencies, > although we probably might want to skip including Hadoop since that is > frequently switched out. Then a user would be able to include hbase*.jar and > run. > This might not play well with the maven build and transitive export thing, we > should probably think about it a bit more. > My initial list was: > - avro > - protobuf > - thrift -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
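For reference, a shade-plugin relocation along the lines discussed might look like the sketch below; the relocation pattern and shaded prefix are illustrative, not the configuration in the attached patch:

```xml
<!-- Hypothetical maven-shade-plugin stanza: bundles protobuf into the
     hbase jar and rewrites its package so it cannot clash with a
     client's own protobuf version. Names are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hbase.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The relocation rewrites bytecode references as well as the package layout, which is what keeps the shaded copy from leaking onto the client's classpath.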
[jira] Commented: (HBASE-3522) Unbundle our RPC versioning; rather than a global for all 4 Interfaces -- region, master, region to master, and coprocesssors -- instead version each individually
[ https://issues.apache.org/jira/browse/HBASE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993191#comment-12993191 ] ryan rawson commented on HBASE-3522: this was previously required because the interface members from all our RPC interfaces were sorted and then sequentially assigned IDs, which were used in place of strings. Now that we send strings for the method names instead of IDs, this jira is possible. > Unbundle our RPC versioning; rather than a global for all 4 Interfaces -- > region, master, region to master, and coprocesssors -- instead version each > individually > -- > > Key: HBASE-3522 > URL: https://issues.apache.org/jira/browse/HBASE-3522 > Project: HBase > Issue Type: Improvement >Reporter: stack > > We'd undo the global RPC version so a change in CP Interface or a change in > the 'private' regionserver to master Interface would not break clients who do > not use CPs or who don't care about the private regionserver to master > protocol. > Benoît suggested this. I want it because I want to get rid of heartbeating > so will want to change the regionserver to master Interface. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3522) Unbundle our RPC versioning; rather than a global for all 4 Interfaces -- region, master, region to master, and coprocesssors -- instead version each individually
[ https://issues.apache.org/jira/browse/HBASE-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993199#comment-12993199 ] ryan rawson commented on HBASE-3522: +1, commit that sucka! > Unbundle our RPC versioning; rather than a global for all 4 Interfaces -- > region, master, region to master, and coprocesssors -- instead version each > individually > -- > > Key: HBASE-3522 > URL: https://issues.apache.org/jira/browse/HBASE-3522 > Project: HBase > Issue Type: Improvement >Reporter: stack > > We'd undo the global RPC version so a change in CP Interface or a change in > the 'private' regionserver to master Interface would not break clients who do > not use CPs or who don't care about the private regionserver to master > protocol. > Benoît suggested this. I want it because I want to get rid of heartbeating > so will want to change the regionserver to master Interface. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3523) Rewrite our client
[ https://issues.apache.org/jira/browse/HBASE-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993276#comment-12993276 ] ryan rawson commented on HBASE-3523: Things that are issues:
- the use of a proxy means that the interfaces _must have_ InterruptedException on the interface, or else you get "undeclared throwable exception"; but now you are conflating a business contract (the interfaces) with networking/execution realities. Furthermore, going through a proxy object isn't necessary; it's just more layers, since few people directly code against the interfaces.
- multiple levels of timeouts cause unnecessary confusion. Also the retry loops in HCM cause confusion and issues.
- the client should support parallelism more directly; no more thread pools that just sleep!
- lots of callables make the code harder to read; either get rid of them or use more inner classes. Jumping around files makes for difficult comprehension.
Some good things:
- the base socket handling is actually in good shape. 1 socket per client-rs pair is about where we want to be.
- multiplexing requests on the same socket is good; not spawning extra threads server side just to handle more clients is also good. Since every client will have an open socket to at least the META region, this is very important!
- the handler pool is a natural side effect of the previous point; unbounding it might not be a good idea.
Other constraints:
- we will want to provide an efficient blocking API; it's what is expected.
- an async API might be nice; perhaps it can layer on top.
- making HTable thread-agnostic might be useful. Pooling the write buffer or doing something else interesting there would be necessary.
> Rewrite our client > -- > > Key: HBASE-3523 > URL: https://issues.apache.org/jira/browse/HBASE-3523 > Project: HBase > Issue Type: Brainstorming >Reporter: stack > > Is it just me or do others sense that there is pressure building to redo the > client? 
If just me, ignore the below... I'll just keep notes in here. > Otherwise, what would the requirements for a client rewrite look like? > + Let out InterruptedException > + Enveloping of messages or space for metadata that can be passed by client > to server and by server to client; e.g. the region a.b.c moved to server > x.y.z. or scanner is finished or timeout > + A different RPC? One with tighter serialization. > + More sane timeout/retry policy. > Does it have to support async communication? Do callbacks? > What else? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
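The "multiplexing requests on the same socket" point can be sketched as a call-id table; this is an illustrative shape, not the HBaseClient implementation, and CompletableFuture is just a convenient stand-in for its Call objects:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Many caller threads share one socket per region server: each request is
// tagged with an id, and the single connection reader thread completes
// whichever caller's future matches the response id.
class CallMultiplexer {
    private final AtomicInteger nextId = new AtomicInteger();
    private final Map<Integer, CompletableFuture<byte[]>> pending =
        new ConcurrentHashMap<>();

    // Caller side: register the call; a real client would also write
    // (id, request) to the shared socket here (elided in this sketch).
    CompletableFuture<byte[]> send(byte[] request) {
        int id = nextId.incrementAndGet();
        CompletableFuture<byte[]> f = new CompletableFuture<>();
        pending.put(id, f);
        return f;
    }

    // Reader side: dispatch a response to the caller that issued call 'id'.
    void onResponse(int id, byte[] response) {
        CompletableFuture<byte[]> f = pending.remove(id);
        if (f != null) f.complete(response);
    }
}
```

Blocking callers just wait on their future, while an async API could layer on top of the same table, which is one way the two API styles could coexist.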
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993321#comment-12993321 ] ryan rawson commented on HBASE-3524: Old files causing new code to break it seems. Good job tracking it down! > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug >Reporter: James Kennedy > Fix For: 0.90.2 > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993327#comment-12993327 ] ryan rawson commented on HBASE-3524: the issue is that if the hfile does not have timerangeBytes, this code doesn't trigger (StoreFile.java):
{code}
if (timerangeBytes != null) {
  this.reader.timeRangeTracker = new TimeRangeTracker();
  Writables.copyWritable(timerangeBytes, this.reader.timeRangeTracker);
}
{code}
And timeRangeTracker remains null. But this code doesn't check for null (Store.java, line 832):
{code}
long oldest = now - sf.getReader().timeRangeTracker.minimumTimestamp;
{code}
If timeRangeTracker is null, we should probably use Integer.MIN_VALUE for minimumTimestamp. What is the creation time of your empty file? When is it from? Maybe it's old? > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug >Reporter: James Kennedy > Fix For: 0.90.2 > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. 
The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993328#comment-12993328 ] ryan rawson commented on HBASE-3524: try this patch:
{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java b/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
index d7e3ce3..519111a 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
@@ -829,7 +829,10 @@ public class Store implements HeapSize {
     if (filesToCompact.size() == 1) {
       // Single file
       StoreFile sf = filesToCompact.get(0);
-      long oldest = now - sf.getReader().timeRangeTracker.minimumTimestamp;
+      long oldest =
+          (sf.getReader().timeRangeTracker == null) ?
+              Long.MIN_VALUE :
+              now - sf.getReader().timeRangeTracker.minimumTimestamp;
       if (sf.isMajorCompaction() &&
           (this.ttl == HConstants.FOREVER || oldest < this.ttl)) {
         if (LOG.isDebugEnabled()) {
{code}
no test yet! doh! > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug >Reporter: James Kennedy > Fix For: 0.90.2 > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. 
But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
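In lieu of the missing test, the guard's arithmetic can at least be checked in isolation; `oldestEdit` below is a hypothetical extraction of the patched expression, not a method that exists in Store.java:

```java
// Mirrors the patch above: a file with no time-range tracker is treated
// as infinitely old (Long.MIN_VALUE), keeping it eligible for the
// single-file major-compaction path instead of throwing an NPE.
class OldestEdit {
    static long oldestEdit(Long minimumTimestamp, long now) {
        return (minimumTimestamp == null)
            ? Long.MIN_VALUE
            : now - minimumTimestamp;
    }
}
```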
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993343#comment-12993343 ] ryan rawson commented on HBASE-3524: compaction is "optional", meaning if it fails no data is lost, so you should probably be fine. Older versions of the code did not write out time tracker data and that is why your older files were giving you npes. > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug >Reporter: James Kennedy > Fix For: 0.90.2 > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3524: --- Component/s: regionserver Priority: Blocker (was: Major) Affects Version/s: 0.90.0 Fix Version/s: 0.90.1 > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: James Kennedy >Priority: Blocker > Fix For: 0.90.1, 0.90.2 > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HBASE-3526) Tables with TTL should be able to prune memstore w/o flushing
Tables with TTL should be able to prune memstore w/o flushing - Key: HBASE-3526 URL: https://issues.apache.org/jira/browse/HBASE-3526 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.90.0 Reporter: ryan rawson Fix For: 0.90.2 If you have a table with TTL, the memstore will grow until it hits flush size, at which point the flush code will prune the KVs going to hfile. If you have a small TTL, it may not be necessary to flush, since pruning data in memory would ensure that we never grow too big. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
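A minimal sketch of the check such in-memory pruning would hinge on, assuming a millisecond TTL as in the issue; the helper name and signature are hypothetical, not HBase API:

```java
public class MemstorePrune {
    // A KV whose age exceeds the family's TTL can be dropped from an
    // in-memory scan instead of waiting for the flush-time prune
    // described in the issue.
    static boolean isExpired(long kvTimestampMs, long ttlMs, long nowMs) {
        return nowMs - kvTimestampMs > ttlMs;
    }
}
```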
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993793#comment-12993793 ] ryan rawson commented on HBASE-3524: the logic is: (this.ttl == HConstants.FOREVER || oldest < this.ttl)) { so if we set oldest to MAX_VALUE then it will _never_ be smaller than this.ttl which is set to 'length of time in ms an edit should live'. Therefore it would be in the range from 1 -> MAX_VALUE, and by setting oldest to MIN_VALUE this branch will ALWAYS be true and prefer to major compact a single file. Make sense? > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: James Kennedy >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1, 0.90.2 > > Attachments: 3524.txt > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. 
The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
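Ryan's reasoning about the sentinel values can be checked mechanically. Below is a stand-in for the quoted branch only (FOREVER is modeled as Long.MAX_VALUE here; the real HConstants value may differ), with the condition extracted into a hypothetical helper:

```java
public class TtlBranch {
    static final long FOREVER = Long.MAX_VALUE; // stand-in for HConstants.FOREVER

    // The condition quoted in the comment above:
    // (this.ttl == HConstants.FOREVER || oldest < this.ttl)
    static boolean branchTaken(long ttl, long oldest) {
        return ttl == FOREVER || oldest < ttl;
    }
}
```

With oldest = Long.MIN_VALUE the branch is taken for any ttl, and with oldest = Long.MAX_VALUE it is never taken for a finite ttl, which matches the reasoning in the comment.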
[jira] Commented: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993795#comment-12993795 ] ryan rawson commented on HBASE-3524: ttl logic is a little backwards from what one might expect. I need a unit test here and then commit. it will become part of 0.90.1. > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: James Kennedy >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1, 0.90.2 > > Attachments: 3524.txt > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Resolved: (HBASE-3524) NPE from CompactionChecker
[ https://issues.apache.org/jira/browse/HBASE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson resolved HBASE-3524. Resolution: Fixed fixed in both trunk & branch. > NPE from CompactionChecker > -- > > Key: HBASE-3524 > URL: https://issues.apache.org/jira/browse/HBASE-3524 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.90.0 >Reporter: James Kennedy >Assignee: ryan rawson >Priority: Blocker > Fix For: 0.90.1, 0.90.2 > > Attachments: 3524.txt > > > I recently updated production data to use HBase 0.90.0. > Now I'm periodically seeing: > [10/02/11 17:23:27] 30076066 [mpactionChecker] ERROR > nServer$MajorCompactionChecker - Caught exception > java.lang.NullPointerException > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:832) > at > org.apache.hadoop.hbase.regionserver.Store.isMajorCompaction(Store.java:810) > at > org.apache.hadoop.hbase.regionserver.HRegion.isMajorCompaction(HRegion.java:2800) > at > org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker.chore(HRegionServer.java:1047) > at org.apache.hadoop.hbase.Chore.run(Chore.java:66) > The only negative effect is that this is interrupting compactions from > happening. But that is pretty serious and this might be a sign of data > corruption? > Maybe it's just my data, but this task should at least involve improving the > handling to catch the NPE and still iterate through the other onlineRegions > that might compact without error. The MajorCompactionChecker.chore() method > only catches IOExceptions and so this NPE breaks out of that loop. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HBASE-3528) Maven should restrict more exported deps
Maven should restrict more exported deps Key: HBASE-3528 URL: https://issues.apache.org/jira/browse/HBASE-3528 Project: HBase Issue Type: Bug Components: build Reporter: ryan rawson Fix For: 0.92.0 Our maven build exports a lot of deps, and they flow to clients who depend on us. For example, we shouldn't export thrift, protobuf, jetty, jruby, tomcat, jersey, guava, etc. Clients should be able to depend on any version of the above libraries or NOT depend on them. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3525) mvn assembly is over-filling the hbase lib dir
[ https://issues.apache.org/jira/browse/HBASE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993848#comment-12993848 ] ryan rawson commented on HBASE-3525: lgtm +1 > mvn assembly is over-filling the hbase lib dir > -- > > Key: HBASE-3525 > URL: https://issues.apache.org/jira/browse/HBASE-3525 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack > Attachments: 3525.txt > > > Here is what our lib dir looks this in 0.90.1: > {code} > -rwxr-xr-x 1 Stack staff62983 Mar 16 2009 activation-1.1.jar > -rwxr-xr-x 1 Stack staff 1034049 May 21 2009 ant-1.6.5.jar > -rwxr-xr-x 1 Stack staff 1323005 Jul 20 2009 ant-1.7.1.jar > -rwxr-xr-x 1 Stack staff12143 Jul 20 2009 ant-launcher-1.7.1.jar > -rwxr-xr-x 1 Stack staff43033 May 5 2009 asm-3.1.jar > -rwxr-xr-x 1 Stack staff 339831 Oct 18 10:05 avro-1.3.3.jar > -rwxr-xr-x 1 Stack staff41123 Dec 8 2009 commons-cli-1.2.jar > -rwxr-xr-x 1 Stack staff58160 Oct 18 10:05 commons-codec-1.4.jar > -rwxr-xr-x 1 Stack staff 112341 Mar 16 2009 commons-el-1.0.jar > -rwxr-xr-x 1 Stack staff 305001 Mar 16 2009 commons-httpclient-3.1.jar > -rwxr-xr-x 1 Stack staff 279193 May 17 2010 commons-lang-2.5.jar > -rwxr-xr-x 1 Stack staff60686 Mar 13 2009 commons-logging-1.1.1.jar > -rwxr-xr-x 1 Stack staff 180792 Mar 4 2010 commons-net-1.4.1.jar > -rwxr-xr-x 1 Stack staff 3566844 Jun 5 2009 core-3.1.1.jar > -rwxr-xr-x 1 Stack staff 936397 Oct 18 10:05 guava-r06.jar > -rwxr-xr-x 1 Stack staff 2707856 Jan 11 13:26 > hadoop-core-0.20-append-r1056497.jar > -rwxr-xr-x 1 Stack staff 2241521 Feb 9 15:57 hbase-0.90.1.jar > -rwxr-xr-x 1 Stack staff 706710 Mar 4 2010 hsqldb-1.8.0.10.jar > -rwxr-xr-x 1 Stack staff 171958 Oct 18 10:05 jackson-core-asl-1.5.5.jar > -rwxr-xr-x 1 Stack staff17065 Oct 18 10:05 jackson-jaxrs-1.5.5.jar > -rwxr-xr-x 1 Stack staff 386509 Oct 18 10:05 jackson-mapper-asl-1.4.2.jar > -rwxr-xr-x 1 Stack staff24745 Oct 18 10:05 jackson-xc-1.5.5.jar > -rwxr-xr-x 1 Stack staff 408133 May 
21 2010 jasper-compiler-5.5.23.jar > -rwxr-xr-x 1 Stack staff76844 May 17 2010 jasper-runtime-5.5.23.jar > -rwxr-xr-x 1 Stack staff 103515 May 6 2009 jaxb-api-2.1.jar > -rwxr-xr-x 1 Stack staff 867801 Mar 4 2010 jaxb-impl-2.1.12.jar > -rwxr-xr-x 1 Stack staff 455517 Oct 18 10:05 jersey-core-1.4.jar > -rwxr-xr-x 1 Stack staff 142827 Oct 18 10:05 jersey-json-1.4.jar > -rwxr-xr-x 1 Stack staff 677600 Oct 18 10:05 jersey-server-1.4.jar > -rwxr-xr-x 1 Stack staff 377780 Mar 4 2010 jets3t-0.7.1.jar > -rwxr-xr-x 1 Stack staff67758 May 6 2009 jettison-1.1.jar > -rwxr-xr-x 1 Stack staff 539912 Jan 3 16:51 jetty-6.1.26.jar > -rwxr-xr-x 1 Stack staff 177131 Jan 3 16:51 jetty-util-6.1.26.jar > -rwxr-xr-x 1 Stack staff87325 Jul 20 2009 jline-0.9.94.jar > -rwxr-xr-x 1 Stack staff 4477138 Jan 3 16:51 jruby-complete-1.0.3.jar > -rwxr-xr-x 1 Stack staff 1024680 May 17 2010 jsp-2.1-6.1.14.jar > -rwxr-xr-x 1 Stack staff 134910 May 17 2010 jsp-api-2.1-6.1.14.jar > -rwxr-xr-x 1 Stack staff46367 Mar 4 2010 jsr311-api-1.1.1.jar > -rwxr-xr-x 1 Stack staff 121070 Mar 13 2009 junit-3.8.1.jar > -rwxr-xr-x 1 Stack staff11981 Mar 4 2010 kfs-0.3.jar > -rwxr-xr-x 1 Stack staff 481535 Oct 18 10:05 log4j-1.2.16.jar > -rwxr-xr-x 1 Stack staff65261 Apr 14 2009 oro-2.0.8.jar > -rwxr-xr-x 1 Stack staff29392 Jun 14 2010 paranamer-2.2.jar > -rwxr-xr-x 1 Stack staff 5420 Jun 14 2010 paranamer-ant-2.2.jar > -rwxr-xr-x 1 Stack staff 6931 Jun 14 2010 paranamer-generator-2.2.jar > -rwxr-xr-x 1 Stack staff 328635 Mar 4 2010 protobuf-java-2.3.0.jar > -rwxr-xr-x 1 Stack staff 173236 Jun 14 2010 qdox-1.10.1.jar > drwxr-xr-x 7 Stack staff 238 Feb 8 16:23 ruby > -rwxr-xr-x 1 Stack staff 132368 May 17 2010 servlet-api-2.5-6.1.14.jar > -rwxr-xr-x 1 Stack staff23445 Mar 4 2010 slf4j-api-1.5.8.jar > -rwxr-xr-x 1 Stack staff 9679 Mar 4 2010 slf4j-log4j12-1.5.8.jar > -rwxr-xr-x 1 Stack staff26514 May 6 2009 stax-api-1.0.1.jar > -rwxr-xr-x 1 Stack staff 187530 Mar 4 2010 thrift-0.2.0.jar > -rwxr-xr-x 1 Stack staff15010 
Mar 4 2010 xmlenc-0.52.jar > -rwxr-xr-x 1 Stack staff 598364 Dec 10 15:13 zookeeper-3.3.2.jar > {code} > We are picking up bunch of hadoop dependencies. I'd think it harmless other > than the bulk. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3523) Rewrite our client
[ https://issues.apache.org/jira/browse/HBASE-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994019#comment-12994019 ] ryan rawson commented on HBASE-3523: Are not executors thread pools? Right now HCM will use the executor passed to it from HTable to do parallel queries (multi). But there is no good reason to layer more threads on top of the socket/proxy layer. > Rewrite our client > -- > > Key: HBASE-3523 > URL: https://issues.apache.org/jira/browse/HBASE-3523 > Project: HBase > Issue Type: Brainstorming >Reporter: stack > > Is it just me or do others sense that there is pressure building to redo the > client? If just me, ignore the below... I'll just keep notes in here. > Otherwise, what would the requirements for a client rewrite look like? > + Let out InterruptedException > + Enveloping of messages or space for metadata that can be passed by client > to server and by server to client; e.g. the region a.b.c moved to server > x.y.z. or scanner is finished or timeout > + A different RPC? One with tighter serialization. > + More sane timeout/retry policy. > Does it have to support async communication? Do callbacks? > What else? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3523) Rewrite our client
[ https://issues.apache.org/jira/browse/HBASE-3523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994025#comment-12994025 ] ryan rawson commented on HBASE-3523: Wouldn't it be better not to need thread pools? Right now we are using them to merely wait on sync APIs that are, underneath, async/multiplexing. A big waste of threads to me...! > Rewrite our client > -- > > Key: HBASE-3523 > URL: https://issues.apache.org/jira/browse/HBASE-3523 > Project: HBase > Issue Type: Brainstorming >Reporter: stack > > Is it just me or do others sense that there is pressure building to redo the > client? If just me, ignore the below... I'll just keep notes in here. > Otherwise, what would the requirements for a client rewrite look like? > + Let out InterruptedException > + Enveloping of messages or space for metadata that can be passed by client > to server and by server to client; e.g. the region a.b.c moved to server > x.y.z. or scanner is finished or timeout > + A different RPC? One with tighter serialization. > + More sane timeout/retry policy. > Does it have to support async communication? Do callbacks? > What else? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3513) upgrade thrift to 0.5.0 and use mvn version
[ https://issues.apache.org/jira/browse/HBASE-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994152#comment-12994152 ] ryan rawson commented on HBASE-3513: thrift 0.5.0 is in the apache maven official repo... there was no 0.6.0 when i went looking. > upgrade thrift to 0.5.0 and use mvn version > --- > > Key: HBASE-3513 > URL: https://issues.apache.org/jira/browse/HBASE-3513 > Project: HBase > Issue Type: Bug > Components: thrift >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.90.2 > > Attachments: HBASE-3515.txt > > > We should upgrade our thrift to 0.5.0, it is the latest and greatest and is > in apache maven repo. > Doing some testing with a thrift 0.5.0 server, and an older pre-release php > client shows the two are on-wire compatible. > Given that the upgrade is entirely on the server side, and has no wire-impact > this should be a relatively low-impact change. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3513) upgrade thrift to 0.5.0 and use mvn version
[ https://issues.apache.org/jira/browse/HBASE-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994567#comment-12994567 ] ryan rawson commented on HBASE-3513: Try this search: https://repository.apache.org/index.html#nexus-search;quick~libthrift > upgrade thrift to 0.5.0 and use mvn version > --- > > Key: HBASE-3513 > URL: https://issues.apache.org/jira/browse/HBASE-3513 > Project: HBase > Issue Type: Bug > Components: thrift >Affects Versions: 0.90.0 >Reporter: ryan rawson >Assignee: ryan rawson > Fix For: 0.90.2 > > Attachments: HBASE-3515.txt > > > We should upgrade our thrift to 0.5.0, it is the latest and greatest and is > in apache maven repo. > Doing some testing with a thrift 0.5.0 server, and an older pre-release php > client shows the two are on-wire compatible. > Given that the upgrade is entirely on the server side, and has no wire-impact > this should be a relatively low-impact change. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994573#comment-12994573 ] ryan rawson commented on HBASE-1744: Talking about this recently with stack, I made the point that we should probably put the new API in the same package/thrift file as the old. Use new symbols, at the end of the .thrift file. That way people can use new and old clients with the same server, making migration easier. It might make some names a little ugly (we could go with the windowsy Ex suffix?) but making an easier migration strategy is important I feel. > Thrift server to match the new java api. > > > Key: HBASE-1744 > URL: https://issues.apache.org/jira/browse/HBASE-1744 > Project: HBase > Issue Type: Improvement > Components: thrift >Reporter: Tim Sell >Assignee: Lars Francke >Priority: Critical > Fix For: 0.92.0 > > Attachments: HBASE-1744.preview.1.patch, thriftexperiment.patch > > > This mutateRows, etc. is a little confusing compared to the new cleaner java > client. > Thinking of ways to make a thrift client that is just as elegant. Something > like: > void put(1:Bytes table, 2:TPut put) throws (1:IOError io) > with: > struct TColumn { > 1:Bytes family, > 2:Bytes qualifier, > 3:i64 timestamp > } > struct TPut { > 1:Bytes row, > 2:map<TColumn, Bytes> values > } > This creates more verbose rpc than if the columns in TPut were just > map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and > still be intuitive from, say, Python. > Presumably the goal of a thrift gateway is to be easy first. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HBASE-3534) Action should not store or serialize regionName
Action should not store or serialize regionName --- Key: HBASE-3534 URL: https://issues.apache.org/jira/browse/HBASE-3534 Project: HBase Issue Type: Bug Reporter: ryan rawson Fix For: 0.92.0 Action stores the regionName, BUT an action comes from a MultiAction, which contains: public Map<byte[], List<Action<R>>> actions which means we are storing the regionName multiple times. In fact, no one even calls the accessor getRegionName! Removing it changes the serialization of Action and MultiAction, but reduces the byte overhead. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
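The redundancy is easier to see with the shapes written out. This is a simplified model of the structure the issue quotes (String region names instead of byte[], no generic result type on Action), not the real classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiActionSketch {
    static class Action {
        final int originalIndex; // position in the caller's batch
        Action(int originalIndex) { this.originalIndex = originalIndex; }
    }

    // region name -> actions destined for that region; the map key already
    // identifies the region for every Action in its list
    final Map<String, List<Action>> actions = new TreeMap<>();

    void add(String regionName, Action a) {
        actions.computeIfAbsent(regionName, k -> new ArrayList<>()).add(a);
    }
}
```

Since the outer map key names the region once per group, a regionName field on each Action would serialize the same bytes again for every single action in the list.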
[jira] Commented: (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994966#comment-12994966 ] ryan rawson commented on HBASE-3065: can you check your ZK cluster health? There is a link at the top of the master page called 'zk dump'. We had a situation where 2/5 of our quorum members were not part of it, and you get error messages like that a lot. We changed the logging so it might be illustrating a deployment issue on your end. > Retry all 'retryable' zk operations; e.g. connection loss > - > > Key: HBASE-3065 > URL: https://issues.apache.org/jira/browse/HBASE-3065 > Project: HBase > Issue Type: Bug >Reporter: stack > Fix For: 0.92.0 > > > The 'new' master refactored our zk code tidying up all zk accesses and > coralling them behind nice zk utility classes. One improvement was letting > out all KeeperExceptions letting the client deal. Thats good generally > because in old days, we'd suppress important state zk changes in state. But > there is at least one case the new zk utility could handle for the > application and thats the class of retryable KeeperExceptions. The one that > comes to mind is conection loss. On connection loss we should retry the > just-failed operation. Usually the retry will just work. At worse, on > reconnect, we'll pick up the expired session event. > Adding in this change shouldn't be too bad given the refactor of zk corralled > all zk access into one or two classes only. > One thing to consider though is how much we should retry. We could retry on > a timer or we could retry for ever as long as the Stoppable interface is > passed so if another thread has stopped or aborted the hosting service, we'll > notice and give up trying. Doing the latter is probably better than some > kinda timeout. > HBASE-3062 adds a timed retry on the first zk operation. This issue is about > generalizing what is over there across all zk access. 
-- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3542) MultiGet methods in Thrift
[ https://issues.apache.org/jira/browse/HBASE-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996012#comment-12996012 ] ryan rawson commented on HBASE-3542: Not yet; there is a big pile of junk in there that needs to be worked thru. I'm still testing other things at the moment. Our API is not that fancy and uses weird names like 'parallelGet'. Any implementation though should use multi() to accomplish its task. > MultiGet methods in Thrift > -- > > Key: HBASE-3542 > URL: https://issues.apache.org/jira/browse/HBASE-3542 > Project: HBase > Issue Type: Improvement > Components: thrift >Affects Versions: 0.90.0 >Reporter: Martin Blom > Attachments: HBASE-3542[0.90].patch > > > The Thrift API does not expose multi-get operations. This patch adds the > methods getRows, getRowsWithColumns, getRowsTs and getRowsWithColumnsTs. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3542) MultiGet methods in Thrift
[ https://issues.apache.org/jira/browse/HBASE-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996053#comment-12996053 ] ryan rawson commented on HBASE-3542: some major things: - adding new methods to the bottom of the .thrift file means the resulting generated files have a minimum of changes. - I'm showing files that are "changed" by virtue of having their license removed and having trailing spaces added. Should be fixed! - lots of white space changes in Hbase.java, shouldnt exist. many IDEs can strip whitespace changes for you. - indentation in ThriftServer.java is off. The project standard is no tabs, 2 spaces per level, 4 spaces for continuing indent. If you set 2/4 in intellij it will do the right thing, eclipse has similar settings. > MultiGet methods in Thrift > -- > > Key: HBASE-3542 > URL: https://issues.apache.org/jira/browse/HBASE-3542 > Project: HBase > Issue Type: Improvement > Components: thrift >Affects Versions: 0.90.0 >Reporter: Martin Blom > Attachments: HBASE-3542[0.90].patch > > > The Thrift API does not expose multi-get operations. This patch adds the > meyhods getRows, getRowsWithColumns, getRowsTs and getRowsWithColumnsTs. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3542) MultiGet methods in Thrift
[ https://issues.apache.org/jira/browse/HBASE-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996292#comment-12996292 ] ryan rawson commented on HBASE-3542: I think if you removed trailing spaces from generated files, added in the license and double checked the ThriftServer tabs we'll be there. Part of the reason to put calls at the bottom is to minimize the changes to the huge Hbase.java and make the change easier to understand/rationalize. > MultiGet methods in Thrift > -- > > Key: HBASE-3542 > URL: https://issues.apache.org/jira/browse/HBASE-3542 > Project: HBase > Issue Type: Improvement > Components: thrift >Affects Versions: 0.90.0 >Reporter: Martin Blom > Attachments: HBASE-3542[0.90].patch > > > The Thrift API does not expose multi-get operations. This patch adds the > meyhods getRows, getRowsWithColumns, getRowsTs and getRowsWithColumnsTs. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3551) Loaded hfile indexes occupy a good chunk of heap; look into shrinking the amount used and/or evicting unused indices
[ https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997100#comment-12997100 ] ryan rawson commented on HBASE-3551: The index size is related to (a) block size and (b) key size; perhaps by tweaking one or both, something beneficial might happen? > Loaded hfile indexes occupy a good chunk of heap; look into shrinking the > amount used and/or evicting unused indices > > > Key: HBASE-3551 > URL: https://issues.apache.org/jira/browse/HBASE-3551 > Project: HBase > Issue Type: Improvement >Reporter: stack > > I hung with a user Marc and we were looking over configs and his cluster > profile up on ec2. One thing we noticed was that his 100+ 1G regions of two > families had ~2.5G of heap resident. We did a bit of math and couldn't get > to 2.5G so that needs looking into. Even still, 2.5G is a bunch of heap to > give over to indices (He actually OOME'd when he had his RS heap set to just > 3G; we shouldn't OOME, we should just run slower). It sounds like he needs > the indices loaded but still, for some cases we should drop indices for > unaccessed files. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3186) Patch that disables pread
[ https://issues.apache.org/jira/browse/HBASE-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997251#comment-12997251 ] ryan rawson commented on HBASE-3186: also HDFS-347 is really promising as well and avoids this. > Patch that disables pread > - > > Key: HBASE-3186 > URL: https://issues.apache.org/jira/browse/HBASE-3186 > Project: HBase > Issue Type: Task >Reporter: stack > Attachments: 3006-3185-patch-to-0.20.6.txt, disable_read.txt > > > Make a patch to disable pread to see if pread is responsible for growing > number of established connections seen on the mozilla cluster -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HBASE-3555) Switch to TestNG
Switch to TestNG Key: HBASE-3555 URL: https://issues.apache.org/jira/browse/HBASE-3555 Project: HBase Issue Type: Improvement Reporter: ryan rawson I have been messing with TestNG and I think we should switch to it. It is very similar to JUnit 4 with annotations, but it supports several features which would allow our build to become slightly more sane: - test groups allow us to separate slow/fast tests from each other - surefire support for running specific groups would allow 'check in tests' vs 'hudson/integration tests' (i.e. fast/slow) - it supports all the features of JUnit 4, plus it is VERY similar, making the transition easy. - they have assertEquals(byte[],byte[]) (!) -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
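For illustration, the group separation would look roughly like this (assumes the TestNG jar on the classpath; the class and group names are made up):

```java
import org.testng.Assert;
import org.testng.annotations.Test;

public class ExampleTest {
    @Test(groups = "fast") // quick check-in test
    public void byteArrayEquality() {
        // the assertEquals(byte[], byte[]) overload mentioned above
        Assert.assertEquals(new byte[] {1, 2}, new byte[] {1, 2});
    }

    @Test(groups = "slow") // hudson/integration-style test
    public void clusterRoundTrip() {
        // long-running work would go here
    }
}
```

Surefire can then be pointed at a single group (e.g. only "fast") so that check-in runs skip the slow tests.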
[jira] Created: (HBASE-3560) the hbase-default entry of "hbase.defaults.for.version" causes tests not to run via not-maven
the hbase-default entry of "hbase.defaults.for.version" causes tests not to run via not-maven - Key: HBASE-3560 URL: https://issues.apache.org/jira/browse/HBASE-3560 Project: HBase Issue Type: Bug Reporter: ryan rawson Using the default setup of a maven project for IntelliJ, tests fail because the hbase-default.xml contains a token @@@VERSION@@@. The maven build creates a substitute file, but this file isn't by default on the classpath. Is there a different place to put the hbase-default.xml file than target/classes? That is excluded from the classpath because IJ has its own compiler. Including that path in the project classpath seems dangerous. Can we get the output to go to just 'target' or 'target/generated-sources/org' perhaps? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HBASE-3542) MultiGet methods in Thrift
[ https://issues.apache.org/jira/browse/HBASE-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3542: --- Resolution: Fixed Status: Resolved (was: Patch Available) trunk and branch > MultiGet methods in Thrift > -- > > Key: HBASE-3542 > URL: https://issues.apache.org/jira/browse/HBASE-3542 > Project: HBase > Issue Type: Improvement > Components: thrift >Affects Versions: 0.90.0 >Reporter: Martin Blom >Assignee: Martin Blom > Fix For: 0.90.2 > > Attachments: HBASE-3542[0.90].patch, HBASE-3542[0.90].v2.patch > > > The Thrift API does not expose multi-get operations. This patch adds the > meyhods getRows, getRowsWithColumns, getRowsTs and getRowsWithColumnsTs. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3514) Speedup HFile.Writer append
[ https://issues.apache.org/jira/browse/HBASE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998639#comment-12998639 ] ryan rawson commented on HBASE-3514: im putting this to trunk, it won't apply cleanly to 0.90 branch. can you produce a patch that applies to 0.90 branch? if so we can include it there too! Thanks! > Speedup HFile.Writer append > --- > > Key: HBASE-3514 > URL: https://issues.apache.org/jira/browse/HBASE-3514 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.90.0 >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-3514-append.patch, > HBASE-3514-metaBlock-bsearch.patch > > > Remove double writes when block cache is specified, by using, only, the > ByteArrayDataStream. > baos is flushed with the compress stream on finishBlock. > On my machines HFilePerformanceEvaluation SequentialWriteBenchmark passes > from 4000ms to 2500ms. > Running SequentialWriteBenchmark for 100 rows took 4247ms. > Running SequentialWriteBenchmark for 100 rows took 4512ms. > Running SequentialWriteBenchmark for 100 rows took 4498ms. > Running SequentialWriteBenchmark for 100 rows took 2697ms. > Running SequentialWriteBenchmark for 100 rows took 2770ms. > Running SequentialWriteBenchmark for 100 rows took 2721ms. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HBASE-3514) Speedup HFile.Writer append
[ https://issues.apache.org/jira/browse/HBASE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998738#comment-12998738 ] ryan rawson commented on HBASE-3514: This is breaking the build; here is the output: https://hudson.apache.org/hudson/job/HBase-TRUNK/1749/ Apparently this line: + // pre-allocates the byte stream to the block size + 25% + baos = new ByteArrayOutputStream(blocksize + (int)(blocksize * 0.25)); isn't doing the right thing. The patch had (blocksize / 25) which wasn't giving what the comment said, so I switched to what you see here. Somehow this stack snippet happens: Caused by: java.lang.IllegalArgumentException: Negative initial size: -1610612738 at java.io.ByteArrayOutputStream.<init>(ByteArrayOutputStream.java:57) at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:417) Can you poke at it please? > Speedup HFile.Writer append > --- > > Key: HBASE-3514 > URL: https://issues.apache.org/jira/browse/HBASE-3514 > Project: HBase > Issue Type: Improvement > Components: io >Affects Versions: 0.90.0 >Reporter: Matteo Bertozzi >Priority: Minor > Attachments: HBASE-3514-append.patch, > HBASE-3514-metaBlock-bsearch.patch > > > Remove double writes when block cache is specified, by using, only, the > ByteArrayDataStream. > baos is flushed with the compress stream on finishBlock. > On my machines HFilePerformanceEvaluation SequentialWriteBenchmark passes > from 4000ms to 2500ms. > Running SequentialWriteBenchmark for 100 rows took 4247ms. > Running SequentialWriteBenchmark for 100 rows took 4512ms. > Running SequentialWriteBenchmark for 100 rows took 4498ms. > Running SequentialWriteBenchmark for 100 rows took 2697ms. > Running SequentialWriteBenchmark for 100 rows took 2770ms. > Running SequentialWriteBenchmark for 100 rows took 2721ms. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
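The negative size in the stack trace is explained by plain int overflow: blocksize + (int)(blocksize * 0.25) wraps once blocksize gets near Integer.MAX_VALUE, and a block size of exactly Integer.MAX_VALUE reproduces -1610612738, which suggests (though the log does not say) that some caller passes MAX_VALUE as a block size. A sketch of the failure and one defensive variant; the method names are made up, not the patch's code:

```java
public class BlockSizePadding {
    // The arithmetic from the quoted line: overflows to a negative int for
    // very large block sizes, which ByteArrayOutputStream then rejects.
    static int naivePadded(int blocksize) {
        return blocksize + (int) (blocksize * 0.25);
    }

    // Defensive variant: do the addition in long, then clamp to a sane
    // int bound so the constructor argument can never go negative.
    static int safePadded(int blocksize) {
        long padded = blocksize + (long) (blocksize * 0.25);
        return (int) Math.min(padded, Integer.MAX_VALUE - 8);
    }
}
```

Whether clamping is the right fix for the patch is a separate question (allocating anywhere near 2 GB up front is impractical anyway), but it shows where the negative value comes from.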
[jira] Commented: (HBASE-3553) number of active threads in HTable's ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999097#comment-12999097 ] ryan rawson commented on HBASE-3553: Great results, guys; I'll check out the patch later and commit it. We need to move away from ThreadPoolExecutor for this kind of work and avoid these pitfalls. > number of active threads in HTable's ThreadPoolExecutor > --- > > Key: HBASE-3553 > URL: https://issues.apache.org/jira/browse/HBASE-3553 > Project: HBase > Issue Type: Improvement > Components: client >Affects Versions: 0.90.1 >Reporter: Himanshu Vashishtha > Fix For: 0.90.2 > > Attachments: HBASE-3553_final.patch, ThreadPoolTester.java, > benchmark_results.txt > > > Using a ThreadPoolExecutor with corePoolSize = 0 and a LinkedBlockingQueue > as the collection to hold incoming runnable tasks has the effect of running > only 1 thread, irrespective of the maxPoolSize set from the property > hbase.htable.threads.max (or the number of region servers). (This is what I > infer from reading the source code of the ThreadPoolExecutor class in Java 1.6.) > On a 3-node EC2 cluster, a full table scan of approx 9M rows takes almost the > same time with a sequential scanner (240 secs) as with a Coprocessor (230 secs) > that uses HTable's pool to submit callable objects for each region. > I came up with a test class that creates a similar thread pool and tests > whether the pool size ever grows beyond 1; it confirms that the size remains 1 > even though the pool executed 100 requests. > It seems the desired behavior was to release all resources when the client is > done reading, but this can be achieved by setting allowCoreThreadTimeOut to > true (after setting a positive corePoolSize).
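The corePoolSize = 0 behavior described in the issue can be reproduced in isolation against the stock java.util.concurrent API. This is a minimal sketch, not HBase code: with an unbounded LinkedBlockingQueue, new threads are only created when the queue rejects a task, and an unbounded queue never rejects, so the pool never grows past one thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
    public static void main(String[] args) throws Exception {
        // The configuration the issue describes: corePoolSize = 0 plus an
        // unbounded queue. Only one worker is ever started.
        ThreadPoolExecutor broken = new ThreadPoolExecutor(
                0, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        // The remedy suggested in the issue: a positive corePoolSize with
        // allowCoreThreadTimeOut(true), which keeps concurrency while still
        // releasing idle threads afterwards.
        ThreadPoolExecutor fixed = new ThreadPoolExecutor(
                10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        fixed.allowCoreThreadTimeOut(true);

        CountDownLatch gate = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { gate.await(); } catch (InterruptedException ignored) { }
        };
        for (int i = 0; i < 5; i++) {
            broken.execute(blocker);
            fixed.execute(blocker);
        }
        // Worker threads are created synchronously inside execute(), so the
        // pool sizes are already stable at this point.
        System.out.println("broken pool size: " + broken.getPoolSize());
        System.out.println("fixed pool size: " + fixed.getPoolSize());

        gate.countDown();
        broken.shutdown();
        fixed.shutdown();
    }
}
```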
[jira] Assigned: (HBASE-3553) number of active threads in HTable's ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson reassigned HBASE-3553: -- Assignee: ryan rawson
[jira] Updated: (HBASE-3553) number of active threads in HTable's ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3553: --- Assignee: Gary Helmling (was: ryan rawson)
[jira] Commented: (HBASE-3553) number of active threads in HTable's ThreadPoolExecutor
[ https://issues.apache.org/jira/browse/HBASE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999098#comment-12999098 ] ryan rawson commented on HBASE-3553: Oh sorry, pre-coffee obviously. Thanks again.
[jira] Commented: (HBASE-3560) the hbase-default entry of "hbase.defaults.for.version" causes tests not to run via not-maven
[ https://issues.apache.org/jira/browse/HBASE-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999220#comment-12999220 ] ryan rawson commented on HBASE-3560: That seems fine as well; Todd mentioned something similar (disable the check in tests). He said we might set a property in tests only? > the hbase-default entry of "hbase.defaults.for.version" causes tests not to > run via not-maven > - > > Key: HBASE-3560 > URL: https://issues.apache.org/jira/browse/HBASE-3560 > Project: HBase > Issue Type: Bug >Reporter: ryan rawson > Attachments: 3560.txt > > > Using the default setup of a Maven project for IntelliJ, tests fail because > the hbase-default.xml contains the token @@@VERSION@@@. The Maven build > creates a substitute file, but this file isn't on the classpath by default. > Is there a different place to put the hbase-default.xml file than > target/classes? That path is excluded from the classpath because IntelliJ has > its own compiler, and including it in the project classpath seems dangerous. > Can we get the output to go to just 'target' or > 'target/generated-sources/org' perhaps?
[jira] Commented: (HBASE-3560) the hbase-default entry of "hbase.defaults.for.version" causes tests not to run via not-maven
[ https://issues.apache.org/jira/browse/HBASE-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999222#comment-12999222 ] ryan rawson commented on HBASE-3560: lgtm +1
[jira] Resolved: (HBASE-3560) the hbase-default entry of "hbase.defaults.for.version" causes tests not to run via not-maven
[ https://issues.apache.org/jira/browse/HBASE-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson resolved HBASE-3560. Resolution: Fixed Fix Version/s: 0.92.0 Thanks for the patch, Stack; committed.
[jira] Commented: (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999533#comment-12999533 ] ryan rawson commented on HBASE-3529: There are no local HBase files; you'll have to come up with something yourself, I guess? > Add search to HBase > --- > > Key: HBASE-3529 > URL: https://issues.apache.org/jira/browse/HBASE-3529 > Project: HBase > Issue Type: Improvement >Affects Versions: 0.90.0 >Reporter: Jason Rutherglen > > Using the Apache Lucene library we can add freetext search to HBase. The > advantages of this are: > * HBase is highly scalable and distributed > * HBase is realtime > * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312) > * Lucene offers many types of queries not currently available in HBase (e.g., > AND, OR, NOT, phrase, etc.) > * It's easier to build scalable realtime systems on top of an already > architecturally sound, scalable realtime data system such as HBase. > * Scaling realtime search will be as simple as scaling HBase. > Phase 1 - Indexing: > * Integrate Lucene into HBase such that an index mirrors a given region. > This means cascading adds, updates, and deletes between a Lucene index and an > HBase region (and vice versa). > * Define meta-data to mark a region as indexed, and use a Solr schema to > allow the user to define the fields and analyzers. > * Integrate with the HLog to ensure that index recovery can occur properly > (e.g., on region server failure) > * Mirror region splits with indexes (use Lucene's IndexSplitter?) > * When a region is written to HDFS, also write the corresponding Lucene index > to HDFS. > * A row key will be the ID of a given Lucene document. The Lucene docstore > will explicitly not be used because the document/row data is stored in HBase. > We will need to work out the best data structure for efficiently mapping a > docid -> row key: it could be a docstore, field cache, column stride > fields, or some other mechanism. 
> * Write unit tests for the above > Phase 2 - Queries: > * Enable distributed Lucene queries > * Regions that have Lucene indexes are inherently available and may be > searched on, meaning there's no need for a separate search related system in > Zookeeper. > * Integrate search with HBase's RPC mechanism
[jira] Commented: (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999546#comment-12999546 ] ryan rawson commented on HBASE-3529: It's going to be tricky, since with security some people may choose to run HDFS and HBase as different users. Furthermore, most Hadoop installs have multiple JBOD-style disks, and places like /tmp won't have much room (my /tmp has < 2GB). If you can avoid local files as much as possible, I'd try to do that.
[jira] Commented: (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999684#comment-12999684 ] ryan rawson commented on HBASE-3529: HDFS-347 is not in CDH nor in branch-20-append. As for a plan to implement it, perhaps you should?
[jira] Commented: (HBASE-3529) Add search to HBase
[ https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999688#comment-12999688 ] ryan rawson commented on HBASE-3529: I do not know; the whole thing is pretty greenfield. There are a few different implementations of HDFS-347, and I haven't actually seen a credible attempt at getting it into a shipping Hadoop yet. The test patches are pretty great, but they are proofs of concept and won't actually ship (due to Hadoop security). You can give it a shot, but be warned you might not get much for your troubles in terms of committed code.
[jira] Assigned: (HBASE-3572) memstore lab can leave half inited data structs (bad!)
[ https://issues.apache.org/jira/browse/HBASE-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson reassigned HBASE-3572: -- Assignee: ryan rawson > memstore lab can leave half inited data structs (bad!) > -- > > Key: HBASE-3572 > URL: https://issues.apache.org/jira/browse/HBASE-3572 > Project: HBase > Issue Type: Bug >Reporter: ryan rawson >Assignee: ryan rawson > > in Chunk.init() if new byte[] fails it leaves the Chunk in its uninitialized > state, other threads will assume someone else will init it and get stuck in > an infinite loop.
[jira] Created: (HBASE-3572) memstore lab can leave half inited data structs (bad!)
memstore lab can leave half inited data structs (bad!) -- Key: HBASE-3572 URL: https://issues.apache.org/jira/browse/HBASE-3572 Project: HBase Issue Type: Bug Reporter: ryan rawson In Chunk.init(), if new byte[] fails, it leaves the Chunk in its uninitialized state; other threads will assume someone else will init it and get stuck in an infinite loop.
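The failure mode is easy to sketch in isolation. The class, field, and method names below are hypothetical simplifications of the Chunk.init() pattern the report describes, not the actual MemStoreLAB source; the finally block is the essential remedy, handing the claim back on allocation failure so waiters are not wedged on a chunk nobody will ever finish initializing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ChunkDemo {
    // Simplified stand-in for the MemStoreLAB chunk (names are hypothetical).
    static class Chunk {
        private final AtomicBoolean claimed = new AtomicBoolean(false);
        private volatile byte[] data;
        private final int size;

        Chunk(int size) { this.size = size; }

        byte[] init() {
            while (data == null) {
                if (claimed.compareAndSet(false, true)) {
                    try {
                        data = new byte[size]; // may throw OutOfMemoryError
                    } finally {
                        // The reported bug: without this reset, a failed
                        // allocation leaves claimed == true and data == null,
                        // and every other thread spins in this loop forever.
                        if (data == null) claimed.set(false);
                    }
                } else {
                    Thread.yield(); // another thread holds the claim
                }
            }
            return data;
        }
    }

    public static void main(String[] args) throws Exception {
        Chunk c = new Chunk(64);
        Runnable r = () -> System.out.println(c.init().length);
        Thread t1 = new Thread(r);
        Thread t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```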
[jira] Updated: (HBASE-3572) memstore lab can leave half inited data structs (bad!)
[ https://issues.apache.org/jira/browse/HBASE-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3572: --- Attachment: memstorelab-oom.txt
[jira] Updated: (HBASE-3572) memstore lab can leave half inited data structs (bad!)
[ https://issues.apache.org/jira/browse/HBASE-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryan rawson updated HBASE-3572: --- Fix Version/s: 0.92.0 0.90.2 Affects Version/s: 0.92.0 0.90.1 Status: Patch Available (was: Open)