[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179335#comment-13179335
 ] 

Phabricator commented on HBASE-4218:


mcorgan has commented on the revision "[jira] [HBASE-4218] HFile data block 
encoding framework and delta encoding implementation".

  Trying to review this with an eye on schema changes and compactions.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java:241
What about the situation where the regionserver has been running for a while 
with ENCODING_IN_MEMORY=true and the block cache has filled with encoded 
blocks, and then the user does a schema change to disable encoding altogether?  
Now the block cache may return an old encoded block.  (Assuming online schema 
change doesn't invalidate all blocks for a table?)

  If I'm understanding that correctly, then it shouldn't be an 
IllegalStateException but should be handled normally.  It should probably 
invalidate the encoded block in the block cache if possible; otherwise it 
will expire normally.  Then it should return null so that HFileReaderV2 knows 
to go to the filesystem to get the block.
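
  A minimal self-contained sketch of that handling, with made-up names rather 
than the actual HBase classes: the lookup evicts the stale block and signals a 
miss instead of throwing.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the suggested behavior: when a cached block's encoding no
// longer matches the column family's configuration, invalidate it and
// return null so the reader falls back to the filesystem.
class BlockCacheSketch {
  enum Encoding { NONE, PREFIX }

  static class Block {
    final Encoding encoding;
    Block(Encoding e) { this.encoding = e; }
  }

  final Map<String, Block> cache = new ConcurrentHashMap<String, Block>();

  Block getBlock(String key, Encoding expected) {
    Block b = cache.get(key);
    if (b == null) return null;   // plain miss: caller reads from disk
    if (b.encoding != expected) {
      cache.remove(key);          // evict the stale encoded block if possible
      return null;                // caller re-reads from the filesystem
    }
    return b;
  }
}
{code}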

REVISION DETAIL
  https://reviews.facebook.net/D447


> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression scheme for keys. Keys are sorted in an HFile and are usually 
> very similar. Because of that, it is possible to design better compression 
> than general-purpose algorithms provide.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths; for example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length 8 bytes) show 
> that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> At the same time, performance is much better (20-80% faster decompression 
> than LZO). Moreover, it should allow far more efficient seeking, which 
> should improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to the HFileBlock / HFileReader scanner to provide 
> seeking and iterating; access to the uncompressed buffer in an HFileBlock 
> will have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or that some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
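
A toy illustration of the prefix-compression idea from the description above 
(a sketch only; the actual patch defines richer encodings, and the single-byte 
length fields here limit keys to under 256 bytes): each sorted key is stored as 
the length of the prefix it shares with the previous key plus the remaining 
suffix.

{code}
import java.io.ByteArrayOutputStream;

// Because adjacent HFile keys are similar, most of each key collapses into
// a small shared-prefix-length header plus a short suffix.
public class PrefixEncodeSketch {
  public static byte[] encode(byte[][] sortedKeys) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int common = 0;
      int max = Math.min(prev.length, key.length);
      while (common < max && prev[common] == key[common]) {
        common++;                          // bytes shared with the previous key
      }
      out.write(common);                   // shared-prefix length
      out.write(key.length - common);      // suffix length
      out.write(key, common, key.length - common);
      prev = key;
    }
    return out.toByteArray();
  }
}
{code}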

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179349#comment-13179349
 ] 

Mikhail Bautin commented on HBASE-4218:
---

A brief status update. I am in the process of implementing support for column 
family data block encoding configuration changes. Those changes are coming in 
the next version of the patch that I will post tomorrow. After discussing this 
with Kannan, our solution is:
* Assign an in-cache data block encoding to every HFile reader. This in-cache 
encoding is determined as follows:
** If the HFile is not encoded on disk, the in-cache encoding is set to the 
column family's DATA_BLOCK_ENCODING.
** If the HFile is encoded on disk, the in-cache encoding is set to the HFile 
encoding to avoid the wasted effort of re-encoding blocks for cache.
* When a non-encoded block is loaded from disk, it is encoded using the 
in-cache encoding and put in cache.
* When an encoded block is loaded from disk, its encoding is left as is.
* To reduce the complexity of data block encoding switching, we can include the 
in-cache encoding type in the block cache key. For example, if 
ENCODED_IN_CACHE_ONLY is turned on without encoding on disk, and then the 
encoding is turned off altogether, the cache will be populated with non-encoded 
blocks (since they will have completely different keys) and encoded blocks will 
age out from the cache. While this is suboptimal, the implementation is very 
simple and the common case (when the CF encoding options do not change) is not 
complicated with unnecessary corner cases.
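
A minimal sketch of that cache-key scheme (illustrative names, not the real 
block cache key class): with the in-cache encoding folded into the key, a 
configuration change makes lookups miss rather than return a block with the 
wrong encoding, and old entries age out on their own.

{code}
class CacheKeySketch {
  enum Encoding { NONE, PREFIX, DIFF }

  static String cacheKey(String hfileName, long offset, Encoding inCacheEncoding) {
    // e.g. "f1_4096_PREFIX" before the schema change vs "f1_4096_NONE" after:
    // completely different keys, so no stale hits.
    return hfileName + "_" + offset + "_" + inCacheEncoding;
  }
}
{code}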


> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression scheme for keys. Keys are sorted in an HFile and are usually 
> very similar. Because of that, it is possible to design better compression 
> than general-purpose algorithms provide.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths; for example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length 8 bytes) show 
> that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> At the same time, performance is much better (20-80% faster decompression 
> than LZO). Moreover, it should allow far more efficient seeking, which 
> should improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to the HFileBlock / HFileReader scanner to provide 
> seeking and iterating; access to the uncompressed buffer in an HFileBlock 
> will have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or that some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179390#comment-13179390
 ] 

Zhihong Yu commented on HBASE-5121:
---

Should the null assignment for lastTop be after the debug log?
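
A minimal sketch of the reordering the question suggests; the surrounding 
context is assumed, including that the omitted tail of the patch's log 
statement prints the new top via this.heap.peek(), which is not quoted in the 
patch excerpt.

{code}
// Log while lastTop still holds the old value, then clear it; the patch as
// posted nulls the field first and would NPE on lastTop.toString().
LOG.debug("Storescanner.peek() is changed where before = "
    + this.lastTop.toString() + ",and after = " + this.heap.peek());
this.lastTop = null;
{code}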

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test, there are keyvalues for two families in one row.
> But we found an infrequent problem in scan's next if a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first call to next returns a result where family A is null.
> 2. The second call to next returns a result where family B is null.
> The two next() results have the same row.
> If there are more families, I think the scenario would be stranger still...
> We found that the reason is that storescanner.peek() is changed after a 
> majorCompaction if there are delete-type KeyValues.
> This change means the PriorityQueue of RegionScanner's heap 
> is no longer guaranteed to be sorted.
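
A self-contained toy demonstrating that failure mode (plain java.util, not 
HBase code): a PriorityQueue only orders elements when they are added or 
removed, so mutating a comparator-relevant field afterwards, as a changed 
peek() does, silently breaks the heap invariant.

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

public class HeapInvariantSketch {
  static class Scanner {
    int top;                      // stand-in for storescanner.peek()
    Scanner(int top) { this.top = top; }
  }

  public static void main(String[] args) {
    PriorityQueue<Scanner> heap = new PriorityQueue<Scanner>(2,
        new Comparator<Scanner>() {
          public int compare(Scanner a, Scanner b) {
            return a.top - b.top;
          }
        });
    Scanner s1 = new Scanner(1);
    Scanner s2 = new Scanner(2);
    heap.add(s1);
    heap.add(s2);
    s1.top = 3;                   // mutated after insertion, like peek() changing
    System.out.println(heap.poll().top);  // prints 3, not the smallest element
  }
}
{code}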

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179394#comment-13179394
 ] 

Zhihong Yu commented on HBASE-5081:
---

In the above scenario, would there be many "duplicate log" messages in 
the master log file?

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs there forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' zookeeper watchers found that the task 
> is unassigned and is not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.
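
For illustration, the sync/async distinction in the raw ZooKeeper client API 
that this fix is about (a sketch only; the actual change lives in the split 
log management code, not in a helper like this):

{code}
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class DeleteNodeSketch {
  // Synchronous delete: blocks until the znode is gone, so a subsequent
  // retry starts from a clean state.
  static void deleteSync(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    zk.delete(path, -1);  // -1 matches any version
  }

  // Asynchronous delete (the racy variant): returns immediately, and the
  // callback may fire after a retry has already recreated the task.
  static void deleteAsync(ZooKeeper zk, String path) {
    zk.delete(path, -1, new AsyncCallback.VoidCallback() {
      public void processResult(int rc, String p, Object ctx) {
        // runs later on the event thread; may remove a *recreated* task
      }
    }, null);
  }
}
{code}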

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2012-01-04 Thread Benoit Sigoure (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179450#comment-13179450
 ] 

Benoit Sigoure commented on HBASE-3939:
---

On 04/Nov/11 00:07, Stack wrote:
bq. So why introduce it? Just so its in place when we want to use it later?

Does anyone know the answer?

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939-v6.txt, 3939-v7.txt, 3939-v8.txt, 3939-v9.txt, 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179394#comment-13179394
 ] 

Zhihong Yu edited comment on HBASE-5081 at 1/4/12 2:48 PM:
---

In the above scenario, would there be many "duplicate log split scheduled for " 
messages in the master log file?

I think the latest patch should go through verification in a real cluster.

  was (Author: zhi...@ebaysf.com):
In the above scenario, would there be many "duplicate log" messages in 
the master log file?
  
> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs there forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' zookeeper watchers found that the task 
> is unassigned and is not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5122) Provide flexible method for loading ColumnInterpreters

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Provide flexible method for loading ColumnInterpreters
--

 Key: HBASE-5122
 URL: https://issues.apache.org/jira/browse/HBASE-5122
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu


See the discussion on the user list entitled 'AggregateProtocol Help'.
From Royston:
{noformat}
I re-created my HBase table to contain Bytes.toBytes(Long) values and that 
fixed it.
{noformat}
When AggregateProtocol was designed, the intention was not that users would 
have to change their schema to match LongColumnInterpreter.

This JIRA aims to provide a flexible way for users to load their custom 
ColumnInterpreters into region servers.
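
For context, the schema constraint in question (a sketch; the table and column 
names are made up): the stock LongColumnInterpreter reads cell values as 
8-byte longs, so values must be written with Bytes.toBytes(long) unless a 
custom interpreter is plugged in.

{code}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LongValueSketch {
  public static Put longValuePut() {
    Put put = new Put(Bytes.toBytes("row1"));
    // Stored as an 8-byte long, the layout LongColumnInterpreter expects;
    // a value written e.g. as the string "42" would break the aggregation.
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(42L));
    return put;
  }
}
{code}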

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Provide more aggregate functions for Aggregations Protocol
--

 Key: HBASE-5123
 URL: https://issues.apache.org/jira/browse/HBASE-5123
 Project: HBase
  Issue Type: Improvement
Reporter: Zhihong Yu


Royston requested the following aggregates on top of what we already have:
Median, Weighted Median, Mult

See the discussion entitled 'AggregateProtocol Help' on the user list.
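
As a reference point for one of the requested aggregates, a small single-node 
weighted-median sketch (illustrative only; a real AggregateProtocol 
implementation would combine per-region partial results): the weighted median 
is the first value, in sorted order, at which the cumulative weight reaches 
half the total.

{code}
import java.util.Arrays;
import java.util.Comparator;

public class WeightedMedianSketch {
  static double weightedMedian(final double[] values, double[] weights) {
    Integer[] order = new Integer[values.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) {
        return Double.compare(values[a], values[b]);
      }
    });
    double total = 0;
    for (double w : weights) total += w;
    double cumulative = 0;
    for (int i : order) {
      cumulative += weights[i];           // walk values in sorted order
      if (cumulative >= total / 2) return values[i];
    }
    throw new IllegalArgumentException("empty input");
  }
}
{code}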

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3939) Some crossports of Hadoop IPC fixes

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179551#comment-13179551
 ] 

stack commented on HBASE-3939:
--

@Benoît Is this a blocker for you?

> Some crossports of Hadoop IPC fixes
> ---
>
> Key: HBASE-3939
> URL: https://issues.apache.org/jira/browse/HBASE-3939
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3939-v2.txt, 3939-v3.txt, 3939-v4.txt, 3939-v5.txt, 
> 3939-v6.txt, 3939-v7.txt, 3939-v8.txt, 3939-v9.txt, 3939.txt
>
>
> A few fixes from Hadoop IPC that we should probably cross-port into our copy:
> - HADOOP-7227: remove the protocol version check at call time
> - HADOOP-7146: fix a socket leak in server
> - HADOOP-7121: fix behavior when response serialization throws an exception
> - HADOOP-7346: send back nicer error response when client is using an out of 
> date IPC version

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4565) Maven HBase build broken on cygwin with copynativelib.sh call.

2012-01-04 Thread Suraj Varma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179550#comment-13179550
 ] 

Suraj Varma commented on HBASE-4565:


Ah - looks like it is out of sync with 0.92 and trunk now with the 
hbase-security pom updates. I'll rebase and put out a new patch version.

> Maven HBase build broken on cygwin with copynativelib.sh call.
> --
>
> Key: HBASE-4565
> URL: https://issues.apache.org/jira/browse/HBASE-4565
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.92.0
> Environment: cygwin (on xp and win7)
>Reporter: Suraj Varma
>Assignee: Suraj Varma
>  Labels: build, maven
> Fix For: 0.94.0
>
> Attachments: HBASE-4565-0.92.patch, HBASE-4565-v2.patch, 
> HBASE-4565-v3-0.92.patch, HBASE-4565-v3.patch, HBASE-4565.patch
>
>
> This is broken in both the 0.92 and trunk pom.xml.
> Here's a sample Maven log snippet from trunk (from Mayuresh on the user 
> mailing list):
> [INFO] [antrun:run {execution: package}]
> [INFO] Executing tasks
> main:
>[mkdir] Created dir: 
> D:\workspace\mkshirsa\hbase-trunk\target\hbase-0.93-SNAPSHOT\hbase-0.93-SNAPSHOT\lib\native\${build.platform}
> [exec] ls: cannot access D:\workspace\mkshirsa\hbase-trunk\target/nativelib: 
> No such file or directory
> [exec] tar (child): Cannot connect to D: resolve failed
> [INFO] 
> 
> [ERROR] BUILD ERROR
> [INFO] 
> 
> [INFO] An Ant BuildException has occured: exec returned: 3328
> There are two issues: 
> 1) The antrun task below doesn't resolve the Windows file separator returned 
> by project.build.directory - this causes the "resolve failed" above.
> 
> 
> if [ `ls ${project.build.directory}/nativelib | wc -l` -ne 0]; then
> 2) The tar argument value below also has a similar issue, in that the path 
> arg doesn't resolve right.
> 
>  dir="${project.build.directory}/${project.artifactId}-${project.version}">
> 
>  value="/cygdrive/c/workspaces/hbase-0.92-svn/target/${project.artifactId}-${project.version}.tar.gz"/>
> 
> 
> In both cases, the fix would probably be to use a cross-platform way to 
> handle the directory locations. 
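
One possible shape for such a cross-platform fix, as a sketch rather than the 
committed patch: normalize ${project.build.directory} to a unix-style path 
with Ant's pathconvert task before handing it to shell commands.

{code}
<!-- Normalize separators in the Maven build directory to unix style so that
     sh, ls and tar all receive a path they can consume. -->
<pathconvert targetos="unix" property="build.dir.unix">
  <path location="${project.build.directory}"/>
</pathconvert>
<exec executable="sh">
  <arg line="-c 'ls ${build.dir.unix}/nativelib | wc -l'"/>
</exec>
{code}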

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol

2012-01-04 Thread Tom Wilcox (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179559#comment-13179559
 ] 

Tom Wilcox commented on HBASE-5123:
---

SumProduct is probably another useful one.

> Provide more aggregate functions for Aggregations Protocol
> --
>
> Key: HBASE-5123
> URL: https://issues.apache.org/jira/browse/HBASE-5123
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhihong Yu
>
> Royston requested the following aggregates on top of what we already have:
> Median, Weighted Median, Mult
> See the discussion entitled 'AggregateProtocol Help' on the user list.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179611#comment-13179611
 ] 

Jimmy Xiang commented on HBASE-5081:


@Prakash, cool, that's great.

splitLog is called by the master when it starts up, and by the 
ServerShutdownHandler when an RS dies.
The master does wait and then retry.  However, the ServerShutdownHandler 
doesn't wait.  Can we make it wait as the master does?  See 
ServerShutdownHandler.process().

Another thing is the resubmit() method; it's called by multiple threads: the 
monitor chore thread and the ZK event thread.  Access to a task's member 
fields should be synchronized, or the fields should be made volatile.
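
A sketch of that point (field names here are hypothetical): a task touched by 
both the timeout-monitor chore thread and the ZK event thread needs safe 
publication, e.g. volatile for independently written fields and atomics for 
read-modify-write counters.

{code}
import java.util.concurrent.atomic.AtomicInteger;

class TaskSketch {
  volatile long lastUpdate;     // written by one thread, read by the other
  final AtomicInteger resubmits = new AtomicInteger();

  void resubmit(long now) {
    resubmits.incrementAndGet();  // a bare int++ here could lose updates
    lastUpdate = now;
  }
}
{code}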

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs there forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' zookeeper watchers found that the task 
> is unassigned and is not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-5120:
--

Attachment: HBASE-5120.patch

Patch is attached so that I can access it at home.  Not the final one and not 
fully tested in a cluster.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc2

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Comment: was deleted

(was: Patch is attached so that I can access it at home.  Not the final one and 
not fully tested in a cluster.)

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: HBASE-5120.patch

Attaching the patch so that I can access it at home.  First-cut version and 
not fully tested.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,132563501

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5120:
--

Attachment: (was: HBASE-5120.patch)

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> 201

[jira] [Issue Comment Edited] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179390#comment-13179390
 ] 

Zhihong Yu edited comment on HBASE-5121 at 1/4/12 5:04 PM:
---

{code}
+this.lastTop = null;
+LOG.debug("Storescanner.peek() is changed where before = "
++ this.lastTop.toString() + ",and after = "
{code}
Should the null assignment for lastTop be after the debug log?

  was (Author: zhi...@ebaysf.com):
Should the null assignment for lastTop be after the debug log?
  
> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test, there are keyvalues for two families in one row.
> But we found an infrequent problem in scan's next if a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first call to next returns a result where family A is null.
> 2. The second call to next returns a result where family B is null.
> The two next() results have the same row.
> If there are more families, I think the scenario would be stranger still...
> We found that the reason is that storescanner.peek() is changed after a 
> majorCompaction if there are delete-type KeyValues.
> This change means the PriorityQueue of RegionScanner's heap 
> is no longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179650#comment-13179650
 ] 

Lars Hofhansl commented on HBASE-4218:
--

One more thought about ENCODED_IN_CACHE_ONLY (and then I'll shut up about 
this)...

If we ever wanted to extend this in the future and allow disk-only encoding, 
maybe a better way would be to have ENCODING and ENCODE_ON_DISK. ENCODE_ON_DISK 
(default false) would just be the inverse of what ENCODED_IN_CACHE_ONLY is. 
That way (if we felt so inclined) we could add ENCODE_IN_CACHE later and allow 
it to be false.
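
A sketch of how that pair of settings might look at the schema level; the 
attribute names come from the comment above, and the generic setValue calls 
are an assumption, not a committed API.

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;

public class EncodingSchemaSketch {
  public static HColumnDescriptor cacheOnlyEncodedFamily() {
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setValue("DATA_BLOCK_ENCODING", "PREFIX"); // the ENCODING choice
    cf.setValue("ENCODE_ON_DISK", "false");       // inverse of ENCODED_IN_CACHE_ONLY
    return cf;
  }
}
{code}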


> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression scheme for keys. Keys are sorted in an HFile and are usually 
> very similar. Because of that, it is possible to design better compression 
> than general-purpose algorithms provide.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths; for example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length 8 bytes) show 
> that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> At the same time, performance is much better (20-80% faster decompression 
> than LZO). Moreover, it should allow far more efficient seeking, which 
> should improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to the HFileBlock / HFileReader scanner to provide 
> seeking and iterating; access to the uncompressed buffer in an HFileBlock 
> will have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or that some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179660#comment-13179660
 ] 

Lars Hofhansl commented on HBASE-5120:
--

Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What if we introduce another bug later that also causes 
an NPE? That would go unnoticed for a while.)
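
A sketch of the null-check alternative (the method and helper names here are 
hypothetical stand-ins, not the actual AssignmentManager code):

{code}
// Fail fast where the null can actually occur instead of classifying a
// NullPointerException after the fact.
class UnassignSketch {
  void unassign(Object region, Object server) {
    if (server == null) {
      removeRegionInTransition(region);  // handle the race explicitly here
      return;
    }
    sendRegionClose(server, region);     // normal close path
  }
  void removeRegionInTransition(Object region) { /* clear RIT state */ }
  void sendRegionClose(Object server, Object region) { /* issue close RPC */ }
}
{code}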


> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set 
> to 2 minutes to be sure I hit it.
> First the region gets closed:
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 

[jira] [Issue Comment Edited] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179660#comment-13179660
 ] 

Lars Hofhansl edited comment on HBASE-5120 at 1/4/12 5:17 PM:
--

Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What if we introduce another bug later that also causes 
an NPE? That would go unnoticed for a while.)


  was (Author: lhofhansl):
Hey Ram, I know you said this is not done, yet...
{code}
if (t instanceof NullPointerException) {
  removeRegionInTransition(region);
...
{code}

Could we instead add a null check at the relevant point and deal with it there 
(or maybe throw another exception)? (Dealing with this after the NPE occurred 
leaves a bad taste... What we introduce another bug later that also causes an 
NPE, that will go unnoticed for a while.)

  
> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is block

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179662#comment-13179662
 ] 

Zhihong Yu commented on HBASE-5120:
---

Thanks for the quick turnaround.

The patch depends on NullPointerException to detect the problematic region.
I think we should use a more protective measure, as Lars commented.
The patch specifies M_ZK_REGION_CLOSING for ZKAssign.deleteNode(). This narrows 
the applicability of the NPE handling.
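
For context, the call shape under discussion (argument names approximate): the 
expected-state parameter is what restricts the deletion to a single transition:

{code}
// Only deletes the unassigned znode if it is currently in the
// M_ZK_REGION_CLOSING state; a znode in another state (say
// RS_ZK_REGION_CLOSED, the race in this issue) would not match,
// so the NPE recovery path only covers that one case.
ZKAssign.deleteNode(master.getZooKeeper(), region.getEncodedName(),
    EventType.M_ZK_REGION_CLOSING);
{code}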

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328,

[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5120:
--

Fix Version/s: 0.94.0
   0.92.0

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=13256363

[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179666#comment-13179666
 ] 

Lars Hofhansl commented on HBASE-5121:
--

I see the patch is against trunk. Does this happen in trunk only?

It might have to do with the various scanning optimizations that the FB folks 
put in (if I recall correctly, FB rarely runs major compactions).
Maybe Mikhail could have a look at this too.


> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test, there are KeyValues for two families of one row.
> We found an infrequent problem in scan's next() if a major compaction 
> happens concurrently.
> In two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> Both next() results have the same row.
> If there are more families, I think the scenario will be even stranger...
> We found the reason: StoreScanner.peek() changes after a major compaction 
> if there are delete-type KeyValues.
> This change means the PriorityQueue backing the RegionScanner's heap is no 
> longer guaranteed to be sorted.
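
As general background (plain Java, not HBase code), mutating an element's sort 
key after insertion is exactly what breaks a binary-heap-backed PriorityQueue, 
which is the class of problem described above:

{code}
import java.util.PriorityQueue;

// A PriorityQueue only restores the heap invariant on add()/poll(). If an
// element's comparison key changes in place (like a StoreScanner whose
// peek() changes after compaction), peek()/poll() can return a non-minimal
// element.
public class HeapInvariantDemo {
  static final class Cell implements Comparable<Cell> {
    int key;
    Cell(int key) { this.key = key; }
    public int compareTo(Cell o) { return Integer.compare(key, o.key); }
  }

  public static void main(String[] args) {
    PriorityQueue<Cell> heap = new PriorityQueue<Cell>();
    Cell a = new Cell(1);
    heap.add(a);
    heap.add(new Cell(2));
    heap.add(new Cell(3));
    a.key = 10;                           // mutate the head's key in place
    System.out.println(heap.poll().key);  // prints 10, not the minimum (2)
  }
}
{code}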

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179685#comment-13179685
 ] 

Lars Hofhansl commented on HBASE-2947:
--

I ran the four failing tests locally and they all pass.
I would like to commit this today (the change itself is pretty uncontentious, 
I think).


> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179685#comment-13179685
 ] 

Lars Hofhansl edited comment on HBASE-2947 at 1/4/12 5:53 PM:
--

I ran the four failing tests locally and they all pass.
I would like to commit this today (the change itself is pretty uncontentious I 
think).


  was (Author: lhofhansl):
I ran the four failing tests locally and they all pass.
I would like to commit this today (the change it pretty uncontentious I think).

  
> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3373) Allow regions to be load-balanced by table

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179712#comment-13179712
 ] 

Zhihong Yu commented on HBASE-3373:
---

@Ben, @Jonathan Gray:
What do you think of my patch ?

Thanks

> Allow regions to be load-balanced by table
> --
>
> Key: HBASE-3373
> URL: https://issues.apache.org/jira/browse/HBASE-3373
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.20.6
>Reporter: Ted Yu
> Fix For: 0.94.0
>
> Attachments: 3373.txt, HbaseBalancerTest2.java
>
>
> From our experience, a cluster can be well balanced overall and yet one 
> table's regions may be badly concentrated on a few region servers.
> For example, one table has 839 regions (380 regions at the time of table 
> creation), of which 202 are on one server.
> It would be desirable for the load balancer to distribute regions of 
> specified tables evenly across the cluster. Each such table has a number of 
> regions many times the cluster size.
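
Conceptually, balancing by table just applies a balance pass to each table's 
region set independently (an editorial sketch with hypothetical helpers such as 
regionsByTable and currentOwner, not the attached patch):

{code}
// Sketch: spread each table's regions round-robin across all servers, so
// no single table can pile up on a few servers even when the global
// per-server region counts look even.
for (Map.Entry<String, List<HRegionInfo>> e : regionsByTable.entrySet()) {
  int i = 0;
  for (HRegionInfo region : e.getValue()) {
    ServerName target = servers.get(i++ % servers.size());
    plans.add(new RegionPlan(region, currentOwner(region), target));
  }
}
{code}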

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5097:
--

Status: Open  (was: Patch Available)

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-5097.patch, HBASE-5097_1.patch
>
>
> In HRegionServer.java openScanner()
> {code}
>   r.prepareScanner(scan);
>   RegionScanner s = null;
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().preScannerOpen(scan);
>   }
>   if (s == null) {
> s = r.getScanner(scan);
>   }
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().postScannerOpen(scan, s);
>   }
> {code}
> If we don't have an implementation for postScannerOpen, the RegionScanner is 
> null, and so a NullPointerException is thrown: 
> {code}
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Marking this defect as a blocker.  Please feel free to change the priority 
> if I am wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  I am just a learner.
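
A sketch of the defensive shape such a fix could take (editorial illustration; 
the attached patches may differ):

{code}
// Fail fast with a descriptive exception instead of letting the null
// RegionScanner reach addScanner() and surface as an NPE.
if (r.getCoprocessorHost() != null) {
  RegionScanner ps = r.getCoprocessorHost().postScannerOpen(scan, s);
  if (ps == null) {
    throw new IOException(
        "Coprocessor postScannerOpen returned null for region "
        + r.getRegionNameAsString());
  }
  s = ps;
}
{code}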

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5097:
--

Attachment: HBASE-5097_2.patch

An indentation problem occurred by mistake.  Corrected patch uploaded.

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch
>
>
> In HRegionServer.java openScanner()
> {code}
>   r.prepareScanner(scan);
>   RegionScanner s = null;
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().preScannerOpen(scan);
>   }
>   if (s == null) {
> s = r.getScanner(scan);
>   }
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().postScannerOpen(scan, s);
>   }
> {code}
> If we don't have an implementation for postScannerOpen, the RegionScanner is 
> null, and so a NullPointerException is thrown: 
> {code}
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Marking this defect as a blocker.  Please feel free to change the priority 
> if I am wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  I am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179725#comment-13179725
 ] 

ramkrishna.s.vasudevan commented on HBASE-5120:
---

@Lars and @Ted
Thanks for your reviews.
I was also not convinced about adding the instanceof check for NPE.  It does 
not look good.
I was trying another option: not calling sendRegionClose if the server is 
null, and also removing the RIT entry if anything has been added.  But I need 
to verify some more scenarios.  Not sure if I can dedicate time tomorrow as we 
have an internal workshop.  I will work on this and post my patch soon. :)


> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract from my statement that it "used to be extremely racy 
> and caused more troubles than it fixed", on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hado

[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179727#comment-13179727
 ] 

Andrew Purtell commented on HBASE-5097:
---

+1 on fixed patch.

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch
>
>
> In HRegionServer.java openScanner()
> {code}
>   r.prepareScanner(scan);
>   RegionScanner s = null;
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().preScannerOpen(scan);
>   }
>   if (s == null) {
> s = r.getScanner(scan);
>   }
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().postScannerOpen(scan, s);
>   }
> {code}
> If we don't have an implementation for postScannerOpen, the RegionScanner is 
> null, and so a NullPointerException is thrown: 
> {code}
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Marking this defect as a blocker.  Please feel free to change the priority 
> if I am wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  I am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Shrijeet Paliwal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shrijeet Paliwal updated HBASE-5041:


Attachment: 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch

Uploading a patch for the 0.90 branch. Sorry for the delay, Ted. I noticed too 
late your comment that you wanted to merge it yesterday.

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be silently 
> skipped due to a typo by Ops.
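
The guard the issue title implies could be as simple as the following sketch 
(the attached patches may differ in detail):

{code}
// Sketch: verify the table exists before issuing the compaction request,
// so a typo surfaces as a TableNotFoundException instead of a silent no-op.
HBaseAdmin admin = new HBaseAdmin(conf);
if (!admin.tableExists(tableName)) {
  throw new TableNotFoundException(tableName);
}
admin.majorCompact(tableName);
{code}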

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Created) (JIRA)
Backport LoadTestTool to 0.92
-

 Key: HBASE-5124
 URL: https://issues.apache.org/jira/browse/HBASE-5124
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
 Fix For: 0.92.0


LoadTestTool is very useful.
This JIRA backports LoadTestTool to 0.92 so that users don't have to build 
TRUNK in order to use it against a 0.92 cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179735#comment-13179735
 ] 

Hadoop QA commented on HBASE-5041:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509436/0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/664//console

This message is automatically generated.

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> Following will not complain even if fubar does not exist
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be silently 
> skipped due to a typo by Ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

2012-01-04 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179738#comment-13179738
 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
---

@Stack
I tried the option of applying the fix on the master side, but I felt the 
history of the region was not available there.  (Maybe I am wrong.)
Also, ServerShutdownHandler.processShutDown() removes the online server 
first, along with its list of regions.

Only after the region opening is done by the balancer flow is the new server 
name added back:
{code}
if (isServerOnline(sn)) {
this.regions.put(regionInfo, sn);
addToServers(sn, regionInfo);
this.regions.notifyAll();
{code}
bq. Could we make it such that only one thread can transition a region at a time?
I have not thought much about this yet.  

@Ming Ma
If it is ok, do you mind sharing the patch that you had prepared ?


> The META can hold an entry for a region with a different server name from the 
> one actually in the AssignmentManager thus making the region inaccessible.
> 
>
> Key: HBASE-5094
> URL: https://issues.apache.org/jira/browse/HBASE-5094
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit = 
> this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
> .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
> && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
> + e.getKey().getRegionNameAsString()
> + " because it has been opened in "
> + addressFromAM.getServerName());
>   }
> {code}
> In ServerShutdownHandler we try to get the address in the AM.  This address 
> is initially null because it has not yet been updated after the region was 
> opened, i.e. the callback after node deletion has not yet run on the master 
> side.  But removal from RIT has completed on the master side, so this will 
> trigger a new assignment.
> So there is a small window between when the opened region is actually added 
> to the online list and when the ServerShutdownHandler checks the existing 
> address in the AM.
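
Abstractly this is a check-then-act race; one way to close the window is to 
make the check and the location update happen under the same lock (an 
editorial sketch with a hypothetical lock object, not the eventual fix; the 
accessor names come from the snippet above):

{code}
// Sketch: evaluate "in transition?" and "who hosts it?" atomically, under
// the same lock the open-callback uses to publish the new server name, so
// the shutdown handler cannot observe the half-updated state.
synchronized (am.regionsLock) {                     // hypothetical lock
  RegionState rit = am.isRegionInTransition(hri);
  ServerName addressFromAM = am.getRegionServerOfRegion(hri);
  if (rit == null && addressFromAM == null) {
    assign(hri);  // only reassign once both views agree the region is free
  }
}
{code}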

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179743#comment-13179743
 ] 

Todd Lipcon commented on HBASE-5121:


No test for this? Can we get a functional test that can at least be run from 
the command line to trigger it?

Nits: the javadoc on the constructors is unnecessary (it doesn't add any 
information).
Using the exception for control flow here looks dirty IMO.

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test, there are KeyValues for two families of one row.
> We found an infrequent problem in scan's next() if a major compaction 
> happens concurrently.
> In two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> Both next() results have the same row.
> If there are more families, I think the scenario will be even stranger...
> We found the reason: StoreScanner.peek() changes after a major compaction 
> if there are delete-type KeyValues.
> This change means the PriorityQueue backing the RegionScanner's heap is no 
> longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5081:
---

Attachment: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch

implement feedback

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found distributed log splitting hanging 
> forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hung, waiting 
> forever for a last task that was never going to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry gets a clean start.
> An async deleteNode messes up the split log retry.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, a node created during 
> the retry could be deleted.
> deleteNode should be sync.
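
For reference, the sync/async distinction in the raw ZooKeeper client that the 
fix hinges on (illustrative; HBase calls this through its own wrappers):

{code}
// Async delete: returns immediately. The actual deletion (and callback)
// can land arbitrarily later, which is what lets a stale deletion race
// with nodes freshly created by the retry.
zk.delete(path, -1, new AsyncCallback.VoidCallback() {
  public void processResult(int rc, String p, Object ctx) {
    // Runs on the event thread, possibly after the retry has started.
  }
}, null);

// Sync delete: does not return until ZooKeeper has applied the deletion,
// giving the retry the "clean start" the description asks for.
zk.delete(path, -1);
{code}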

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5121:
-

Assignee: chunhui shen

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test, there are KeyValues for two families of one row.
> We found an infrequent problem in scan's next() if a major compaction 
> happens concurrently.
> In two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> Both next() results have the same row.
> If there are more families, I think the scenario will be even stranger...
> We found the reason: StoreScanner.peek() changes after a major compaction 
> if there are delete-type KeyValues.
> This change means the PriorityQueue backing the RegionScanner's heap is no 
> longer guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4773) HBaseAdmin may leak ZooKeeper connections

2012-01-04 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4773:
--

   Resolution: Fixed
Fix Version/s: 0.92.0
   Status: Resolved  (was: Patch Available)

Committed some time back.

> HBaseAdmin may leak ZooKeeper connections
> -
>
> Key: HBASE-4773
> URL: https://issues.apache.org/jira/browse/HBASE-4773
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4
>Reporter: gaojinchao
>Assignee: xufeng
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 4773.patch, branches_4773.patch, trunk_4773_patch.patch
>
>
> When the master crashes, HBaseAdmin leaks ZooKeeper connections.
> I think we should close the ZK connection when throwing 
> MasterNotRunningException:
> {code}
> public HBaseAdmin(Configuration c)
>     throws MasterNotRunningException, ZooKeeperConnectionException {
>   this.conf = HBaseConfiguration.create(c);
>   this.connection = HConnectionManager.getConnection(this.conf);
>   this.pause = this.conf.getLong("hbase.client.pause", 1000);
>   this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
>   this.retryLongerMultiplier =
>       this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
>   // We should add this code to close the ZK connection:
>   try {
>     this.connection.getMaster();
>   } catch (MasterNotRunningException e) {
>     HConnectionManager.deleteConnection(conf, false);
>     throw e;
>   }
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179763#comment-13179763
 ] 

Kannan Muthukkaruppan commented on HBASE-4218:
--

I also like ENCODE_ON_DISK instead of ENCODE_IN_CACHE_ONLY (with the reverse 
semantics).

I would say let's keep the default for ENCODE_ON_DISK as true, though. This is 
more of a testing knob for the early stages, where someone would set it to 
false before publishing a new data block encoder for general use. By the time 
end users try this, the code should be robust enough, and the column family 
setting of which data block encoding to use should ideally be the only knob 
they need to think about.
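
If the knob lands as a column family attribute as discussed here, usage might 
look like this (a sketch against the API shape under review; names were still 
being debated at this point):

{code}
// Sketch: choose a data block encoding for a family. With the proposed
// default ENCODE_ON_DISK=true, blocks are encoded on disk as well as in
// the block cache.
HColumnDescriptor family = new HColumnDescriptor("cf");
family.setDataBlockEncoding(DataBlockEncoding.PREFIX);
family.setEncodeOnDisk(true);
{code}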



> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression for keys. Keys are sorted in HFile and they are usually very 
> similar. Because of that, it is possible to design better compression than 
> general purpose algorithms,
> It is an additional step designed to be used in memory. It aims to save 
> memory in cache as well as speeding seeks within HFileBlocks. It should 
> improve performance a lot, if key lengths are larger than value lengths. For 
> example, it makes a lot of sense to use it when value is a counter.
> Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
> shows that I could achieve decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> While having much better performance (20-80% faster decompression ratio than 
> LZO). Moreover, it should allow far more efficient seeking which should 
> improve performance a bit.
> It seems that a simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase two important changes in design will be 
> needed:
> -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
> and iterating; access to uncompressed buffer in HFileBlock will have bad 
> performance
> -extend comparators to support comparison assuming that N first bytes are 
> equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3373) Allow regions to be load-balanced by table

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179767#comment-13179767
 ] 

Lars Hofhansl commented on HBASE-3373:
--

I forgot to +1 it above.

> Allow regions to be load-balanced by table
> --
>
> Key: HBASE-3373
> URL: https://issues.apache.org/jira/browse/HBASE-3373
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.20.6
>Reporter: Ted Yu
> Fix For: 0.94.0
>
> Attachments: 3373.txt, HbaseBalancerTest2.java
>
>
> From our experience, a cluster can be well balanced overall and yet one 
> table's regions may be badly concentrated on a few region servers.
> For example, one table has 839 regions (380 regions at the time of table 
> creation), of which 202 are on one server.
> It would be desirable for the load balancer to distribute regions of 
> specified tables evenly across the cluster. Each such table has a number of 
> regions many times the cluster size.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179773#comment-13179773
 ] 

Zhihong Yu commented on HBASE-5041:
---

@Shrijeet:
I am running the patch for 0.90 through the test suite.

Will integrate if there are no test failures.

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be skipped due to
> a typo by ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5118) Fix Scan documentation

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179776#comment-13179776
 ] 

Lars Hofhansl commented on HBASE-5118:
--

Just a doc change, going to commit.

> Fix Scan documentation
> --
>
> Key: HBASE-5118
> URL: https://issues.apache.org/jira/browse/HBASE-5118
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Attachments: 5118.txt
>
>
> Current documentation for scan states:
> {code}
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
> scan.setStartRow(Bytes.toBytes("row"));                   // start key is inclusive
> scan.setStopRow(Bytes.toBytes("row" + new byte[] {0}));   // stop key is exclusive
> for (Result result : htable.getScanner(scan)) {
>   // process Result instance
> }
> {code}
> "row" + new byte[] {0} is not correct. That should be "row" + (char)0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179779#comment-13179779
 ] 

Lars Hofhansl commented on HBASE-2947:
--

Any objections?

> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179783#comment-13179783
 ] 

Jimmy Xiang commented on HBASE-5081:


Cool, thanks.  Looks good to me.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.
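A minimal sketch (hypothetical znode path and task map, using the plain ZooKeeper client rather than HBase's wrapper) of the difference at the heart of the fix: the async delete returns immediately and its callback can fire after a retry has already re-created the task, while the sync delete returns only once the znode is gone.

{code}
// Async: returns at once; the callback may run after the retry has re-created
// the same task, removing the retry's fresh entry from the map.
zk.delete(taskZnode, -1,
    (rc, path, ctx) -> tasks.remove(path),   // AsyncCallback.VoidCallback
    null);

// Sync: blocks until the znode is actually deleted, so the retry starts clean.
zk.delete(taskZnode, -1);
tasks.remove(taskZnode);
{code}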

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179789#comment-13179789
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509441/0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/665//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/665//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/665//console

This message is automatically generated.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5109) Fix TestAvroServer so that it waits properly for the modifyTable operation to complete

2012-01-04 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179794#comment-13179794
 ] 

Ming Ma commented on HBASE-5109:


Thanks, Ted. Then it is mostly taken care of. Just want to point out two 
small issues in the fix for HBASE-4621:

1. The loop won't stop if there is a bug in the modifyTable call. In that 
case, the unit test will just hang instead of failing. Adding a timeout might 
be helpful, as in the sketch below.
2. The two assert statements after the while loop are redundant, so we can 
take one out.
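A minimal sketch of the suggested timeout, reusing the names from the test snippet quoted below (impl, tableAname, Threads) and an assumed 5-second cap:

{code}
long deadline = System.currentTimeMillis() + 5000;  // assumed 5s cap
while (impl.describeTable(tableAname) == null
    || impl.describeTable(tableAname).maxFileSize != 123456L) {
  assertTrue("modifyTable did not take effect before the timeout",
      System.currentTimeMillis() < deadline);
  Threads.sleep(100);
}
{code}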

> Fix TestAvroServer so that it waits properly for the modifyTable operation to 
> complete
> --
>
> Key: HBASE-5109
> URL: https://issues.apache.org/jira/browse/HBASE-5109
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HBASE-5109-0.92.patch
>
>
> TestAvroServer has the following issue:
> {code}
> impl.modifyTable(tableAname, tableA);
> // It can take a while for the change to take effect. Wait here a while.
> while (impl.describeTable(tableAname) == null) {
>   Threads.sleep(100);
> }
> assertTrue(impl.describeTable(tableAname).maxFileSize == 123456L);
> {code}
> impl.describeTable(tableAname) returns the default maxFileSize (256M) right 
> away, as modifyTable is async. Before HBASE-4328 is fixed, we can fix the 
> test code to wait for, say, a max of 5 seconds to check whether 
> impl.describeTable(tableAname).maxFileSize is updated to 123456L.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179797#comment-13179797
 ] 

stack commented on HBASE-5081:
--

These failures seem unrelated to this patch -- they happened on a previous 
hadoopqa build for an unrelated patch.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179798#comment-13179798
 ] 

stack commented on HBASE-5081:
--

I'll commit this unless objection.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179802#comment-13179802
 ] 

Matt Corgan commented on HBASE-4218:


Some food for thought - there is probably more complexity to this down the 
road.  There are always going to be trade-offs between encoding speed, 
compression ratio, scan throughput, and seek latency.  These trade-offs can 
actually be quite huge, like 10x when you start considering things like suffix 
compression.  I can see having different encodings in the same column family 
depending on dynamic performance decisions.  For example, use the most compact 
encoding during major compaction, but use the fastest encoding if memstore 
flushes are backlogged.

We probably can't get it perfect in this first iteration.  Just want to avoid 
shooting ourselves in the foot as much as possible.
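A minimal sketch (purely hypothetical names; nothing like this is in the patch) of the kind of dynamic choice described above:

{code}
enum Encoding { MOST_COMPACT, FASTEST }

// Pick an encoding per write path: spend CPU for density when writing
// long-lived files, favor speed when the memstore is falling behind.
Encoding chooseEncoding(boolean majorCompaction, boolean flushBacklogged) {
  if (flushBacklogged) {
    return Encoding.FASTEST;  // don't let flushes back up further
  }
  return majorCompaction ? Encoding.MOST_COMPACT : Encoding.FASTEST;
}
{code}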

> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression for keys. Keys are sorted in an HFile and are usually very 
> similar. Because of that, it is possible to design better compression than 
> general-purpose algorithms.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths. For example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length = 8 bytes) 
> show that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> while having much better performance (20-80% faster decompression than LZO). 
> Moreover, it should allow far more efficient seeking, which should improve 
> performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs, and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than with a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to HFileBlock / HFileReader Scanner to provide 
> seeking and iterating; access to the uncompressed buffer in HFileBlock will 
> have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5109) Fix TestAvroServer so that it waits properly for the modifyTable operation to complete

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179805#comment-13179805
 ] 

stack commented on HBASE-5109:
--

+1 on committing the patch.  It's some nice, if small, cleanup.

> Fix TestAvroServer so that it waits properly for the modifyTable operation to 
> complete
> --
>
> Key: HBASE-5109
> URL: https://issues.apache.org/jira/browse/HBASE-5109
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HBASE-5109-0.92.patch
>
>
> TestAvroServer has the following issue:
> {code}
> impl.modifyTable(tableAname, tableA);
> // It can take a while for the change to take effect. Wait here a while.
> while (impl.describeTable(tableAname) == null) {
>   Threads.sleep(100);
> }
> assertTrue(impl.describeTable(tableAname).maxFileSize == 123456L);
> {code}
> impl.describeTable(tableAname) returns the default maxFileSize (256M) right 
> away, as modifyTable is async. Before HBASE-4328 is fixed, we can fix the 
> test code to wait for, say, a max of 5 seconds to check whether 
> impl.describeTable(tableAname).maxFileSize is updated to 123456L.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179806#comment-13179806
 ] 

Zhihong Yu commented on HBASE-5081:
---

{code}
+if (oldtask.status == FAILURE) {
+  // wait for status to change to DELETED
+  try {
+    oldtask.wait();
+  } catch (InterruptedException e) {
+    LOG.warn("Interrupted when waiting for znode delete callback");
+    // fall through to return failure
{code}
Should a loop be introduced wrapping oldtask.wait()? The JVM sometimes 
produces spurious notifications.
{code}
+LOG.fatal("Logic error. Deleted task still present in tasks map");
+assert false : "Deleted task still present in tasks map";
{code}
The assertion wouldn't be effective at runtime (assertions are disabled by 
default), right? I think throwing an exception would be better.
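A minimal sketch (hypothetical field and constant names, not the patch itself) of the guarded wait being suggested: re-check the condition in a loop so a spurious wakeup cannot let the thread proceed before the delete callback has run.

{code}
synchronized (oldtask) {
  while (oldtask.status == FAILURE) {    // re-check after every wakeup
    try {
      oldtask.wait();                    // released by the delete callback's notify()
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      break;                             // fall through to return failure
    }
  }
}
{code}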


> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5093) wiki update for HBase/Scala

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179809#comment-13179809
 ] 

stack commented on HBASE-5093:
--

I just added you as a wiki contributor, Joe.  Let me know if you still can't 
edit.

> wiki update for HBase/Scala
> ---
>
> Key: HBASE-5093
> URL: https://issues.apache.org/jira/browse/HBASE-5093
> Project: HBase
>  Issue Type: Improvement
>Reporter: Joe Stein
>
> I tried to edit the wiki, but it says the page is immutable.
> It would be helpful for folks to know how to get sbt working with HBase 
> from Scala.
> The following is what I did to get it working. Since I could not edit the 
> wiki, I figured I'd open a JIRA so someone with access could update this.
> {code}
> resolvers += "Apache HBase" at 
> "https://repository.apache.org/content/repositories/releases"
> resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
> libraryDependencies ++= Seq(
>   "org.apache.hadoop" % "hadoop-core" % "0.20.2",
>   "org.apache.hbase" % "hbase" % "0.90.4"
> )
> {code}
> Or let me have access and I can do it, np.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179811#comment-13179811
 ] 

Zhihong Yu commented on HBASE-5081:
---

See 
http://stackoverflow.com/questions/1050592/do-spurious-wakeups-actually-happen

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179813#comment-13179813
 ] 

stack commented on HBASE-4218:
--

/me hearts this issue

> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression for keys. Keys are sorted in an HFile and are usually very 
> similar. Because of that, it is possible to design better compression than 
> general-purpose algorithms.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths. For example, it makes a lot of sense to use it when the value is a 
> counter.
> Initial tests on real data (key length ~90 bytes, value length = 8 bytes) 
> show that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> while having much better performance (20-80% faster decompression than LZO). 
> Moreover, it should allow far more efficient seeking, which should improve 
> performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs, and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than with a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to HFileBlock / HFileReader Scanner to provide 
> seeking and iterating; access to the uncompressed buffer in HFileBlock will 
> have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179821#comment-13179821
 ] 

Zhihong Yu commented on HBASE-5081:
---

w.r.t. master retry after the following exception is raised:
{code}
throw new IOException("duplicate log split scheduled for "
    + lf.getPath());
{code}
I see this code in MasterFileSystem.splitLog():
{code}
try {
  splitLogSize = splitLogManager.splitLogDistributed(logDirs);
} catch (OrphanHLogAfterSplitException e) {
  LOG.warn("Retrying distributed splitting for " + serverNames, e);
  splitLogManager.splitLogDistributed(logDirs);
}
{code}
Would the retry be carried out as described?

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5118) Fix Scan documentation

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179826#comment-13179826
 ] 

stack commented on HBASE-5118:
--

+1

> Fix Scan documentation
> --
>
> Key: HBASE-5118
> URL: https://issues.apache.org/jira/browse/HBASE-5118
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.94.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Attachments: 5118.txt
>
>
> Current documentation for scan states:
> {code}
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
> scan.setStartRow(Bytes.toBytes("row"));                   // start key is inclusive
> scan.setStopRow(Bytes.toBytes("row" + new byte[] {0}));   // stop key is exclusive
> for (Result result : htable.getScanner(scan)) {
>   // process Result instance
> }
> {code}
> "row" + new byte[] {0} is not correct. That should be "row" + (char)0.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179825#comment-13179825
 ] 

stack commented on HBASE-2947:
--

Is this safe to do, Lars?

{code}
-public class Increment implements Writable {
+public class Increment implements Row {
{code}

Should you leave Writable in place?

Else LGTM
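A minimal sketch (an assumption about the intent, not the committed change) of what leaving Writable in place would look like, in case Row does not itself extend Writable in this version:

{code}
public class Increment implements Row, Writable {
  // ... unchanged body; the wire format stays Writable-compatible
}
{code}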

> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179834#comment-13179834
 ] 

stack commented on HBASE-5097:
--

+1 on patch

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch
>
>
> In HRegionServer.java openScanner():
> {code}
> r.prepareScanner(scan);
> RegionScanner s = null;
> if (r.getCoprocessorHost() != null) {
>   s = r.getCoprocessorHost().preScannerOpen(scan);
> }
> if (s == null) {
>   s = r.getScanner(scan);
> }
> if (r.getCoprocessorHost() != null) {
>   s = r.getCoprocessorHost().postScannerOpen(scan, s);
> }
> {code}
> If the postScannerOpen implementation returns null, the RegionScanner is 
> null, and so a NullPointerException is thrown:
> {code}
> java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Marking this defect as a blocker. Please feel free to change the priority if 
> I am wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  I am just a learner.
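A minimal sketch (an assumption about shape, not the attached patch) of the kind of guard the report asks for: fail the call explicitly rather than letting addScanner() hit the NullPointerException later.

{code}
if (s == null) {
  throw new IOException("Coprocessor hooks returned a null RegionScanner");
}
{code}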

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5071) HFile has a possible cast issue.

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179838#comment-13179838
 ] 

stack commented on HBASE-5071:
--

lgtm

> HFile has a possible cast issue.
> 
>
> Key: HBASE-5071
> URL: https://issues.apache.org/jira/browse/HBASE-5071
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Affects Versions: 0.90.0
>Reporter: Harsh J
>  Labels: hfile
>
> HBASE-3040 introduced this line originally in HFile.Reader#loadFileInfo(...):
> {code}
> int allIndexSize = (int)(this.fileSize - this.trailer.dataIndexOffset - 
> FixedFileTrailer.trailerSize());
> {code}
> Which on trunk today, for HFile v1, is:
> {code}
> int sizeToLoadOnOpen = (int) (fileSize - trailer.getLoadOnOpenDataOffset() -
> trailer.getTrailerSize());
> {code}
> This computed (and cast) integer is then used to build an array of the same 
> size. But if fileSize is very large (>> Integer.MAX_VALUE), then there's an 
> easy chance this can go negative at some point and spew out exceptions such 
> as:
> {code}
> java.lang.NegativeArraySizeException 
> at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readAllIndex(HFile.java:805) 
> at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:832) 
> at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1003) 
> at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382) 
> at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438) 
> at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:267) 
> at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:209) 
> at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2088) 
> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:358) 
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2661) 
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2647) 
> {code}
> Did we accidentally limit single region sizes this way?
> (Unsure about HFile v2's structure so far, so I do not know if v2 has the 
> same issue.)
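A minimal, self-contained sketch (illustrative numbers, not the HBase code) of the narrowing cast the report describes: once the long difference exceeds Integer.MAX_VALUE, the int result wraps negative and the array allocation throws.

{code}
public class CastOverflow {
  public static void main(String[] args) {
    long fileSize = 3L * 1024 * 1024 * 1024;   // 3 GB, > Integer.MAX_VALUE
    long loadOnOpenOffset = 0;                 // pretend the index starts at 0
    long trailerSize = 0;
    int sizeToLoadOnOpen = (int) (fileSize - loadOnOpenOffset - trailerSize);
    System.out.println(sizeToLoadOnOpen);      // prints a negative number
    byte[] buf = new byte[sizeToLoadOnOpen];   // throws NegativeArraySizeException
  }
}
{code}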

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179841#comment-13179841
 ] 

Zhihong Yu commented on HBASE-5081:
---

About the test suite, from 
https://builds.apache.org/job/PreCommit-HBASE-Build/665/console:
{code}
Running org.apache.hadoop.hbase.master.TestSplitLogManager
Running org.apache.hadoop.hbase.regionserver.TestSeekOptimizations
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.055 sec
{code}
It is a pity that the hung test was masked by known failures.

MAPREDUCE-3583 seems to have gone into oblivion.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting 
> hangs forever.  Please see the attached screenshot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 was an orphan, so 
> the batch.err counter was one short, and the log splitting hangs forever, 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion 
> synchronous instead of asynchronous, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node 
> created during the retry could be deleted.
> deleteNode should be synchronous.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179843#comment-13179843
 ] 

stack commented on HBASE-5088:
--

+1 on the last patch posted by Lars.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
> Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
> HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized. If we use these methods to add/delete elements, it's OK.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate on the backing map (e.g. 
> add/delete) in other threads while using this view, a concurrency issue may 
> occur.
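A minimal, self-contained sketch (hypothetical class, not the HBase source) of the hazard: every wrapper method is synchronized, but headMap() hands out a live view of the internal TreeMap that callers then use without holding the wrapper's lock.

{code}
import java.util.SortedMap;
import java.util.TreeMap;

public class HeadMapRace {
  static class SyncMap {
    private final TreeMap<Integer, String> internalMap = new TreeMap<>();
    public synchronized void put(Integer k, String v) { internalMap.put(k, v); }
    public synchronized SortedMap<Integer, String> headMap(Integer k) {
      return internalMap.headMap(k);  // live view escapes the lock
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final SyncMap m = new SyncMap();
    for (int i = 0; i < 100000; i++) m.put(i, "v");
    Thread writer = new Thread(() -> {
      for (int i = 100000; i < 200000; i++) m.put(i, "w");
    });
    writer.start();
    long sum = 0;
    // Iterating the unsynchronized view races the writer; any structural
    // change to the backing TreeMap can fail this loop with a
    // ConcurrentModificationException, as HBASE-5088 describes.
    for (Integer k : m.headMap(50000).keySet()) sum += k;
    writer.join();
    System.out.println(sum);
  }
}
{code}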

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179846#comment-13179846
 ] 

stack commented on HBASE-4397:
--

Nice one, Ming.

> -ROOT-, .META. tables stay offline for too long in recovery phase after all 
> RSs are shutdown at the same time
> -
>
> Key: HBASE-4397
> URL: https://issues.apache.org/jira/browse/HBASE-4397
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4397-0.92.patch
>
>
> 1. Shut down all RSs.
> 2. Bring all RSs back online.
> The -ROOT- and .META. tables stay in the offline state until the timeout 
> monitor forces assignment 30 minutes later. That is because HMaster can't 
> find an RS to assign the tables to in the assign operation.
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
> Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
> trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
> at $Proxy9.openRegion(Unknown Source)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2011-09-13 13:25:52,743 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
> location to assign region -ROOT-,,0.70236052
> Possible fixes:
> 1. Have serverManager handle the "server online" event, similar to how 
> RegionServerTracker.java calls serverManager.expireServer when a server 
> goes down.
> 2. Make timeoutMonitor handle the situation better. This is a special 
> situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2834) Deferred deletes

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179849#comment-13179849
 ] 

stack commented on HBASE-2834:
--

We can close this issue then?  (After adding a bit of 'how to do deferred 
deletes' to the manual?)

> Deferred deletes
> 
>
> Key: HBASE-2834
> URL: https://issues.apache.org/jira/browse/HBASE-2834
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Purtell
>
> Tangentially mentioned in a blog post, James Hamilton talks about deferred 
> deletes:
> {quote}
> If you have an application error, administrative error, or database 
> implementation bug that loses data, then it is simply gone unless you have 
> an offline copy. This, by the way, is why I'm a big fan of deferred delete.  
> This is a technique where deleted items are marked as deleted but not garbage 
> collected until some days or preferably weeks later.  Deferred delete is not 
> full protection but it has saved my butt more than once and I'm a believer. 
> See On Designing and Deploying Internet-Scale Services 
> (http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf) for more detail.
> {quote}
> (See 
> http://perspectives.mvdirona.com/2010/04/07/StonebrakerOnCAPTheoremAndDatabases.aspx)
> Because deletes -- at least, after the initial write has been flushed from 
> memstore -- are tombstones, deferred delete in HBase could be supported if 
> somehow tombstones could be invalidated, an undelete operation in effect. 
> This could be accomplished by adding support for tombstones that cover 
> deletes. It would complicate major compaction but otherwise not touch much. 
> A typical use case might be "resurrect any data deleted from _ts1_ to 
> _ts2_", a period of 4 hours when an application error was operative. In 
> this case a new write would be issued to the table that is a tombstone 
> covering any deletes over that period of time. Users would defer major 
> compactions until safe checkpoint periods. 
> Such guarantees could optionally be extended to the memstore by using 
> tombstones there as well. But it would probably be sufficient to provide 
> guidance that forcing a flush is necessary to ensure edits are persisted in 
> a way that allows for undeletion.
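
A minimal, self-contained sketch of the covering-tombstone idea above, using
plain Java collections rather than HBase APIs (HBase has no undelete
operation; DeleteMarker and UndeleteMarker are invented names for
illustration):
{code}
import java.util.ArrayList;
import java.util.List;

public class DeferredDeleteSketch {
  static class DeleteMarker {
    final long ts;                 // timestamp of the delete
    DeleteMarker(long ts) { this.ts = ts; }
  }

  static class UndeleteMarker {
    final long ts1, ts2;           // window of deletes to resurrect
    UndeleteMarker(long ts1, long ts2) { this.ts1 = ts1; this.ts2 = ts2; }
  }

  final List<DeleteMarker> deletes = new ArrayList<DeleteMarker>();
  final List<UndeleteMarker> undeletes = new ArrayList<UndeleteMarker>();

  /** A delete marker only hides data if no undelete marker covers it. */
  boolean isEffective(DeleteMarker d) {
    for (UndeleteMarker u : undeletes) {
      if (d.ts >= u.ts1 && d.ts <= u.ts2) {
        return false; // resurrected: a covering tombstone invalidates it
      }
    }
    return true; // still hides data until major compaction collects it
  }
}
{code}
Major compaction would then have to consult the undelete markers before
garbage-collecting data under a covered delete, which is the complication
the description mentions.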

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Created) (JIRA)
Upgrade hadoop to 1.0.0
---

 Key: HBASE-5125
 URL: https://issues.apache.org/jira/browse/HBASE-5125
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0
 Attachments: 5125.txt



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Attachment: 5125.txt

> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Status: Patch Available  (was: Open)

> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5093) wiki update for HBase/Scala

2012-01-04 Thread Joe Stein (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein resolved HBASE-5093.
--

Resolution: Fixed

Good to go, thanks; wiki updated.

> wiki update for HBase/Scala
> ---
>
> Key: HBASE-5093
> URL: https://issues.apache.org/jira/browse/HBASE-5093
> Project: HBase
>  Issue Type: Improvement
>Reporter: Joe Stein
>
> I tried to edit the wiki but it says the page is immutable.
> It would be helpful for folks to know how to get HBase working with 
> sbt/Scala. The following is what I did to get it working. Since I could not 
> edit the wiki, I figured I'd open a JIRA so someone with access could 
> update it:
> {code}
> resolvers += "Apache HBase" at 
> "https://repository.apache.org/content/repositories/releases"
> resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
> libraryDependencies ++= Seq(
>   "org.apache.hadoop" % "hadoop-core" % "0.20.2",
>   "org.apache.hbase" % "hbase" % "0.90.4"
> )
> {code}
> or let me have access and I can do it myself, np

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179875#comment-13179875
 ] 

Jean-Daniel Cryans commented on HBASE-5125:
---

+1

> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179886#comment-13179886
 ] 

Prakash Khemani commented on HBASE-5081:


...

If createTaskIfAbsent() returns a non-null oldtask then that is treated as
an error by the caller (splitLogDistributed(), which will throw an
IOException).

If the caller thread is interrupted while waiting for the status to change
to DELETED then createTaskIfAbsent returns an error (i.e. oldtask) to the
caller. It is better to return with an error than to loop. I missed setting
the interrupt flag on the thread ... will do that in the next diff.

If a task is found in the tasks map even after oldtask's state has changed
to DELETED then the found task is returned as an error. So it is OK even
if asserts are not enabled at run time. (This case can actually happen if
another thread sneaks in and creates another task with the same name, but
it shouldn't happen for log-splitting tasks.) A sketch of this logic
follows below.
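
A minimal sketch of that logic, with invented Task and Status types standing
in for the real SplitLogManager internals:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: the deleter is assumed to set status = DELETED and call
// notifyAll() while holding the task's monitor.
public class CreateTaskSketch {
  enum Status { IN_PROGRESS, DELETED }

  static class Task {
    volatile Status status = Status.IN_PROGRESS;
  }

  private final ConcurrentMap<String, Task> tasks =
      new ConcurrentHashMap<String, Task>();

  /** Returns null if a fresh task was installed; any non-null return is
   *  treated as an error by the caller, which throws an IOException. */
  Task createTaskIfAbsent(String path) {
    Task oldtask = tasks.putIfAbsent(path, new Task());
    if (oldtask == null) {
      return null; // fresh task installed; caller proceeds
    }
    synchronized (oldtask) {
      while (oldtask.status != Status.DELETED) {
        try {
          oldtask.wait(100); // woken by the async deleteNode callback
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // the missed flag mentioned above
          return oldtask; // better to return an error than to keep looping
        }
      }
    }
    // Status reached DELETED. If another thread snuck in and re-created a
    // task under the same name, return it as an error; that stays safe even
    // when asserts are disabled at run time. (The real code would retry the
    // installation when nothing is found; omitted here for brevity.)
    return tasks.get(path);
  }
}
{code}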



> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.
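
A minimal sketch of the proposed fix, assuming a plain
org.apache.zookeeper.ZooKeeper handle; error handling and retries are
elided:
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DeleteNodeSketch {
  // Before: an async delete whose callback could fire after the retry had
  // already re-created the task entry, removing the wrong map entry:
  //   zk.delete(path, -1, deleteCallback, ctx);
  //
  // After: a synchronous delete that returns only once the znode is gone,
  // so the retry starts from a clean slate.
  static void deleteNodeSync(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    zk.delete(path, -1); // -1 matches any znode version
  }
}
{code}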

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179891#comment-13179891
 ] 

Prakash Khemani commented on HBASE-5081:


The retry logic is in HMaster.splitLogAfterStartup(). I will remove the
OrphanLogException handling from MasterFileSystem.


> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179890#comment-13179890
 ] 

Zhihong Yu commented on HBASE-5041:
---

TestReplication failed in the test suite run.
When I ran it separately, I got:
{code}
Failed tests: 
  queueFailover(org.apache.hadoop.hbase.replication.TestReplication)
{code}
Looking at the patch for 0.90, it doesn't seem to be related to replication.

@J-D:
Your review/comment on the patch would be helpful.

Thanks

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be skipped due to
> a typo by Ops.
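
For illustration, a hedged sketch of the fail-fast behavior the fix is
after, written against the HBaseAdmin client API of that era (this is not
the attached patch, which changes the admin/shell path itself):
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactChecked {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    String table = "fubar";
    // Fail fast instead of silently skipping the compaction on a typo.
    if (!admin.tableExists(table)) {
      throw new TableNotFoundException(table);
    }
    admin.majorCompact(table);
  }
}
{code}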

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179894#comment-13179894
 ] 

Prakash Khemani commented on HBASE-5081:


Will look into the test failure. I am not sure I know where to find the
test run's output logs.


> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Shrijeet Paliwal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179907#comment-13179907
 ] 

Shrijeet Paliwal commented on HBASE-5041:
-

{code}
 mvn clean compile test -Dtest=TestReplication
{code}

The above passes without error for branch 0.90 on my dev machine. 

-Shrijeet

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be skipped due to
> a typo by Ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5081:
---

Attachment: 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch

Set the interrupt flag on InterruptedException. Remove OrphanLogException 
handling for distributed log splitting.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179912#comment-13179912
 ] 

Zhihong Yu commented on HBASE-5081:
---

bq. If the caller thread is interrupted while waiting for status to change to 
DELETED ...
My comment @ 04/Jan/12 19:46 was about oldtask.wait() being awakened by a 
spurious wakeup from the JVM.

You can use the following to mimic the environment of the hadoopx build 
machines when you run unit tests:
{code}
--enableassertions -Xmx1900m 
-Djava.security.egd=file:/dev/./urandom
+-d32 -enableassertions -Xmx1900m 
-Djava.security.egd=file:/dev/./urandom
{code}

Looking at 
https://builds.apache.org/job/PreCommit-HBASE-Build/665/testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/,
 it seems all the tests passed.

Maybe N Keywal would have some idea about this.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5041) Major compaction on non existing table does not throw error

2012-01-04 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179922#comment-13179922
 ] 

Zhihong Yu commented on HBASE-5041:
---

Here is the OS I used:
{code}
Linux A 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 
x86_64 x86_64 GNU/Linux
{code}

I will integrate the patches tomorrow if there is no objection.

> Major compaction on non existing table does not throw error 
> 
>
> Key: HBASE-5041
> URL: https://issues.apache.org/jira/browse/HBASE-5041
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver, shell
>Affects Versions: 0.90.3
>Reporter: Shrijeet Paliwal
>Assignee: Shrijeet Paliwal
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0002-HBASE-5041-Throw-error-if-table-does-not-exist.patch, 
> 0003-HBASE-5041-Throw-error-if-table-does-not-exist.0.90.patch
>
>
> The following will not complain even if fubar does not exist:
> {code}
> echo "major_compact 'fubar'" | $HBASE_HOME/bin/hbase shell
> {code}
> The downside of this defect is that a major compaction may be skipped due to
> a typo by Ops.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Prakash Khemani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179926#comment-13179926
 ] 

Prakash Khemani commented on HBASE-5081:


If there is a spurious wakeup before the status has changed to DELETED then
the code will return an error (oldtask) to the caller.

Regarding the hung TestSplitLogManager test in
https://builds.apache.org/job/PreCommit-HBASE-Build/665/console - I
couldn't find what failed or what hung.
https://builds.apache.org/job/PreCommit-HBASE-Build/665//testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/
shows that everything passed.





> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179931#comment-13179931
 ] 

Hadoop QA commented on HBASE-5125:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509452/5125.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/667//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/667//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/667//console

This message is automatically generated.

> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179933#comment-13179933
 ] 

Lars Hofhansl commented on HBASE-2947:
--

Good point... I just checked, and Row extends WritableComparable, which in turn 
extends Writable... so we should be good.
Maybe then Append can be simplified to implement only Row and not also Writable.
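
A sketch of the type relationship being discussed, with abbreviated
stand-ins for the real interfaces:
{code}
// Abbreviated stand-ins for org.apache.hadoop.io.Writable,
// org.apache.hadoop.io.WritableComparable and
// org.apache.hadoop.hbase.client.Row.
interface Writable { /* write(DataOutput), readFields(DataInput) */ }
interface WritableComparable<T> extends Writable, Comparable<T> { }
interface Row extends WritableComparable<Row> { }

// Since Row already extends Writable transitively, a declaration like
//   public class Append implements Row, Writable { ... }
// can be simplified to
//   public class Append implements Row { ... }
{code}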

> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179935#comment-13179935
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509461/0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 77 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/668//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/668//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/668//console

This message is automatically generated.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179938#comment-13179938
 ] 

stack commented on HBASE-5120:
--

When would it ever make sense for the TimeoutMonitor to ask a regionserver to 
RE-close a region?  It implies that somehow the regionserver 'forgot' to do 
the close or that it dropped the message.  If either happened, we have bigger 
problems, and having the TM do a new close will only make it worse (if a RS 
'forgot' or 'failed' a close, it's probably on its way out and its ephemeral 
node is about to vanish, and the master shutdown handler will do the close 
fixups; if it does not work this way, we should make it so?).

TM might make some sense for region openings, but even here, I'd think the 
above applies; i.e. if a RS can't open a region, it should kill itself. If the 
region is bad, it's unlikely any RS will open it unless the error is 
transient.  For this latter case the TM running every 30 minutes, while 
conservative, is probably fine as a default.

I think we should leave the TM at its current 30-minute timeout and undo this 
issue as critical against 0.92.

There are outstanding race conditions in hbase around the shutdown handler 
and master operations that deserve more attention than this, IMO.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described at:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f

[jira] [Commented] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179940#comment-13179940
 ] 

stack commented on HBASE-5120:
--

On the patch, Ram, should we even try to unassign if the server in 
this.regions is null; i.e. should we fail way earlier?

IMO, this AM needs a rewrite, if only to make its various transitions 
testable.  There are also too many methods doing similar things with lots of 
overlap; it's really hard to figure out what's going on.

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described at:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f

[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179945#comment-13179945
 ] 

Mikhail Bautin commented on HBASE-4218:
---

I think that with an 8K-line patch we probably should not try to put more 
complexity into the first version of delta encoding. We can always make things 
more complicated later. I like the two-parameter setup: DATA_BLOCK_ENCODING 
sets the encoding type (on-disk and in-cache by default), and ENCODE_ON_DISK 
(true by default) allows using in-cache-only encoding (when explicitly setting 
ENCODE_ON_DISK=false) to get the benefit of encoding in cache even before we 
are 100% sure that our encoding algorithms and encoded scanners are stable. If 
everyone agrees with that, I will finish the patch by (1) adding a unit test 
for switching a column family's data block encoding settings; (2) including 
the encoding type in the cache key; and (3) simplifying the 
HFileDataBlockEncoder interface, since we assume that the "in-memory format" 
(used by scanners) is always the same as the in-cache format and don't need 
methods such as afterReadFromDiskAndPuttingInCache anymore.
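
For illustration, a sketch of how the two parameters might be set on a
column family once the patch lands; the setter names follow the discussion
above and were not a released API at the time of this comment:
{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class EncodingSetupSketch {
  public static HTableDescriptor describe() {
    HTableDescriptor table = new HTableDescriptor("t1");
    HColumnDescriptor family = new HColumnDescriptor("f1");
    // DATA_BLOCK_ENCODING: the encoding type, on-disk and in-cache by default.
    family.setDataBlockEncoding(DataBlockEncoding.PREFIX);
    // ENCODE_ON_DISK=false: in-cache-only encoding, per the comment above.
    family.setEncodeOnDisk(false);
    table.addFamily(family);
    return table;
  }
}
{code}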

> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression for keys. Keys are sorted in HFile and they are usually very 
> similar. Because of that, it is possible to design better compression than 
> general purpose algorithms.
> It is an additional step designed to be used in memory. It aims to save 
> memory in cache as well as speed up seeks within HFileBlocks. It should 
> improve performance a lot if key lengths are larger than value lengths. For 
> example, it makes a lot of sense to use it when the value is a counter.
> Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
> show that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> while having much better performance (20-80% faster decompression than 
> LZO). Moreover, it should allow far more efficient seeking, which should 
> improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to HFileBlock / HFileReader Scanner to provide 
> seeking and iterating; access to the uncompressed buffer in HFileBlock will 
> have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179944#comment-13179944
 ] 

stack commented on HBASE-5081:
--

I'm running tests locally.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 rc testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screenshot.
> I looked into it and here is what I think happened:
> 1. One rs died; the servershutdownhandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. Servershutdownhandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion in step 2 finally happened for one task; in 
> the callback, it removed one task in the hashmap;
> 6. One of the newly submitted tasks' zookeeper watcher found out that task 
> is unassigned, and since it is not in the hashmap, it created a new orphan 
> task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry will have a clean start.
> Async deleteNode will mess up split log retries.  In an extreme situation, 
> if the async deleteNode doesn't happen soon enough, some node created 
> during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-04 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5125:
-

Resolution: Fixed
  Assignee: stack
Status: Resolved  (was: Patch Available)

Committed to 0.92 and trunk

> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179955#comment-13179955
 ] 

Matt Corgan commented on HBASE-4218:


{quote}I think that with an 8K line patch we probably should not try to put 
more complexity into the first version of delta encoding.{quote}Yes, I totally 
agree.  It is a work in progress, so these settings in this patch don't have 
to make perfect sense.  I like the latest DATA_BLOCK_ENCODING=NONE(?) and 
ENCODE_ON_DISK=true defaults.

All other comments look sensible.  Have you covered the case where you have 
encoded blocks in the block cache and are compacting to an unencoded HFile?  
You will want to make sure that you are using (not ignoring) the cached blocks.

> Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
> ---
>
> Key: HBASE-4218
> URL: https://issues.apache.org/jira/browse/HBASE-4218
> Project: HBase
>  Issue Type: Improvement
>  Components: io
>Affects Versions: 0.94.0
>Reporter: Jacek Migdal
>Assignee: Mikhail Bautin
>  Labels: compression
> Fix For: 0.94.0
>
> Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
> 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
> D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
> D447.15.patch, D447.16.patch, D447.17.patch, D447.2.patch, D447.3.patch, 
> D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
> D447.9.patch, Data-block-encoding-2011-12-23.patch, 
> Delta-encoding.patch-2011-12-22_11_52_07.patch, 
> Delta_encoding_with_memstore_TS.patch, open-source.diff
>
>
> A compression for keys. Keys are sorted in an HFile and are usually very 
> similar, so it is possible to design better compression than 
> general-purpose algorithms offer.
> It is an additional step designed to be used in memory. It aims to save 
> memory in the cache as well as to speed up seeks within HFileBlocks. It 
> should improve performance a lot if key lengths are larger than value 
> lengths; for example, it makes a lot of sense to use it when the value is 
> a counter.
> Initial tests on real data (key length ~90 bytes, value length 8 bytes) 
> show that I could achieve a decent level of compression:
>  key compression ratio: 92%
>  total compression ratio: 85%
>  LZO on the same data: 85%
>  LZO after delta encoding: 91%
> while achieving much better performance (20-80% faster decompression than 
> LZO). Moreover, it should allow far more efficient seeking, which should 
> improve performance a bit.
> It seems that simple compression algorithms are good enough. Most of the 
> savings are due to prefix compression, int128 encoding, timestamp diffs, and 
> bitfields to avoid duplication. That way, comparisons of compressed data can 
> be much faster than a byte comparator (thanks to prefix compression and 
> bitfields).
> In order to implement it in HBase, two important changes in design will be 
> needed:
> - solidify the interface to the HFileBlock / HFileReader scanner to provide 
> seeking and iterating; access to the uncompressed buffer in HFileBlock will 
> have bad performance
> - extend comparators to support comparison assuming that the first N bytes 
> are equal (or some fields are equal)
> Link to a discussion about something similar:
> http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
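To make the prefix-compression idea above concrete, here is a toy sketch, not 
the actual D447 encoder: each key in a sorted run is stored as the length of 
the prefix it shares with the previous key plus the remaining suffix. The 
single-byte length fields are an assumption for brevity and limit keys to 255 
bytes.

{code}
import java.io.ByteArrayOutputStream;
import java.util.List;

public class PrefixEncodingSketch {
  /** Encodes sorted keys as (sharedPrefixLen, suffixLen, suffixBytes) records. */
  static byte[] encode(List<byte[]> sortedKeys) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      // Count how many leading bytes this key shares with its predecessor.
      int common = 0;
      int limit = Math.min(prev.length, key.length);
      while (common < limit && prev[common] == key[common]) {
        common++;
      }
      out.write(common);               // bytes shared with the previous key
      out.write(key.length - common);  // bytes that follow the shared prefix
      out.write(key, common, key.length - common);
      prev = key;
    }
    return out.toByteArray();
  }
}
{code}

With ~90-byte keys that differ only near the end, most of each key collapses 
to two length bytes, which is consistent with the key compression ratio quoted 
above.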





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179960#comment-13179960
 ] 

Mikhail Bautin commented on HBASE-4218:
---

Re-reading my previous post, I want to make an addition: we still use cached 
encoded blocks when compacting a fully-encoded column family.





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179959#comment-13179959
 ] 

Mikhail Bautin commented on HBASE-4218:
---

Actually, I think it is OK to ignore cached encoded blocks on compaction. We 
can get encoded blocks in cache and have a compaction write an unencoded file 
in two cases:
* Encoding is turned on in cache only. In that case we don't want to use 
encoded blocks during compaction at all, because the in-cache-only mode implies 
that we don't trust our encoding algorithms 100% and want to guard against 
possible persistent data corruption.
* Encoding was turned on (either in cache only or everywhere) and was then 
turned off entirely. Since this is not a very frequent case, I think we could 
probably optimize it after the patch is stabilized (see the sketch below).
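A sketch of that decision logic as I read the two cases; the enum and method 
names are my own shorthand, not identifiers from the patch.

{code}
/** Hypothetical summary of when a compaction may read encoded cached blocks. */
enum DataBlockEncodingMode { DISABLED, IN_CACHE_ONLY, EVERYWHERE }

class CompactionCacheReadPolicy {
  private final DataBlockEncodingMode mode;

  CompactionCacheReadPolicy(DataBlockEncodingMode mode) {
    this.mode = mode;
  }

  boolean mayUseCachedEncodedBlock() {
    switch (mode) {
      case EVERYWHERE:
        // Fully encoded column family: compactions keep using cached
        // encoded blocks.
        return true;
      case IN_CACHE_ONLY:
        // In-cache-only implies the encoders are not yet fully trusted;
        // never let their output feed data that gets persisted.
        return false;
      default:
        // Encoding was turned off after blocks were cached: rare, so
        // falling back to a filesystem read is acceptable for now.
        return false;
    }
  }
}
{code}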






[jira] [Updated] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-2947:
-

Attachment: 2947-final.txt

Patch that I committed (based on Stack's comment that Append does not actually 
need to implement Writable, since Row already extends Writable).
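To illustrate the reasoning with minimal stand-in interfaces (the real types 
live in org.apache.hadoop.io and org.apache.hadoop.hbase.client, so this is 
only a sketch of the inheritance argument): an implementer of Row inherits the 
Writable contract, so redeclaring it on Append is redundant.

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-ins for the real Hadoop/HBase interfaces, for illustration only.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

interface Row extends Writable {
  byte[] getRow();
}

// No "implements Writable" needed: the contract arrives through Row.
class Append implements Row {
  private byte[] row;

  public byte[] getRow() {
    return row;
  }

  public void write(DataOutput out) throws IOException {
    // serialize fields
  }

  public void readFields(DataInput in) throws IOException {
    // deserialize fields
  }
}
{code}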

> MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)
> --
>
> Key: HBASE-2947
> URL: https://issues.apache.org/jira/browse/HBASE-2947
> Project: HBase
>  Issue Type: New Feature
>  Components: client, regionserver
>Reporter: Jonathan Gray
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: 2947-final.txt, 2947-v2.txt, HBASE-2947-v1.patch
>
>
> HBASE-1845 introduced MultiGet and other cross-row/cross-region batch 
> operations.  We should add a way to do that with increments.





[jira] [Updated] (HBASE-2947) MultiIncrement/MultiAppend (MultiGet functionality for increments and appends)

2012-01-04 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-2947:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks for the review Stack!





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-04 Thread Matt Corgan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179973#comment-13179973
 ] 

Matt Corgan commented on HBASE-4218:


{quote}I think it is OK to ignore cached encoded blocks on compaction{quote}The 
circumstance I was worried about is when you are doing many small flushes and 
minor compactions.  The blocks to be compacted could mostly be in cache, and 
you would be ignoring them all.  I guess it doesn't matter if it's just for 
testing, but it might give a false impression of performance.





[jira] [Created] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Created) (JIRA)
AND (USING FilterList) of two ColumnPrefixFilters broken


 Key: HBASE-5126
 URL: https://issues.apache.org/jira/browse/HBASE-5126
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan


[Noticed this in the 89 branch. Possibly an issue in trunk also.]

A test which does a columnPrefixFilter("tag0") AND columnPrefixFilter("tag1") 
should return 0 kvs; instead it returns kvs with prefix "tag0".

{code}
table = HTable.new(conf, tableName)

put = Put.new(Bytes.toBytes("row"))
put.add(cf1name, Bytes.toBytes("tag0"), Bytes.toBytes("value0"))
put.add(cf1name, Bytes.toBytes("tag1"), Bytes.toBytes("value1"))
put.add(cf1name, Bytes.toBytes("tag2"), Bytes.toBytes("value2"))

table.put(put)

# Test for AND Two Column Prefix Filters

filter1 = ColumnPrefixFilter.new(Bytes.toBytes("tag0"));
filter2 = ColumnPrefixFilter.new(Bytes.toBytes("tag2"));

filters = FilterList.new(FilterList::Operator::MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter1);

get = Get.new(Bytes.toBytes("row"))
get.setFilter(filters)
get.setMaxVersions();
keyValues = table.get(get).raw()

keyValues.each do |keyValue|
  puts "Key=#{Bytes.toStringBinary(keyValue.getQualifier())}; 
Value=#{Bytes.toStringBinary(keyValue.getValue())}; 
Timestamp=#{keyValue.getTimestamp()}" 
end
{code}

outputs:

{code}
Key=tag0; Value=value0; Timestamp=1325719223523
{code}






[jira] [Updated] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Kannan Muthukkaruppan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-5126:
-

Attachment: testAndTwoPrefixFilters.rb

full test attached.





[jira] [Resolved] (HBASE-5031) [89-fb] Remove hard-coded non-existent host name from TestScanner

2012-01-04 Thread Nicolas Spiegelberg (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg resolved HBASE-5031.


  Resolution: Fixed
Release Note: Fixed in 0.89-fb.  Not necessary in trunk.

> [89-fb] Remove hard-coded non-existent host name from TestScanner 
> --
>
> Key: HBASE-5031
> URL: https://issues.apache.org/jira/browse/HBASE-5031
> Project: HBase
>  Issue Type: Bug
>Reporter: Mikhail Bautin
>Priority: Minor
> Attachments: D867.1.patch
>
>
> TestScanner is failing on 0.89-fb because it has a hard-coded fake host name 
> that it is trying to look up. Replacing this with 127.0.0.1: 
> instead.
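Presumably something along these lines; the actual change is in the attached 
D867.1.patch, so treat this only as a hedged illustration of binding to the 
loopback address with an OS-assigned port instead of resolving a fake host 
name.

{code}
import java.net.InetAddress;
import java.net.ServerSocket;

public class LoopbackAddressSketch {
  public static void main(String[] args) throws Exception {
    // Bind to 127.0.0.1 and let the OS pick a free port; no DNS lookup of a
    // non-existent host is involved, so this can never fail to resolve.
    ServerSocket ss = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
    String serverName = "127.0.0.1:" + ss.getLocalPort();
    System.out.println(serverName);
    ss.close();
  }
}
{code}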





[jira] [Commented] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Madhuwanti Vaidya (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179991#comment-13179991
 ] 

Madhuwanti Vaidya commented on HBASE-5126:
--

Kannan: You added filter1 twice.





[jira] [Updated] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5124:
--

Attachment: hbase-5124.txt

The patch, v1

> Backport LoadTestTool to 0.92
> -
>
> Key: HBASE-5124
> URL: https://issues.apache.org/jira/browse/HBASE-5124
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
> Fix For: 0.92.0
>
> Attachments: hbase-5124.txt
>
>
> LoadTestTool is very useful.
> This JIRA backports LoadTestTool to 0.92 so that users don't have to build 
> TRUNK in order to use it against a 0.92 cluster.





[jira] [Assigned] (HBASE-5124) Backport LoadTestTool to 0.92

2012-01-04 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5124:
-

Assignee: Zhihong Yu





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1318#comment-1318
 ] 

stack commented on HBASE-5081:
--

All tests but TestClassLoading (unrelated) pass for me locally:

{code}
...
Results :

Failed tests:   
testClassLoadingFromHDFS(org.apache.hadoop.hbase.coprocessor.TestClassLoading): 
Class TestCP1 was missing on a region

Tests in error: 
  
testEnableTableRoundRobinAssignment(org.apache.hadoop.hbase.client.TestAdmin): 
org.apache.hadoop.hbase.TableNotEnabledException: testEnableTableAssignment

Tests run: 792, Failures: 1, Errors: 1, Skipped: 10

[INFO] 
[ERROR] BUILD FAILURE
[INFO] 
[INFO] There are test failures.
{code}





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-04 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180002#comment-13180002
 ] 

stack commented on HBASE-5081:
--

You down w/ my committing, Ted?





[jira] [Commented] (HBASE-5126) AND (USING FilterList) of two ColumnPrefixFilters broken

2012-01-04 Thread Amitanand Aiyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180008#comment-13180008
 ] 

Amitanand Aiyer commented on HBASE-5126:


Fixed the bug in the test, but the issue still persists: rows still get 
printed.
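For reference, a minimal Java rendering of the corrected test (client API as 
of 0.89/0.92; table setup omitted, and the intent assumed from the attached 
Ruby script).

{code}
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

public class AndTwoPrefixFiltersSketch {
  static Result runTest(HTable table) throws Exception {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new ColumnPrefixFilter(Bytes.toBytes("tag0")));
    // The original script added filter1 a second time here by mistake.
    filters.addFilter(new ColumnPrefixFilter(Bytes.toBytes("tag2")));

    Get get = new Get(Bytes.toBytes("row"));
    get.setFilter(filters);
    get.setMaxVersions();
    // With MUST_PASS_ALL and disjoint prefixes, an empty result is expected;
    // the bug under discussion is that "tag0" KVs still come back.
    return table.get(get);
  }
}
{code}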




