[jira] [Commented] (HBASE-6142) Javadoc in some Filters ambiguous

2012-06-01 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287862#comment-13287862
 ] 

Joep Rottinghuis commented on HBASE-6142:
-

Ah, I will look up my (signed) copy of said book, write tests, and take a shot 
at the javadoc.
Cannot promise any angel qualities though...
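
For the record, a minimal sketch of the kind of check such a test would make 
(assuming the 0.94-era client/filter API; the table, column, and value names 
below are illustrative, not from the issue):
{code}
// SingleColumnValueFilter: rows whose f:q EQUALS "v" pass the filter,
// i.e. they ARE returned by the scan.
Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(
    Bytes.toBytes("f"), Bytes.toBytes("q"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("v")));

// The Filter interface itself works the other way around (a sieve):
// filterRowKey(buffer, offset, length) returning true means the row is
// filtered OUT, i.e. NOT returned.
{code}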

> Javadoc in some Filters ambiguous
> -
>
> Key: HBASE-6142
> URL: https://issues.apache.org/jira/browse/HBASE-6142
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Joep Rottinghuis
>Priority: Minor
>  Labels: noob
>
> The javadoc on some of the filters is somewhat confusing.
> The main Filter interface has methods that behave like a sieve; when 
> filterRowKey returns true, that means that the row is filtered _out_ (not 
> included).
> Many of the Filter implementations work the other way around: when the 
> condition is met, the value passes (i.e., the row is returned).
> Most Filters make it clear when a value passes (passing through the filter 
> meaning the values are returned from the scan).
> Some are less clear in light of how the Filter interface works: 
> WhileMatchFilter and SingleColumnValueFilter are examples.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6142) Javadoc in some Filters ambiguous

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6142:
-

  Tags: noob  (was: noo)
Labels: noob  (was: )

Tagging as noob.  Would be grand if an angel from heaven would write little 
unit tests to verify which way the filter blows and fix the javadoc accordingly 
(Lars George, if we read that book of yours, would it answer the questions Joep 
raises?  IIRC, it's good on filters?)

> Javadoc in some Filters ambiguous
> -
>
> Key: HBASE-6142
> URL: https://issues.apache.org/jira/browse/HBASE-6142
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Joep Rottinghuis
>Priority: Minor
>  Labels: noob
>
> The javadoc on some of the filters is somewhat confusing.
> The main Filter interface has methods that behave like a sieve; when 
> filterRowKey returns true, that means that the row is filtered _out_ (not 
> included).
> Many of the Filter implementations work the other way around: when the 
> condition is met, the value passes (i.e., the row is returned).
> Most Filters make it clear when a value passes (passing through the filter 
> meaning the values are returned from the scan).
> Some are less clear in light of how the Filter interface works: 
> WhileMatchFilter and SingleColumnValueFilter are examples.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287859#comment-13287859
 ] 

Hudson commented on HBASE-5936:
---

Integrated in HBase-TRUNK #2974 (See 
[https://builds.apache.org/job/HBase-TRUNK/2974/])
HBASE-5936 Addendum adds changes for TestHMasterRPCException that were 
missed in previous checkin (Revision 1345441)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestHMasterRPCException.java


> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
> HBASE-5936-v6.patch, HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6142) Javadoc in some Filters ambiguous

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6142:
-

Tags: noo

> Javadoc in some Filters ambiguous
> -
>
> Key: HBASE-6142
> URL: https://issues.apache.org/jira/browse/HBASE-6142
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Joep Rottinghuis
>Priority: Minor
>
> The javadoc on some of the filters is somewhat confusing.
> The main Filter interface has methods that behave like a sieve; when 
> filterRowKey returns true, that means that the row is filtered _out_ (not 
> included).
> Many of the Filter implementations work the other way around: when the 
> condition is met, the value passes (i.e., the row is returned).
> Most Filters make it clear when a value passes (passing through the filter 
> meaning the values are returned from the scan).
> Some are less clear in light of how the Filter interface works: 
> WhileMatchFilter and SingleColumnValueFilter are examples.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287857#comment-13287857
 ] 

stack commented on HBASE-6145:
--

The above failure seems unrelated (and has since been fixed by Ted's addendum, I think).

Jesse, what do you think of this patch?  I just messed with it some more and I think it's 
good enough to commit.  It adds site and assembly (if I unpack an assembly tgz, I can 
build another working assembly from inside it, etc., so all source and docs are 
present).  What do you think?  Are you OK with committing?

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt, site2.txt, sitev3.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287856#comment-13287856
 ] 

stack commented on HBASE-5936:
--

Thanks Ted.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
> HBASE-5936-v6.patch, HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287855#comment-13287855
 ] 

Zhihong Yu commented on HBASE-5936:
---

Addendum v2 integrated to trunk.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
> HBASE-5936-v6.patch, HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: 5936-addendum-v2.txt

The exception came out of the HBaseRPC.getProxy() call.
Addendum v2 passes TestHMasterRPCException.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: 5936-addendum-v2.txt, HBASE-5936-v3.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
> HBASE-5936-v6.patch, HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: (was: 5936-addendum.txt)

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5936:
--

Attachment: 5936-addendum.txt

Looks like the snippet from patch v6 for TestHMasterRPCException wasn't applied 
to trunk.
Addendum attached.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: 5936-addendum.txt, HBASE-5936-v3.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v4.patch, HBASE-5936-v5.patch, 
> HBASE-5936-v6.patch, HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287852#comment-13287852
 ] 

stack commented on HBASE-5936:
--

That seems easy enough to work around.  Any chance of you taking a look, 
Gregory?  Separate issue?  Thanks, boss.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6060:
--

Attachment: 6060-94-v4.patch

Patch v4 addresses Rajesh's comment and some of my own comments.

TestAssignmentManager passes.

Running test suite.

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: 6060-94-v3.patch, 6060-94-v4.patch, HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - The master calls the RS to open the region. If the RS is offline, a new plan is 
> generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - The RegionServer starts opening the region and changes the state in the znode, 
> but that znode is not ephemeral (see ZkAssign)
>  - The RS transitions the zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - The RS then opens the region and changes the znode from OPENING to OPENED
>  - When the RS is killed between the OPENING and OPENED states, zk shows the 
> OPENING state and the master just waits for the RS to change the region state, 
> but since the RS is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
> these kinds of conditions. It periodically checks (every 10 sec by 
> default) the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in the Master does not reassign regions in the OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but I still 
> think we can do better. 
> Will investigate more. 
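
The "lower the threshold" workaround mentioned above would look roughly like this 
(a sketch only; the property name is the one cited in the description, and the 
60-second value is illustrative, versus the 30-minute default noted above):
{code}
// Hedged sketch: lowering the TimeoutMonitor threshold via configuration.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.master.assignment.timeoutmonitor.timeout", 60000); // 60s instead of 30 min
{code}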

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6152) Split abort is not handled properly

2012-06-01 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287838#comment-13287838
 ] 

Enis Soztutar commented on HBASE-6152:
--

I think the problem is that the master offlines the region at step 3; however, 
the parent region is recovered and onlined by the RS, so all subsequent region 
transitions fail on the master. 

> Split abort is not handled properly
> ---
>
> Key: HBASE-6152
> URL: https://issues.apache.org/jira/browse/HBASE-6152
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>
> I ran into this:
> 1. The RegionServer started to split a region (R), but the split was taking a long 
> time, and hence the split was aborted
> 2. As part of cleanup, the RS deleted the ZK node that it had created initially 
> for R
> 3. The master (AssignmentManager) noticed the node deletion and made R 
> offline
> 4. The RS recovered from the failure and, at some point, tried to do 
> the split again.
> 5. The master got an RS_ZK_REGION_SPLIT event but the server gave an error 
> like - "Received SPLIT for region R from server RS but it doesn't exist 
> anymore,.."
> 6. The RS apparently did the split successfully this time, but got stuck waiting on 
> the master to delete the znode for the region. It kept on saying - 
> "org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for R" and it was stuck there forever. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287831#comment-13287831
 ] 

rajeshbabu commented on HBASE-6060:
---

@Ted 
A small change for the v3 patch:
{code}
  RegionPlan plan = getRegionPlan(state, forceNewPlan);
  if (plan == null) {
LOG.debug("Unable to determine a plan to assign " + state);
this.timeoutMonitor.setAllRegionServersOffline(true);
return; // Should get reassigned later when RIT times out.
  }

{code}
In this place too, instead of the null check, we need to use
{code}
plan == RegionPlan.NO_SERVERS_TO_ASSIGN
{code}
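
Combined, the suggested change would read roughly as follows (a sketch only, 
assuming the constant comparison simply replaces the null check):
{code}
  RegionPlan plan = getRegionPlan(state, forceNewPlan);
  if (plan == RegionPlan.NO_SERVERS_TO_ASSIGN) {
    LOG.debug("Unable to determine a plan to assign " + state);
    this.timeoutMonitor.setAllRegionServersOffline(true);
    return; // Should get reassigned later when RIT times out.
  }
{code}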

Thanks.

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: 6060-94-v3.patch, HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - The master calls the RS to open the region. If the RS is offline, a new plan is 
> generated (a new RS is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - The RegionServer starts opening the region and changes the state in the znode, 
> but that znode is not ephemeral (see ZkAssign)
>  - The RS transitions the zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - The RS then opens the region and changes the znode from OPENING to OPENED
>  - When the RS is killed between the OPENING and OPENED states, zk shows the 
> OPENING state and the master just waits for the RS to change the region state, 
> but since the RS is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
> these kinds of conditions. It periodically checks (every 10 sec by 
> default) the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in the Master does not reassign regions in the OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but I still 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-06-01 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287829#comment-13287829
 ] 

Matt Corgan commented on HBASE-4676:


A little more detail... it would help where you have long qualifiers but only a 
few of them, like if you migrated a narrow relational db table over.  If you 
have wide rows with long qualifiers, then you would want to take advantage of 
the qualifier trie.  Not sure which case you have, but migrating relational-style 
tables over will be pretty common, so I wanted to handle that common case well.

If you get a chance to do a random read benchmark on it I'd love to hear the 
results.  I've only done a few small benchmarks at the Store level and haven't 
benchmarked a whole cluster.

> Prefix Compression - Trie data block encoding
> -
>
> Key: HBASE-4676
> URL: https://issues.apache.org/jira/browse/HBASE-4676
> Project: HBase
>  Issue Type: New Feature
>  Components: io, performance, regionserver
>Affects Versions: 0.90.6
>Reporter: Matt Corgan
>Assignee: Matt Corgan
> Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
> PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
> hbase-prefix-trie-0.1.jar
>
>
> The HBase data block format has room for 2 significant improvements for 
> applications that have high block cache hit ratios.  
> First, there is no prefix compression, and the current KeyValue format is 
> somewhat metadata heavy, so there can be tremendous memory bloat for many 
> common data layouts, specifically those with long keys and short values.
> Second, there is no random access to KeyValues inside data blocks.  This 
> means that every time you double the datablock size, average seek time (or 
> average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
> size is ~10x slower for random seeks than a 4KB block size, but block sizes 
> as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
> or more may be more efficient from a disk access and block-cache perspective 
> in many big-data applications, but doing so is infeasible from a random seek 
> perspective.
> The PrefixTrie block encoding format attempts to solve both of these 
> problems.  Some features:
> * trie format for row key encoding completely eliminates duplicate row keys 
> and encodes similar row keys into a standard trie structure which also saves 
> a lot of space
> * the column family is currently stored once at the beginning of each block.  
> this could easily be modified to allow multiple family names per block
> * all qualifiers in the block are stored in their own trie format which 
> caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
>  the size of this trie determines the width of the block's qualifier 
> fixed-width-int
> * the minimum timestamp is stored at the beginning of the block, and deltas 
> are calculated from that.  the maximum delta determines the width of the 
> block's timestamp fixed-width-int
> The block is structured with metadata at the beginning, then a section for 
> the row trie, then the column trie, then the timestamp deltas, and then 
> all the values.  Most work is done in the row trie, where every leaf node 
> (corresponding to a row) contains a list of offsets/references corresponding 
> to the cells in that row.  Each cell is fixed-width to enable binary 
> searching and is represented by [1 byte operationType, X bytes qualifier 
> offset, X bytes timestamp delta offset].
> If all operation types are the same for a block, there will be zero per-cell 
> overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
> So, the compression aspect is very strong, but makes a few small sacrifices 
> on VarInt size to enable faster binary searches in trie fan-out nodes.
> A more compressed but slower version might build on this by also applying 
> further (suffix, etc) compression on the trie nodes at the cost of slower 
> write speed.  Even further compression could be obtained by using all VInts 
> instead of FInts with a sacrifice on random seek speed (though not huge).
> One current drawback is the current write speed.  While programmed with good 
> constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
> programmed with the same level of optimization as the read path.  Work will 
> need to be done to optimize the data structures used for encoding, and that could 
> probably show a 10x increase.  It will still be slower than delta encoding, 
> but with a much higher decode speed.  I have not yet created a thorough 
> benchmark for write speed nor sequential read speed.
> Though the trie is reaching a point where it is internally very efficient 
> (probably within half or a quart

[jira] [Created] (HBASE-6153) RS aborted due to rename problem (maybe a race)

2012-06-01 Thread Devaraj Das (JIRA)
Devaraj Das created HBASE-6153:
--

 Summary: RS aborted due to rename problem (maybe a race)
 Key: HBASE-6153
 URL: https://issues.apache.org/jira/browse/HBASE-6153
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Devaraj Das
Assignee: Devaraj Das


I had a RS crash with the following:

2012-05-31 18:34:42,534 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
Renaming flushed file at 
hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
 to 
hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
2012-05-31 18:34:42,536 WARN org.apache.hadoop.hbase.regionserver.Store: Unable 
to rename 
hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
 to 
hdfs://ip-10-140-14-134.ec2.internal:8020/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
2012-05-31 18:34:42,541 FATAL 
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
ip-10-68-7-146.ec2.internal,60020,1338343120038: Replay of HLog required. 
Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: 
TestLoadAndVerify_1338488017181,\x15\xD9\x01\x00\x00\x00\x00\x00/87_0,1338491364569.8974506aa04c5a04e5cc23c11de0039d.
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1288)
at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1172)
at 
org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1114)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:400)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:374)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: 
/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1901)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.(DFSClient.java:1892)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:636)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:387)
at 
org.apache.hadoop.hbase.regionserver.StoreFile$Reader.(StoreFile.java:1008)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:470)
at 
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:548)
at 
org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:595)


On the NameNode logs:
2012-05-31 18:34:42,588 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
FSDirectory.unprotectedRenameTo: failed to rename 
/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/.tmp/294a7a31f04949b8bf07682a43157b35
 to 
/apps/hbase/data/TestLoadAndVerify_1338488017181/8974506aa04c5a04e5cc23c11de0039d/f1/294a7a31f04949b8bf07682a43157b35
 because destination's parent does not exist


I haven't looked deeply yet but I guess it is a race of some sort.
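
For context, a hedged sketch of the step the logs correspond to (illustrative 
names, not the actual Store.java code): the flush commit moves the new file from 
the region's .tmp directory into the family directory, and the rename fails here 
because the destination's parent directory is gone:
{code}
Path tmpFile = new Path(regionDir, ".tmp/" + fileName);
Path dstFile = new Path(new Path(regionDir, "f1"), fileName);
LOG.debug("Renaming flushed file at " + tmpFile + " to " + dstFile);
if (!fs.rename(tmpFile, dstFile)) {
  // NameNode: "failed to rename ... because destination's parent does not exist"
  LOG.warn("Unable to rename " + tmpFile + " to " + dstFile);
}
{code}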


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6152) Split abort is not handled properly

2012-06-01 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6152:
---

Description: 
I ran into this:
1. RegionServer started to split a region(R), but the split was taking a long 
time, and hence the split was aborted
2. As part of cleanup, the RS deleted the ZK node that it created initially for 
R
3. The master (AssignmentManager) noticed the node deletion, and made R offline
4. The RS recovered from the failure, and at some point of time, tried to do 
the split again.
5. The master got an event RS_ZK_REGION_SPLIT but the server gave an error like 
- "Received SPLIT for region R from server RS but it doesn't exist anymore,.."
6. The RS apparently did the split successfully this time, but is stuck on the 
master to delete the znode for the region. It kept on saying - 
"org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
master to process the split for R" and it was stuck there forever. 

  was:
I ran into this:
1. RegionServer started to split a region(R), but the split was taking a long 
time, and hence the split was aborted
2. As part of cleanup, the RS deleted the ZK node that it created initially for 
R
3. The master (AssignmentManager) noticed the node deletion, and made R offline
4. The RS recovered from the failure, and at some point of time, tried to do 
the split again.
5. The master got an event RS_ZK_REGION_SPLITTING but the server gave an error 
like - "Received SPLIT for region R from server RS but it doesn't exist 
anymore,.."
6. The RS apparently did the split successfully this time, but is stuck on the 
master to delete the znode for the region. It kept on saying - 
"org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
master to process the split for R" and it was stuck there forever. 


> Split abort is not handled properly
> ---
>
> Key: HBASE-6152
> URL: https://issues.apache.org/jira/browse/HBASE-6152
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>
> I ran into this:
> 1. The RegionServer started to split a region (R), but the split was taking a long 
> time, and hence the split was aborted
> 2. As part of cleanup, the RS deleted the ZK node that it had created initially 
> for R
> 3. The master (AssignmentManager) noticed the node deletion and made R 
> offline
> 4. The RS recovered from the failure and, at some point, tried to do 
> the split again.
> 5. The master got an RS_ZK_REGION_SPLIT event but the server gave an error 
> like - "Received SPLIT for region R from server RS but it doesn't exist 
> anymore,.."
> 6. The RS apparently did the split successfully this time, but got stuck waiting on 
> the master to delete the znode for the region. It kept on saying - 
> "org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for R" and it was stuck there forever. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287815#comment-13287815
 ] 

Hadoop QA commented on HBASE-6145:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530620/sitev3.txt
  against trunk revision .

-1 @author.  The patch appears to contain 2 @author tags which the Hadoop 
community has agreed to not allow in code contributions.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestHMasterRPCException

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2087//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2087//console

This message is automatically generated.

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt, site2.txt, sitev3.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6152) Split abort is not handled properly

2012-06-01 Thread Devaraj Das (JIRA)
Devaraj Das created HBASE-6152:
--

 Summary: Split abort is not handled properly
 Key: HBASE-6152
 URL: https://issues.apache.org/jira/browse/HBASE-6152
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Devaraj Das
Assignee: Devaraj Das


I ran into this:
1. The RegionServer started to split a region (R), but the split was taking a long 
time, and hence the split was aborted
2. As part of cleanup, the RS deleted the ZK node that it had created initially for 
R
3. The master (AssignmentManager) noticed the node deletion and made R offline
4. The RS recovered from the failure and, at some point, tried to do 
the split again.
5. The master got an RS_ZK_REGION_SPLITTING event but the server gave an error 
like - "Received SPLIT for region R from server RS but it doesn't exist 
anymore,.."
6. The RS apparently did the split successfully this time, but got stuck waiting on 
the master to delete the znode for the region. It kept on saying - 
"org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
master to process the split for R" and it was stuck there forever. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6151) Master can die if RegionServer throws ServerNotRunningYet

2012-06-01 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6151:
--

Description: 
See, for example:

{noformat}
2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
{noformat}

The HRegionServer calls HBaseServer:
{code}
  public void start() {
startThreads();
openServer();
  }
{code}

but the server can start accepting RPCs once the threads have been started, but 
if they do, they throw ServerNotRunningException until openServer runs.  We 
should probably
1) Catch the remote exception and retry on the master
2) Look into whether the start() behavior of HBaseServer makes any sense.  Why 
would you start accepting RPCs only to throw back ServerNotRunningException?


  was:
See, for example:

{noformat}
2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
{noformat}

{code}
The HRegionServer calls HBaseServer:
  public void start() {
startThreads();
openServer();
  }
{code}

but the server can start accepting RPCs once the threads have been started, but 
if they do, they throw ServerNotRunningException until openServer runs.  We 
should probably
1) Catch the remote exception and retry on the master
2) Look into whether the start() behavior of HBaseServer makes any sense.  Why 
would you start accepting RPCs only to throw back ServerNotRunningException?



> Master can die if RegionServer throws ServerNotRunningYet
> -
>
> Key: HBASE-6151
> URL: https://issues.apache.org/jira/browse/HBASE-6151
> Project: HBase
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>
> See, for example:
> {noformat}
> 2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
> yet
>   at 
> org.apa

[jira] [Created] (HBASE-6151) Master can die if RegionServer throws ServerNotRunningYet

2012-06-01 Thread Gregory Chanan (JIRA)
Gregory Chanan created HBASE-6151:
-

 Summary: Master can die if RegionServer throws ServerNotRunningYet
 Key: HBASE-6151
 URL: https://issues.apache.org/jira/browse/HBASE-6151
 Project: HBase
  Issue Type: Bug
  Components: ipc
Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.1
Reporter: Gregory Chanan
Assignee: Gregory Chanan


See, for example:

{noformat}
2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
{noformat}

The HRegionServer calls HBaseServer:
{code}
  public void start() {
    startThreads();
    openServer();
  }
{code}
The server can start accepting RPCs once the threads have been started, but until 
openServer() runs those RPCs throw ServerNotRunningException.  We should probably
1) Catch the remote exception and retry on the master
2) Look into whether the start() behavior of HBaseServer makes any sense.  Why 
would you start accepting RPCs only to throw back ServerNotRunningException?
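
A rough sketch of option 1 only (illustrative, not the actual HMaster code; the 
retry loop, helper names, and backoff below are assumptions):
{code}
// Retry instead of aborting when the RS answers but is not serving yet.
boolean metaVerified = false;
while (!metaVerified && !isStopped()) {
  try {
    metaVerified = this.catalogTracker.verifyMetaRegionLocation(timeout);
  } catch (ServerNotRunningException e) {
    // RS accepted the RPC but openServer() has not run yet; wait and retry.
    Threads.sleep(1000);
  }
}
{code}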


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6151) Master can die if RegionServer throws ServerNotRunningYet

2012-06-01 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6151:
--

Description: 
See, for example:

{noformat}
2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
{noformat}

{code}
The HRegionServer calls HBaseServer:
  public void start() {
startThreads();
openServer();
  }
{code}

but the server can start accepting RPCs once the threads have been started, but 
if they do, they throw ServerNotRunningException until openServer runs.  We 
should probably
1) Catch the remote exception and retry on the master
2) Look into whether the start() behavior of HBaseServer makes any sense.  Why 
would you start accepting RPCs only to throw back ServerNotRunningException?


  was:
See, for example:

{noformat}
2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1240)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:444)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:343)
at 
org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:540)
at 
org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:474)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:412)
{noformat}

The HRegionServer calls HBaseServer:
  public void start() {
startThreads();
openServer();
  }
but the server can start accepting RPCs once the threads have been started, but 
if they do, they throw ServerNotRunningException until openServer runs.  We 
should probably
1) Catch the remote exception and retry on the master
2) Look into whether the start() behavior of HBaseServer makes any sense.  Why 
would you start accepting RPCs only to throw back ServerNotRunningException?



> Master can die if RegionServer throws ServerNotRunningYet
> -
>
> Key: HBASE-6151
> URL: https://issues.apache.org/jira/browse/HBASE-6151
> Project: HBase
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
>
> See, for example:
> {noformat}
> 2012-05-23 16:49:22,745 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unhandled exception. Starting shutdown.
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
> yet
>   at 
> org.apache.hadoop.hbas

[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287813#comment-13287813
 ] 

Zhihong Yu commented on HBASE-5936:
---

I can easily reproduce one of the test failures seen on Jenkins 
(https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2972/testReport/org.apache.hadoop.hbase.master/TestHMasterRPCException/testRPCException/):
{code}
Failed tests:   
testRPCException(org.apache.hadoop.hbase.master.TestHMasterRPCException): 
Unexpected throwable: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running 
yet
{code}

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287792#comment-13287792
 ] 

Hudson commented on HBASE-5936:
---

Integrated in HBase-TRUNK #2972 (See 
[https://builds.apache.org/job/HBase-TRUNK/2972/])
HBASE-5936 Add Column-level PB-based calls to HMasterInterface (Revision 
1345390)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/hbase-server/src/main/protobuf/Master.proto
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287793#comment-13287793
 ] 

Hudson commented on HBASE-6138:
---

Integrated in HBase-TRUNK #2972 (See 
[https://builds.apache.org/job/HBase-TRUNK/2972/])
HBASE-6138 HadoopQA not running findbugs [Trunk] (Anoop Sam John) (Revision 
1345391)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/pom.xml


> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0
>
> Attachments: 6138.txt
>
>
> HadoopQA shows something like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But I am not able to see any reports link.
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6150) Remove empty files causing rat check fail

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287781#comment-13287781
 ] 

Hudson commented on HBASE-6150:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #36 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/36/])
HBASE-6150 Remove empty files causing rat check fail (Revision 1345369)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Sample.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Snapshot.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/UniformSample.java


> Remove empty files causing rat check fail
> -
>
> Key: HBASE-6150
> URL: https://issues.apache.org/jira/browse/HBASE-6150
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
>
> Set of empty files found by Jesse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287779#comment-13287779
 ] 

Hudson commented on HBASE-5936:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #36 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/36/])
HBASE-5936 Add Column-level PB-based calls to HMasterInterface (Revision 
1345390)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/WritableRpcEngine.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/RequestConverter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/hbase-server/src/main/protobuf/Master.proto
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6149) Fix TestFSUtils creating dirs under top level dir

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287780#comment-13287780
 ] 

Hudson commented on HBASE-6149:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #36 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/36/])
HBASE-6149 Fix TestFSUtils creating dirs under top level dir (Revision 
1345343)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestFSUtils.java


> Fix TestFSUtils creating dirs under top level dir
> -
>
> Key: HBASE-6149
> URL: https://issues.apache.org/jira/browse/HBASE-6149
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: fixtestdir.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6145:
-

Attachment: sitev3.txt

I went back to trying to make the build work w/ hbase-assembly.  It looked 
attractive because it almost does the right thing.  The big problem w/ this 
route that maven wants you to take is that it's bound to the package phase.  
That means whenever you do a mvn package, it'll take forever as maven builds 
the world and then copies it all over the place, including jars, to make your 
.tar.gz.  Most of the time when folks do package, they just want jars made and 
installed, is my thinking, so this would just be a fat annoyance.

I went back to removing hbase-assembly and having assembly done by the parent.  
You invoke an assembly by doing assembly:assembly after doing a package and 
site (not assembly:single -- that's something else).  The attached patch is 
pretty much there.  I need to do a bit more polishing.  All src is included, 
it's buildable, and it runs.  Let me do some more testing before committing.

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt, site2.txt, sitev3.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287771#comment-13287771
 ] 

Zhihong Yu commented on HBASE-6138:
---

Integrated to trunk.

Thanks for the patch Anoop.

> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0
>
> Attachments: 6138.txt
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-6138:
-

Assignee: Anoop Sam John

> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0
>
> Attachments: 6138.txt
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5936:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.  I ran the three failing tests locally and they passed for 
me w/ this patch applied.  Thanks Gregory for your doggedness getting this in.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6138:
--

Attachment: 6138.txt

Patch that I am going to apply.

> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
> Fix For: 0.96.0
>
> Attachments: 6138.txt
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-06-01 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287766#comment-13287766
 ] 

James Taylor commented on HBASE-4676:
-

Our qualifiers tend to be long, so trie-encoding them helps quite a bit. We 
could optimize this ourselves by managing our own mapping, but it's great that 
the trie-encoding does it for us. We haven't seen this impact our scan times.

> Prefix Compression - Trie data block encoding
> -
>
> Key: HBASE-4676
> URL: https://issues.apache.org/jira/browse/HBASE-4676
> Project: HBase
>  Issue Type: New Feature
>  Components: io, performance, regionserver
>Affects Versions: 0.90.6
>Reporter: Matt Corgan
>Assignee: Matt Corgan
> Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
> PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
> hbase-prefix-trie-0.1.jar
>
>
> The HBase data block format has room for 2 significant improvements for 
> applications that have high block cache hit ratios.  
> First, there is no prefix compression, and the current KeyValue format is 
> somewhat metadata heavy, so there can be tremendous memory bloat for many 
> common data layouts, specifically those with long keys and short values.
> Second, there is no random access to KeyValues inside data blocks.  This 
> means that every time you double the datablock size, average seek time (or 
> average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
> size is ~10x slower for random seeks than a 4KB block size, but block sizes 
> as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
> or more may be more efficient from a disk access and block-cache perspective 
> in many big-data applications, but doing so is infeasible from a random seek 
> perspective.
> The PrefixTrie block encoding format attempts to solve both of these 
> problems.  Some features:
> * trie format for row key encoding completely eliminates duplicate row keys 
> and encodes similar row keys into a standard trie structure which also saves 
> a lot of space
> * the column family is currently stored once at the beginning of each block.  
> this could easily be modified to allow multiple family names per block
> * all qualifiers in the block are stored in their own trie format which 
> caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
>  the size of this trie determines the width of the block's qualifier 
> fixed-width-int
> * the minimum timestamp is stored at the beginning of the block, and deltas 
> are calculated from that.  the maximum delta determines the width of the 
> block's timestamp fixed-width-int
> The block is structured with metadata at the beginning, then a section for 
> the row trie, then the column trie, then the timestamp deltas, and then 
> all the values.  Most work is done in the row trie, where every leaf node 
> (corresponding to a row) contains a list of offsets/references corresponding 
> to the cells in that row.  Each cell is fixed-width to enable binary 
> searching and is represented by [1 byte operationType, X bytes qualifier 
> offset, X bytes timestamp delta offset].
> If all operation types are the same for a block, there will be zero per-cell 
> overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
> So, the compression aspect is very strong, but makes a few small sacrifices 
> on VarInt size to enable faster binary searches in trie fan-out nodes.
> A more compressed but slower version might build on this by also applying 
> further (suffix, etc) compression on the trie nodes at the cost of slower 
> write speed.  Even further compression could be obtained by using all VInts 
> instead of FInts with a sacrifice on random seek speed (though not huge).
> One current drawback is the current write speed.  While programmed with good 
> constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
> programmed with the same level of optimization as the read path.  Work will 
> need to be done to optimize the data structures used for encoding and could 
> probably show a 10x increase.  It will still be slower than delta encoding, 
> but with a much higher decode speed.  I have not yet created a thorough 
> benchmark for write speed nor sequential read speed.
> Though the trie is reaching a point where it is internally very efficient 
> (probably within half or a quarter of its max read speed) the way that hbase 
> currently uses it is far from optimal.  The KeyValueScanner and related 
> classes that iterate through the trie will eventually need to be smarter and 
> have methods to do things like skipping to the next row of results without 
> scanning every cell in between.  When that is accomplished it will also

[jira] [Comment Edited] (HBASE-6055) Snapshots in HBase 0.96

2012-06-01 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286862#comment-13286862
 ] 

Jonathan Hsieh edited comment on HBASE-6055 at 6/1/12 11:02 PM:


_(jon: I made a minor formatting tweak to make the dir structure easier to 
read)_

But before a detailed description of how timestamp-based snapshots work 
internally, let's answer some comments!

@Jon: I'll add more info to the document to cover this stuff, but for the 
moment, let's just get it out there.

{quote}
What is the read mechanism for snapshots like? Does the snapshot act like a 
read-only table or is there some special external mechanism needed to read the 
data from a snapshot? You mention having to rebuild in-memory state by 
replaying wals – is this a recovery situation or needed in normal reads?
{quote}

It's almost, but not quite, like a table. Reading a snapshot is going to 
require an external tool, but after hooking up the snapshot via that tool, it 
should act just like a real table. 

Snapshots are intended to happen as fast as possible, to minimize downtime for 
the table. To enable that, we are just creating reference files in the snapshot 
directory. My vision is that once you take a snapshot, at some point (maybe 
weekly), you export the snapshot to a backup area. In the export you actually 
copy the referenced files - you do a direct scan of the HFiles (avoiding the 
top-level interface and going right to HDFS) and the WAL files. Then when you 
want to read the snapshot, you can just bulk-import the HFiles and replay the 
WAL files (with the WALPlayer this is relatively easy) to rebuild the state of 
the table at the time of the snapshot. It's not an exact copy (META isn't 
preserved), but all the actual data is there.

The caveat here is that since everything is references, one of the WAL files 
you reference may not actually have been closed (and is therefore not 
readable). In the common case this won't happen, but if you snap and 
immediately export, it's possible. In that case, you need to roll the WAL for 
the RSs that haven't rolled them yet. However, this is in the export process, 
so a little latency there is tolerable, whereas avoiding this means adding 
latency to taking a snapshot - bad news bears.

Keep in mind that the log files and hfiles will get regularly cleaned up. The 
former will be moved to the .oldlogs directory and periodically cleaned up, and 
the latter get moved to the .archive directory (again with a parallel file 
hierarchy, as per HBASE-5547). If the snapshot goes to read the reference file, 
tracks down to the original file, and doesn't find it, then it will need to 
look up the same file in its respective archive directory. If it's not there, 
then you are really hosed (except for the case mentioned in the doc about the 
WALs getting cleaned up by an aggressive log cleaner, which, as shown, is not a 
problem).

Haven't gotten around to implementing this yet, but it seems reasonable to 
finish up (and I think Matteo was interested in working on that part).
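
For illustration only, here is a minimal sketch of the lookup-with-archive-fallback described above; the method name, the flat archive layout, and the helper signature are all hypothetical, not the actual snapshot code:

{code}
// Hypothetical sketch (not real snapshot code): resolve a referenced file,
// falling back to the archive directory if the original has been moved there.
// Assumes org.apache.hadoop.fs.FileSystem/Path and java.io imports.
Path resolveReferencedFile(FileSystem fs, Path original, Path archiveRoot)
    throws IOException {
  if (fs.exists(original)) {
    return original;                 // common case: file still in place
  }
  // Simplified: the real layout would mirror the original hierarchy (HBASE-5547).
  Path archived = new Path(archiveRoot, original.getName());
  if (fs.exists(archived)) {
    return archived;                 // moved aside by the cleaner, still readable
  }
  throw new FileNotFoundException("Referenced file missing: " + original);
}
{code}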

{quote}
What does a representation of a snapshot look like in terms of META and file 
system contents?
{quote}

The way I see the implementation in the end is just a bunch of files in the 
/hbase/.snapshot directory. Like I mentioned above, the layout is very similar 
to the layout of a table. 

Let's look at an example of a table named "stuff" (snapshot names need to be 
valid directory names - same as a table or CF) that has column "column" and is 
hosted on servers rs-1 and rs-2. Originally, the file system will look 
something like this (with some license taken on file names - it's not exact, I 
know, this is just an example):
{code}
/hbase/
  .logs/
    rs-1/
      WAL-rs1-1
      WAL-rs1-2
    rs-2/
      WAL-rs2-1
      WAL-rs2-2
  stuff/
    .tableinfo
    region1
      column
        region1-hfile-1
    region2
      column
        region2-hfile-1
{code}

The snapshot named "tuesday-at-nine", when completed, then just adds the 
following to the directory structure (or close enough):

{code}
.snapshot/
  tuesday-at-nine/
    .tableinfo
    .snapshotinfo
    .logs
      rs-1/
        WAL-rs1-1.reference
        WAL-rs1-2.reference
      rs-2/
        WAL-rs2-1.reference
        WAL-rs2-2.reference
    stuff/
      .tableinfo
      region1

[jira] [Commented] (HBASE-6150) Remove empty files causing rat check fail

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287758#comment-13287758
 ] 

Hudson commented on HBASE-6150:
---

Integrated in HBase-TRUNK #2971 (See 
[https://builds.apache.org/job/HBase-TRUNK/2971/])
HBASE-6150 Remove empty files causing rat check fail (Revision 1345369)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Sample.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Snapshot.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/UniformSample.java


> Remove empty files causing rat check fail
> -
>
> Key: HBASE-6150
> URL: https://issues.apache.org/jira/browse/HBASE-6150
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
>
> Set of empty files found by Jesse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287753#comment-13287753
 ] 

Zhihong Yu commented on HBASE-6138:
---

Will integrate the suggested fix if there is no objection.

> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
> Fix For: 0.96.0
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5892) [hbck] Refactor parallel WorkItem* to Futures.

2012-06-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287748#comment-13287748
 ] 

Andrew Wang commented on HBASE-5892:


I don't know why Findbugs is erroring. Maybe the modularization change?

{code}
[ERROR] Could not find resource 
'${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
{code}

No tests because there's no functionality change, it's a refactor.

> [hbck] Refactor parallel WorkItem* to Futures.
> --
>
> Key: HBASE-5892
> URL: https://issues.apache.org/jira/browse/HBASE-5892
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jonathan Hsieh
>Assignee: Andrew Wang
>  Labels: noob
> Attachments: hbase-5892-1.patch, hbase-5892-2-0.90.patch, 
> hbase-5892-2.patch, hbase-5892-3.patch, hbase-5892-4-0.90.patch, 
> hbase-5892-4.patch, hbase-5892.patch
>
>
> This would convert WorkItem* logic (with low level notifies, and rough 
> exception handling)  into a more canonical Futures pattern.
> Currently there are two instances of this pattern (for loading hdfs dirs, for 
> contacting regionservers for assignments, and soon -- for loading hdfs 
> .regioninfo files).
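
For illustration, the "canonical Futures pattern" mentioned above could look roughly like the sketch below; WorkItem, work(), and the thread count are placeholders, not the actual hbck types:

{code}
// Sketch only: submit each work item to a pool and collect Futures, instead of
// hand-rolled low-level notifies. Assumes java.util.concurrent imports.
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
List<Future<Void>> futures = new ArrayList<Future<Void>>();
for (final WorkItem item : workItems) {
  futures.add(pool.submit(new Callable<Void>() {
    public Void call() throws Exception {
      item.work();                   // the per-item unit of work
      return null;
    }
  }));
}
for (Future<Void> f : futures) {
  f.get();                           // surfaces any worker exception (wrapped in ExecutionException)
}
pool.shutdown();
{code}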

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6150) Remove empty files causing rat check fail

2012-06-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287724#comment-13287724
 ] 

stack commented on HBASE-6150:
--

Removed these four files:

{code}
D   
hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Snapshot.java
D   
hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java
D   
hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Sample.java
D   
hbase-server/src/main/java/org/apache/hadoop/hbase/metrics/histogram/UniformSample.java
{code}

> Remove empty files causing rat check fail
> -
>
> Key: HBASE-6150
> URL: https://issues.apache.org/jira/browse/HBASE-6150
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Fix For: 0.96.0
>
>
> Set of empty files found by Jesse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6150) Remove empty files causing rat check fail

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6150.
--

   Resolution: Fixed
Fix Version/s: 0.96.0
 Assignee: stack

Committed to trunk.

> Remove empty files causing rat check fail
> -
>
> Key: HBASE-6150
> URL: https://issues.apache.org/jira/browse/HBASE-6150
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
>
> Set of empty files found by Jesse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6150) Remove empty files causing rat check fail

2012-06-01 Thread stack (JIRA)
stack created HBASE-6150:


 Summary: Remove empty files causing rat check fail
 Key: HBASE-6150
 URL: https://issues.apache.org/jira/browse/HBASE-6150
 Project: HBase
  Issue Type: Bug
Reporter: stack


Set of empty files found by Jesse.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6149) Fix TestFSUtils creating dirs under top level dir

2012-06-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287705#comment-13287705
 ] 

Hudson commented on HBASE-6149:
---

Integrated in HBase-TRUNK #2970 (See 
[https://builds.apache.org/job/HBase-TRUNK/2970/])
HBASE-6149 Fix TestFSUtils creating dirs under top level dir (Revision 
1345343)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestFSUtils.java


> Fix TestFSUtils creating dirs under top level dir
> -
>
> Key: HBASE-6149
> URL: https://issues.apache.org/jira/browse/HBASE-6149
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: fixtestdir.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6149) Fix TestFSUtils creating dirs under top level dir

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-6149.
--

   Resolution: Fixed
Fix Version/s: 0.96.0
 Assignee: stack

Committed to trunk

> Fix TestFSUtils creating dirs under top level dir
> -
>
> Key: HBASE-6149
> URL: https://issues.apache.org/jira/browse/HBASE-6149
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.96.0
>
> Attachments: fixtestdir.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-6060:
--

Attachment: 6060-94-v3.patch

Patch v3 illustrates my proposal.

I also created a singleton for the null RegionPlan that signifies there is no 
server to assign the region to.

TestAssignmentManager passes.

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: 6060-94-v3.patch, HBASE-6060-94.patch
>
>
> we have seen a pattern in tests where the regions are stuck in OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly 
> against these kinds of conditions. It periodically checks (every 10 sec by 
> default) the regions in transition to see whether they timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6149) Fix TestFSUtils creating dirs under top level dir

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6149:
-

Attachment: fixtestdir.txt

Minor fix

> Fix TestFSUtils creating dirs under top level dir
> -
>
> Key: HBASE-6149
> URL: https://issues.apache.org/jira/browse/HBASE-6149
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Attachments: fixtestdir.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6149) Fix TestFSUtils creating dirs under top level dir

2012-06-01 Thread stack (JIRA)
stack created HBASE-6149:


 Summary: Fix TestFSUtils creating dirs under top level dir
 Key: HBASE-6149
 URL: https://issues.apache.org/jira/browse/HBASE-6149
 Project: HBase
  Issue Type: Bug
Reporter: stack




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287648#comment-13287648
 ] 

Hadoop QA commented on HBASE-6145:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530584/site2.txt
  against trunk revision .

-1 @author.  The patch appears to contain 2 @author tags which the Hadoop 
community has agreed to not allow in code contributions.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2085//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2085//console

This message is automatically generated.

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt, site2.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5609) Add the ability to pass additional information for slow query logging

2012-06-01 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287636#comment-13287636
 ] 

Jesse Yates commented on HBASE-5609:


I'm +1 on Drz's latest patch on RB. Would love to see another patch to abstract 
out the commonality in toMap(), but that can go in another ticket.

> Add the ability to pass additional information for slow query logging
> -
>
> Key: HBASE-5609
> URL: https://issues.apache.org/jira/browse/HBASE-5609
> Project: HBase
>  Issue Type: New Feature
>  Components: client, ipc
>Reporter: Michael Drzal
>Assignee: Michael Drzal
>Priority: Minor
> Attachments: HBASE-5609-v2.patch, HBASE-5609.patch
>
>
> HBase-4117 added the ability to log information about queries that returned 
> too much data or ran for too long.  There is some information written as a 
> fingerprint that can be used to tell what table/column families/... are 
> affected.  I would like to extend this functionality to allow the client to 
> insert an identifier into the operation that gets output in the log.  The 
> idea behind this would be that if there were N places in the client 
> application that touched a given table in a certain way, you could quickly 
> narrow things down by inserting a className:functionName or similar 
> identifier.  I'm fully willing to go back on this if people think that it 
> isn't a problem in real life and it would just add complexity to the code.
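
Purely as a sketch of what the client side of this could look like (the attribute name is made up, this is not a committed API for the feature, and it assumes the operation-attribute mechanism is available on Get in the version in use), a caller identifier could ride along on the operation's attributes and be echoed in the slow-query log:

{code}
// Hypothetical usage sketch; "callerContext" is a made-up attribute name and
// table is an already-opened HTable.
Get get = new Get(Bytes.toBytes("row-1"));
get.setAttribute("callerContext", Bytes.toBytes("OrderDao.loadOrder"));
Result result = table.get(get);  // a slow/large response could then log the tag
{code}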

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-06-01 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287637#comment-13287637
 ] 

Matt Corgan commented on HBASE-4676:


Great to hear, James.  I'm working on migrating it to trunk, but wrestling with 
git and maven, my two worst friends.

The last change I've been thinking about making to it before finalizing this 
version is to add an option controlling whether to trie-encode the qualifiers 
vs merely de-duping them.  There's some read-time expense to decoding the 
trie-encoded qualifiers, and if you have a narrow table then you may not be 
saving much memory anyway.  So it would be an option to trade a little memory 
for faster scans/decoding.

> Prefix Compression - Trie data block encoding
> -
>
> Key: HBASE-4676
> URL: https://issues.apache.org/jira/browse/HBASE-4676
> Project: HBase
>  Issue Type: New Feature
>  Components: io, performance, regionserver
>Affects Versions: 0.90.6
>Reporter: Matt Corgan
>Assignee: Matt Corgan
> Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
> PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
> hbase-prefix-trie-0.1.jar
>
>
> The HBase data block format has room for 2 significant improvements for 
> applications that have high block cache hit ratios.  
> First, there is no prefix compression, and the current KeyValue format is 
> somewhat metadata heavy, so there can be tremendous memory bloat for many 
> common data layouts, specifically those with long keys and short values.
> Second, there is no random access to KeyValues inside data blocks.  This 
> means that every time you double the datablock size, average seek time (or 
> average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
> size is ~10x slower for random seeks than a 4KB block size, but block sizes 
> as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
> or more may be more efficient from a disk access and block-cache perspective 
> in many big-data applications, but doing so is infeasible from a random seek 
> perspective.
> The PrefixTrie block encoding format attempts to solve both of these 
> problems.  Some features:
> * trie format for row key encoding completely eliminates duplicate row keys 
> and encodes similar row keys into a standard trie structure which also saves 
> a lot of space
> * the column family is currently stored once at the beginning of each block.  
> this could easily be modified to allow multiple family names per block
> * all qualifiers in the block are stored in their own trie format which 
> caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
>  the size of this trie determines the width of the block's qualifier 
> fixed-width-int
> * the minimum timestamp is stored at the beginning of the block, and deltas 
> are calculated from that.  the maximum delta determines the width of the 
> block's timestamp fixed-width-int
> The block is structured with metadata at the beginning, then a section for 
> the row trie, then the column trie, then the timestamp deltas, and then 
> all the values.  Most work is done in the row trie, where every leaf node 
> (corresponding to a row) contains a list of offsets/references corresponding 
> to the cells in that row.  Each cell is fixed-width to enable binary 
> searching and is represented by [1 byte operationType, X bytes qualifier 
> offset, X bytes timestamp delta offset].
> If all operation types are the same for a block, there will be zero per-cell 
> overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
> So, the compression aspect is very strong, but makes a few small sacrifices 
> on VarInt size to enable faster binary searches in trie fan-out nodes.
> A more compressed but slower version might build on this by also applying 
> further (suffix, etc) compression on the trie nodes at the cost of slower 
> write speed.  Even further compression could be obtained by using all VInts 
> instead of FInts with a sacrifice on random seek speed (though not huge).
> One current drawback is the current write speed.  While programmed with good 
> constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
> programmed with the same level of optimization as the read path.  Work will 
> need to be done to optimize the data structures used for encoding and could 
> probably show a 10x increase.  It will still be slower than delta encoding, 
> but with a much higher decode speed.  I have not yet created a thorough 
> benchmark for write speed nor sequential read speed.
> Though the trie is reaching a point where it is internally very efficient 
> (probably within half or a quarter of its max read speed) the way that hbase 
> currentl

[jira] [Commented] (HBASE-5251) Some commands return "0 rows" when > 0 rows were processed successfully

2012-06-01 Thread s...@hotmail.com (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287631#comment-13287631
 ] 

s...@hotmail.com commented on HBASE-5251:
-

I am working on fixing this issue. Will have a patch soon.

> Some commands return "0 rows" when > 0 rows were processed successfully
> ---
>
> Key: HBASE-5251
> URL: https://issues.apache.org/jira/browse/HBASE-5251
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 0.90.5
>Reporter: David S. Wang
>Assignee: Himanshu Vashishtha
>Priority: Minor
>  Labels: noob
>
> From the hbase shell, I see this:
> hbase(main):049:0> scan 't1'
> ROW   COLUMN+CELL 
>   
>  r1   column=f1:c1, timestamp=1327104295560, value=value  
>   
>  r1   column=f1:c2, timestamp=1327104330625, value=value  
>   
> 1 row(s) in 0.0300 seconds
> hbase(main):050:0> deleteall 't1', 'r1'
> 0 row(s) in 0.0080 seconds  <== I expected this to read 
> "2 row(s)"
> hbase(main):051:0> scan 't1'   
> ROW   COLUMN+CELL 
>   
> 0 row(s) in 0.0090 seconds
> I expected the deleteall command to return "1 row(s)" instead of 0, because 1 
> row was deleted.  Similar behavior for delete and some other commands.  Some 
> commands such as "put" work fine.
> Looking at the ruby shell code, it seems that formatter.footer() is called 
> even for commands that will not actually increment the number of rows 
> reported, such as deletes.  Perhaps there should be another similar function 
> to formatter.footer(), but that will not print out @row_count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287632#comment-13287632
 ] 

Zhihong Yu commented on HBASE-6046:
---

Patch v2 looks good.
Minor comment:
{code}
-  ".splitLogManagerTimeoutMonitor");
+  public void finishInitialization(boolean masterRecovery) {
{code}
Add javadoc for the method and masterRecovery parameter.
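
For example, the requested javadoc could read roughly as follows (wording is only a suggestion, not taken from the patch):

{code}
/**
 * Finish the master's initialization.
 *
 * @param masterRecovery true when initialization is being re-run because the
 *        master's ZooKeeper session expired and it is recovering in place,
 *        false for a normal fresh startup
 */
public void finishInitialization(boolean masterRecovery) {
{code}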

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch, 
> HBASE_6046_0.94_2.patch
>
>
> 1> ZK session timeout in the hmaster leads to bulk assignment even though all 
> the RSs are online.
> 2> While doing bulk assignment, if the master again goes down & restarts (or a 
> backup comes up), all the nodes created in ZK will now be retried for 
> reassignment to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignments, taking 
> the region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287628#comment-13287628
 ] 

Hadoop QA commented on HBASE-6046:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12530583/HBASE_6046_0.94_2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2086//console

This message is automatically generated.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch, 
> HBASE_6046_0.94_2.patch
>
>
> 1> ZK session timeout in the hmaster leads to bulk assignment even though all 
> the RSs are online.
> 2> While doing bulk assignment, if the master again goes down & restarts (or a 
> backup comes up), all the nodes created in ZK will now be retried for 
> reassignment to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignments, taking 
> the region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287599#comment-13287599
 ] 

Zhihong Yu commented on HBASE-6060:
---

Thinking more about the usable RegionPlan flag, we don't really need it.
We can introduce an 'unusable' RegionPlan singleton which signifies the fact 
that it is not to be used.
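
A null-object singleton along those lines might look like the following sketch (class and field names are illustrative, not taken from the patch, and getRegionPlan stands in for wherever the plan is produced):

{code}
// Illustrative sketch of an "unusable" RegionPlan sentinel; callers compare
// against the singleton instead of carrying a separate boolean flag.
public final class RegionPlans {
  /** Sentinel meaning "no server currently available to assign the region to". */
  public static final RegionPlan NO_SERVER_AVAILABLE = new RegionPlan(null, null, null);
  private RegionPlans() {}
}

// Caller side (hypothetical):
RegionPlan plan = getRegionPlan(state, forceNewPlan);
if (plan == RegionPlans.NO_SERVER_AVAILABLE) {
  return;  // nothing to assign right now; skip instead of using an empty plan
}
{code}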

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> we have seen a pattern in tests where the regions are stuck in OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly 
> against these kinds of conditions. It periodically checks (every 10 sec by 
> default) the regions in transition to see whether they timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6145:
-

Attachment: site2.txt

v2 removes hbase-assembly and assembly in general -- for now.

At the moment, after looking at assembly and trying to make assembly:assembly 
work -- i.e. doing assembly up in the parent rather than down in the 
hbase-assembly module -- it seems like it could be made to work (you just do 
assembly:assembly after doing package), but it's totally a manual affair 
shaping the end product while maven throws cryptic exceptions.  It will take 
hours.

Trying to read up on how others have done this is a trip through other people's 
misery.  I'm tempted to just write a shell script to do the packaging.

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt, site2.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6148) [89-fb] Avoid allocating large objects when reading corrupted RPC

2012-06-01 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6148:
-

 Summary: [89-fb] Avoid allocating large objects when reading 
corrupted RPC
 Key: HBASE-6148
 URL: https://issues.apache.org/jira/browse/HBASE-6148
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


Recently the RegionServer has been allocating very large objects when reading 
some corrupted RPC calls, which may be caused by client-server version 
incompatibility. We need to add a protection before allocating the objects. 

Apache trunk won't suffer from this problem since it has moved to the versioned 
invocation.
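
A minimal sketch of the kind of guard being described, with a made-up size ceiling and assuming the length is read from a DataInputStream named in:

{code}
// Sketch: sanity-check the declared length before allocating, so a corrupted
// or version-incompatible RPC cannot force a huge allocation.
static final int MAX_RPC_PARAM_SIZE = 64 * 1024 * 1024;  // made-up ceiling

int declaredLength = in.readInt();
if (declaredLength < 0 || declaredLength > MAX_RPC_PARAM_SIZE) {
  throw new IOException("Suspicious RPC parameter length " + declaredLength
      + "; likely a corrupted call or an incompatible client");
}
byte[] param = new byte[declaredLength];  // safe to allocate now
in.readFully(param);
{code}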

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287595#comment-13287595
 ] 

Zhihong Yu commented on HBASE-5924:
---

For #2 above, I think we can remove the callback in 0.96

> In the client code, don't wait for all the requests to be executed before 
> resubmitting a request in error.
> --
>
> Key: HBASE-5924
> URL: https://issues.apache.org/jira/browse/HBASE-5924
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
>
> The client (in the function HConnectionManager#processBatchCallback) works in 
> two steps:
>  - make the requests
>  - collect the failures and successes and prepare for retry
> It means that when there is an immediate error (region moved, split, dead 
> server, ...) we still wait for all the initial requests to be executed before 
> resubmitting the failed request. If we have a scenario with all the 
> requests taking 5 seconds we have a final execution time of: 5 (initial 
> requests) + 1 (wait time) + 5 (final request) = 11s.
> We could improve this by analyzing the results immediately. This would lead 
> us, for the scenario mentioned above, to 6 seconds. 
> So we could have a performance improvement of nearly 50% in many cases, and 
> much more than 50% if the request execution times differ.
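A generic sketch of the "analyze results as they arrive" idea using a completion service; this is not the HConnectionManager code, and the Callable actions plus the single-retry policy are simplifications for illustration:

{code}
import java.util.*;
import java.util.concurrent.*;

public class EagerRetryBatch {
  public static void run(ExecutorService pool, List<Callable<Boolean>> actions)
      throws InterruptedException {
    CompletionService<Boolean> cs = new ExecutorCompletionService<Boolean>(pool);
    Map<Future<Boolean>, Callable<Boolean>> inFlight =
        new HashMap<Future<Boolean>, Callable<Boolean>>();
    for (Callable<Boolean> a : actions) {
      inFlight.put(cs.submit(a), a);
    }
    Set<Callable<Boolean>> retried = new HashSet<Callable<Boolean>>();
    while (!inFlight.isEmpty()) {
      Future<Boolean> done = cs.take();          // first finished request, success or failure
      Callable<Boolean> action = inFlight.remove(done);
      boolean ok;
      try {
        ok = Boolean.TRUE.equals(done.get());
      } catch (ExecutionException e) {
        ok = false;
      }
      if (!ok && retried.add(action)) {
        inFlight.put(cs.submit(action), action); // resubmit right away, at most once
      }
    }
  }
}
{code}

Because the retry is submitted as soon as the failure is observed, it overlaps the still-running initial requests, which is where the 11s to roughly 6s improvement in the description comes from.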

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-6046:
--

Attachment: HBASE_6046_0.94_2.patch

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch, 
> HBASE_6046_0.94_2.patch
>
>
> 1> ZK session timeout in the hmaster leads to bulk assignment even though all 
> the RSs are online.
> 2> While doing bulk assignment, if the master again goes down & restarts (or a 
> backup comes up), all the nodes created in ZK will now be tried for reassignment 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-06-01 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287587#comment-13287587
 ] 

James Taylor commented on HBASE-4676:
-

This is fantastic, Matt. We're testing your patch on 0.94 in our dev cluster 
here at Salesforce and are seeing 5-15x compression with no degradation on scan 
performance.

> Prefix Compression - Trie data block encoding
> -
>
> Key: HBASE-4676
> URL: https://issues.apache.org/jira/browse/HBASE-4676
> Project: HBase
>  Issue Type: New Feature
>  Components: io, performance, regionserver
>Affects Versions: 0.90.6
>Reporter: Matt Corgan
>Assignee: Matt Corgan
> Attachments: HBASE-4676-0.94-v1.patch, PrefixTrie_Format_v1.pdf, 
> PrefixTrie_Performance_v1.pdf, SeeksPerSec by blockSize.png, 
> hbase-prefix-trie-0.1.jar
>
>
> The HBase data block format has room for 2 significant improvements for 
> applications that have high block cache hit ratios.  
> First, there is no prefix compression, and the current KeyValue format is 
> somewhat metadata heavy, so there can be tremendous memory bloat for many 
> common data layouts, specifically those with long keys and short values.
> Second, there is no random access to KeyValues inside data blocks.  This 
> means that every time you double the datablock size, average seek time (or 
> average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
> size is ~10x slower for random seeks than a 4KB block size, but block sizes 
> as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
> or more may be more efficient from a disk access and block-cache perspective 
> in many big-data applications, but doing so is infeasible from a random seek 
> perspective.
> The PrefixTrie block encoding format attempts to solve both of these 
> problems.  Some features:
> * trie format for row key encoding completely eliminates duplicate row keys 
> and encodes similar row keys into a standard trie structure which also saves 
> a lot of space
> * the column family is currently stored once at the beginning of each block.  
> this could easily be modified to allow multiple family names per block
> * all qualifiers in the block are stored in their own trie format which 
> caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
>  the size of this trie determines the width of the block's qualifier 
> fixed-width-int
> * the minimum timestamp is stored at the beginning of the block, and deltas 
> are calculated from that.  the maximum delta determines the width of the 
> block's timestamp fixed-width-int
> The block is structured with metadata at the beginning, then a section for 
> the row trie, then the column trie, then the timestamp deltas, and then 
> all the values.  Most work is done in the row trie, where every leaf node 
> (corresponding to a row) contains a list of offsets/references corresponding 
> to the cells in that row.  Each cell is fixed-width to enable binary 
> searching and is represented by [1 byte operationType, X bytes qualifier 
> offset, X bytes timestamp delta offset].
> If all operation types are the same for a block, there will be zero per-cell 
> overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
> So, the compression aspect is very strong, but makes a few small sacrifices 
> on VarInt size to enable faster binary searches in trie fan-out nodes.
> A more compressed but slower version might build on this by also applying 
> further (suffix, etc) compression on the trie nodes at the cost of slower 
> write speed.  Even further compression could be obtained by using all VInts 
> instead of FInts with a sacrifice on random seek speed (though not huge).
> One current drawback is the current write speed.  While programmed with good 
> constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
> programmed with the same level of optimization as the read path.  Work will 
> need to be done to optimize the data structures used for encoding and could 
> probably show a 10x increase.  It will still be slower than delta encoding, 
> but with a much higher decode speed.  I have not yet created a thorough 
> benchmark for write speed nor sequential read speed.
> Though the trie is reaching a point where it is internally very efficient 
> (probably within half or a quarter of its max read speed) the way that hbase 
> currently uses it is far from optimal.  The KeyValueScanner and related 
> classes that iterate through the trie will eventually need to be smarter and 
> have methods to do things like skipping to the next row of results without 
> scanning every cell in between.  When that is accomplished it will also allow 
> much faster compactions because the full row key will
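As a small aside on the fixed-width-int sizing mentioned in the description (the width is chosen from the largest offset or delta in the block), a hypothetical helper, not taken from the patch, could look like this:

{code}
public class FixedWidth {
  /** Bytes needed to store maxValue (assumed non-negative) as a fixed-width int. */
  public static int fixedWidthBytes(long maxValue) {
    int bytes = 1;
    while (bytes < 8 && (maxValue >>> (8 * bytes)) != 0) {
      bytes++; // grow until the value fits in 'bytes' bytes
    }
    return bytes;
  }

  public static void main(String[] args) {
    System.out.println(fixedWidthBytes(255));   // 1
    System.out.println(fixedWidthBytes(256));   // 2
    System.out.println(fixedWidthBytes(70000)); // 3
  }
}
{code}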

[jira] [Commented] (HBASE-5936) Add Column-level PB-based calls to HMasterInterface

2012-06-01 Thread Gregory Chanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287586#comment-13287586
 ] 

Gregory Chanan commented on HBASE-5936:
---

I ran these failed tests multiple times locally and they passed.

> Add Column-level PB-based calls to HMasterInterface
> ---
>
> Key: HBASE-5936
> URL: https://issues.apache.org/jira/browse/HBASE-5936
> Project: HBase
>  Issue Type: Task
>  Components: ipc, master, migration
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.96.0
>
> Attachments: HBASE-5936-v3.patch, HBASE-5936-v4.patch, 
> HBASE-5936-v4.patch, HBASE-5936-v5.patch, HBASE-5936-v6.patch, 
> HBASE-5936.patch
>
>
> This should be a subtask of HBASE-5445, but since that is a subtask, I can't 
> also make this a subtask (apparently).
> This is for converting the column-level calls, i.e.:
> addColumn
> deleteColumn
> modifyColumn

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.

2012-06-01 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287582#comment-13287582
 ] 

nkeywal commented on HBASE-5924:


This leads to a complete rewriting of the processBatchCallback function.
3 comments:
1) I don't see how this piece of code can be reached, and I ran the complete test 
suite without getting into this part. Am I missing anything?
{noformat}
  for (Pair regionResult : regionResults) {
if (regionResult == null) {
  // if the first/only record is 'null' the entire region 
failed.
  LOG.debug("Failures for region: " +
  Bytes.toStringBinary(regionName) +
  ", removing from cache");
} else {
{noformat}

2) The callback is never used internally. Is this something we should keep for 
customer code?

3) Do I move it to HTable? There is a comment saying that it does not belong to 
Connection, and it's true. But it's public, so...



> In the client code, don't wait for all the requests to be executed before 
> resubmitting a request in error.
> --
>
> Key: HBASE-5924
> URL: https://issues.apache.org/jira/browse/HBASE-5924
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
>Priority: Minor
>
> The client (in the function HConnectionManager#processBatchCallback) works in 
> two steps:
>  - make the requests
>  - collect the failures and successes and prepare for retry
> It means that when there is an immediate error (region moved, split, dead 
> server, ...) we still wait for all the initial requests to be executed before 
> resubmitting the failed request. If we have a scenario with all the 
> requests taking 5 seconds we have a final execution time of: 5 (initial 
> requests) + 1 (wait time) + 5 (final request) = 11s.
> We could improve this by analyzing the results immediately. This would lead 
> us, for the scenario mentioned above, to 6 seconds. 
> So we could have a performance improvement of nearly 50% in many cases, and 
> much more than 50% if the request execution times differ.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287579#comment-13287579
 ] 

Zhihong Yu commented on HBASE-6060:
---

Thanks for working on this issue.

I will review the next version in more detail :-)

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287568#comment-13287568
 ] 

ramkrishna.s.vasudevan commented on HBASE-6060:
---

@Ted
bq.why cannot we return null above so that we don't need to add the boolean 
member to RegionPlan ?
The reason here is that in case of a null region plan we have a different 
behaviour.  We consider it a case where we don't have any live RS, and hence set 
a flag so that the timeout monitor can be skipped. Hence we need to differentiate 
the null behaviour from this one.
{code}
if (plan == null) {
LOG.debug("Unable to determine a plan to assign " + state);
this.timeoutMonitor.setAllRegionServersOffline(true);
return; // Should get reassigned later when RIT times out.
  }
{code}
I was not sure which name to give for usePlan.  Maybe 'usable' is 
better.
bq. Before making the deadServerRegionsFromRegionPlan.put() call
I think since SSH makes this call per server it should be ok. 
I will check the other comments before changing it. Thanks for your detailed 
review.

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287561#comment-13287561
 ] 

Zhihong Yu commented on HBASE-6060:
---

I ran the tests in TestAssignmentManager and they passed.
{code}
 synchronized (this.regionPlans) {
+  regionsOnDeadServer = new RegionsOnDeadServer();
+  regionsFromRegionPlansForServer = new 
ConcurrentSkipListSet();
+  this.deadServerRegionsFromRegionPlan.put(sn, regionsOnDeadServer);
{code}
Can the first two assignments be placed outside synchronized block ?
Before making the deadServerRegionsFromRegionPlan.put() call, I think we should 
check that sn isn't currently in deadServerRegionsFromRegionPlan.
For isRegionOnline(HRegionInfo hri):
{code}
+return true;
+  } else {
+// Remove the assignment mapping for sn.
+Set hriSet = this.servers.get(sn);
+if (hriSet != null) {
+  hriSet.remove(hri);
+}
{code}
The else keyword isn't needed.
What if hriSet contains other regions apart from hri, should they be removed as 
well ?

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287547#comment-13287547
 ] 

Zhihong Yu commented on HBASE-6060:
---

{code}
+  if(!plan.canUsePlan()){
+return;
{code}
Please insert a space after if. It would be helpful to add LOG.debug() before 
returning.
{code}
+  public void usePlan(boolean usePlan) {
+this.usePlan = usePlan;
+  }
{code}
I would name the boolean 'usable'. The setter can be named setUsable().
A bigger question is:
{code}
+  if (newPlan) {
+randomPlan.usePlan(false);
+this.regionPlans.remove(randomPlan.getRegionName());
+  } else {
+existingPlan.usePlan(false);
+this.regionPlans.remove(existingPlan.getRegionName());
+  }
{code}
why cannot we return null above so that we don't need to add the boolean member 
to RegionPlan ?
At least we shouldn't return an unusable randomPlan.
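To make the naming suggestion concrete, a sketch of the proposed shape (this is only the reviewer's suggestion spelled out, not committed HBase code):

{code}
public class RegionPlanSketch {
  // An explicit flag on the plan, instead of a 'usePlan' boolean or an
  // overloaded null return value.
  private volatile boolean usable = true;

  public void setUsable(boolean usable) {
    this.usable = usable;
  }

  public boolean isUsable() {
    return usable;
  }
}
{code}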

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5179:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
> region to be assigned before log splitting is completed, causing data loss
> 
>
> Key: HBASE-5179
> URL: https://issues.apache.org/jira/browse/HBASE-5179
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.3
>
> Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
> 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
> 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 
> 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 
> 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 
> 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
> Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
> hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
> hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch
>
>
> If the master's failover processing and ServerShutdownHandler's processing 
> happen concurrently, the following case may appear.
> 1. master completed splitLogAfterStartup()
> 2. RegionserverA restarts, and ServerShutdownHandler is processing.
> 3. master starts to rebuildUserRegions, and RegionserverA is considered a 
> dead server.
> 4. master starts to assign regions of RegionserverA because it is a dead 
> server per step 3.
> However, when doing step 4 (assigning regions), ServerShutdownHandler may still 
> be splitting logs; therefore, it may cause data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3680:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> Publish more metrics about mslab
> 
>
> Key: HBASE-3680
> URL: https://issues.apache.org/jira/browse/HBASE-3680
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
> Fix For: 0.92.3
>
> Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it 
> tends to OOME or send us into GC loops of death a lot more than it used to. 
> For example, one RS with mslab enabled and 7GB of heap died out of OOME this 
> afternoon; it had .55GB in the block cache and 2.03GB in the memstores which 
> doesn't account for much... but it could be that because of mslab a lot of 
> space was lost in those incomplete 2MB blocks and without metrics we can't 
> really tell. Compactions were running at the time of the OOME and I see block 
> cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even 
> take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5492) Caching StartKeys and EndKeys of Regions

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5492:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> Caching StartKeys and EndKeys of Regions
> 
>
> Key: HBASE-5492
> URL: https://issues.apache.org/jira/browse/HBASE-5492
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.92.0
> Environment: all
>Reporter: honghua zhu
> Fix For: 0.92.3
>
> Attachments: HBASE-5492.patch
>
>
> Each call to HTable.getStartEndKeys reads the meta table.
> In particular, 
> in the case of client-side multi-threaded concurrent statistics, 
> we must call HTable.coprocessorExec ==> getStartKeysInRange ==> 
> getStartEndKeys,
> resulting in the need to always scan the meta table.
> This is not necessary:
> we can implement the 
> HConnectionManager.HConnectionImplementation.locateRegions(byte[] tableName) 
> method,
> and then get the StartKeys and EndKeys from the cachedRegionLocations of 
> HConnectionImplementation.
> Combined with https://issues.apache.org/jira/browse/HBASE-5491, this can improve 
> the performance of statistical
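A hedged sketch of the caching idea: memoize the keys on the client so repeated calls do not rescan meta. Cache invalidation on region splits and moves is deliberately left out here and would be needed in a real implementation:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Pair;

public class CachedStartEndKeys {
  private final HTable table;
  private volatile Pair<byte[][], byte[][]> cached;

  public CachedStartEndKeys(HTable table) {
    this.table = table;
  }

  public Pair<byte[][], byte[][]> getStartEndKeys() throws IOException {
    Pair<byte[][], byte[][]> keys = cached;
    if (keys == null) {
      keys = table.getStartEndKeys(); // one meta scan, then reused by all callers
      cached = keys;
    }
    return keys;
  }
}
{code}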

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5821) Incorrect handling of null value in Coprocessor aggregation function min()

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5821:
-

Fix Version/s: (was: 0.94.1)
   (was: 0.96.0)
   (was: 0.92.2)
   0.92.3

> Incorrect handling of null value in Coprocessor aggregation function min()
> --
>
> Key: HBASE-5821
> URL: https://issues.apache.org/jira/browse/HBASE-5821
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.1
>Reporter: Maryann Xue
>Assignee: Maryann Xue
> Fix For: 0.92.3
>
> Attachments: HBASE-5821.patch
>
>
> Both in AggregateImplementation and AggregationClient, the evaluation of the 
> current minimum value is like:
> min = (min == null || ci.compare(result, min) < 0) ? result : min;
> The LongColumnInterpreter treats a null value as the least value, 
> while the above expression treats min as the greater value when it is null. 
> Thus, the real minimum value gets discarded if a null value comes later.
> max() could also be wrong if a ColumnInterpreter other than 
> LongColumnInterpreter treats null values differently (as the greatest).
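A simplified, self-contained illustration of the problem; Long values stand in for the interpreter's type, and this is not the actual AggregateImplementation code:

{code}
import java.util.Arrays;
import java.util.List;

public class MinNullBug {
  // Mirrors an interpreter that orders null as the least value.
  static int compare(Long a, Long b) {
    if (a == null) return (b == null) ? 0 : -1;
    if (b == null) return 1;
    return a.compareTo(b);
  }

  // The pattern from the description: a null result "wins" over a real minimum.
  static Long buggyMin(List<Long> results) {
    Long min = null;
    for (Long result : results) {
      min = (min == null || compare(result, min) < 0) ? result : min;
    }
    return min;
  }

  // One possible fix, assuming null simply means "no value": ignore nulls.
  static Long fixedMin(List<Long> results) {
    Long min = null;
    for (Long result : results) {
      if (result != null && (min == null || result.compareTo(min) < 0)) {
        min = result;
      }
    }
    return min;
  }

  public static void main(String[] args) {
    List<Long> results = Arrays.asList(5L, null, 9L);
    System.out.println(buggyMin(results)); // 9 -- the real minimum 5 was discarded
    System.out.println(fixedMin(results)); // 5
  }
}
{code}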

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4415:
-

Fix Version/s: (was: 0.94.1)
   (was: 0.92.2)
   0.92.3

> Add configuration script for setup HBase (hbase-setup-conf.sh)
> --
>
> Key: HBASE-4415
> URL: https://issues.apache.org/jira/browse/HBASE-4415
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Affects Versions: 0.90.4, 0.92.0
> Environment: Java 6, Linux
>Reporter: Eric Yang
>Assignee: Eric Yang
> Fix For: 0.92.3
>
> Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
> HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415-5.patch, 
> HBASE-4415-6.patch, HBASE-4415-7.patch, HBASE-4415-8.patch, 
> HBASE-4415-9.patch, HBASE-4415.patch
>
>
> The goal of this jira is to provide an installation script for configuring the 
> HBase environment and configuration, using the same pattern of 
> *-setup-conf.sh as the other Hadoop-related projects.  For HBase, the usage of 
> the script looks like this:
> {noformat}
> usage: ./hbase-setup-conf.sh 
>   Optional parameters:
> --hadoop-conf=/etc/hadoopSet Hadoop configuration directory 
> location
> --hadoop-home=/usr   Set Hadoop directory location
> --hadoop-namenode=localhost  Set Hadoop namenode hostname
> --hadoop-replication=3   Set HDFS replication
> --hbase-home=/usrSet HBase directory location
> --hbase-conf=/etc/hbase  Set HBase configuration 
> directory location
> --hbase-log=/var/log/hbase   Set HBase log directory location
> --hbase-pid=/var/run/hbase   Set HBase pid directory location
> --hbase-user=hbase   Set HBase user
> --java-home=/usr/java/defaultSet JAVA_HOME directory location
> --kerberos-realm=KERBEROS.EXAMPLE.COMSet Kerberos realm
> --kerberos-principal-id=_HOSTSet Kerberos principal ID 
> --keytab-dir=/etc/security/keytabs   Set keytab directory
> --regionservers=localhostSet regionservers hostnames
> --zookeeper-home=/usrSet ZooKeeper directory location
> --zookeeper-quorum=localhost Set ZooKeeper Quorum
> --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5578:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> NPE when regionserver reported server load, caused rs stop.
> ---
>
> Key: HBASE-5578
> URL: https://issues.apache.org/jira/browse/HBASE-5578
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.92.0
> Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
>Reporter: Storm Lee
>Priority: Critical
> Fix For: 0.92.3
>
> Attachments: 5589.txt
>
>
> The regionserver log:
> 2012-03-11 11:55:37,808 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> data3,60020,1331286604591: Unhandled exception: null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
>   at java.lang.Thread.run(Thread.java:662)
> 2012-03-11 11:55:37,808 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
> loaded coprocessors are: []
> 2012-03-11 11:55:37,808 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
> requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
> numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
> totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
> memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
> compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
> blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
> blockCacheHitCount=87713, blockCacheMissCount=22144560, 
> blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
> blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
> 2012-03-11 11:55:37,992 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
> exception: null

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4523:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> dfs.support.append config should be present in the hadoop configs, we should 
> remove them from hbase so the user is not confused when they see the config 
> in 2 places
> 
>
> Key: HBASE-4523
> URL: https://issues.apache.org/jira/browse/HBASE-4523
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4, 0.92.0
>Reporter: Arpit Gupta
>Assignee: Eric Yang
> Fix For: 0.92.3
>
> Attachments: HBASE-4523.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4467) Handle inconsistencies in Hadoop libraries naming in hbase script

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4467:
-

Fix Version/s: (was: 0.92.2)
   0.92.3

> Handle inconsistencies in Hadoop libraries naming in hbase script
> -
>
> Key: HBASE-4467
> URL: https://issues.apache.org/jira/browse/HBASE-4467
> Project: HBase
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Lars George
>Assignee: Lars George
>Priority: Trivial
> Fix For: 0.92.3
>
> Attachments: HBASE-4467.patch
>
>
> When using an Hadoop tarball that has a library naming of "hadoop-x.y.z-core" 
> as opposed to "hadoop-core-x.y.z" then the hbase script throws errors.
> {noformat}
> $ bin/start-hbase.sh 
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> localhost: starting zookeeper, logging to 
> /projects/opensource/hbase-trunk-rw//logs/hbase-larsgeorge-zookeeper-de1-app-mbp-2.out
> localhost: /projects/opensource/hadoop-0.20.2-append
> localhost: ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No 
> such file or directory
> localhost: Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> localhost: Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
> localhost:at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> localhost:at java.security.AccessController.doPrivileged(Native Method)
> localhost:at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> localhost:at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> localhost:at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> localhost:at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> starting master, logging to 
> /projects/opensource/hbase-trunk-rw/bin/../logs/hbase-larsgeorge-master-de1-app-mbp-2.out
> /projects/opensource/hadoop-0.20.2-append
> ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No such file 
> or directory
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> localhost: starting regionserver, logging to 
> /projects/opensource/hbase-trunk-rw//logs/hbase-larsgeorge-regionserver-de1-app-mbp-2.out
> localhost: /projects/opensource/hadoop-0.20.2-append
> localhost: ls: /projects/opensource/hadoop-0.20.2-append/hadoop-core*.jar: No 
> such file or directory
> localhost: Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/hadoop/util/PlatformName
> localhost: Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.util.PlatformName
> localhost:at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> localhost:at java.security.AccessController.doPrivileged(Native Method)
> localhost:at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> localhost:at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> lo

[jira] [Updated] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2012-06-01 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4457:
-

Fix Version/s: (was: 0.94.1)
   (was: 0.92.2)
   0.92.3

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>  Labels: newbie
> Fix For: 0.92.3
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When "hbase.regionserver.info.port" is set to a non-default port, the Master UI 
> has broken URLs in the region server table because it is hard-coded to the 
> default port.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287489#comment-13287489
 ] 

Zhihong Yu edited comment on HBASE-6060 at 6/1/12 4:47 PM:
---

This patch is also a backport of HBASE-5396.  But this is more exhaustive and 
also tries to address HBASE-5816.
HBASE-6147 has been raised to solve other assign-related issues that come from 
SSH and joincluster.  Pls review and provide your comments.

  was (Author: rajesh23):
This patch is also a backport of HBASe-5396.  But this is more exhaustive 
and also tries to address HBASE-5816.
HBASE-6147 has been raised to solve other assign related issues that comes from 
SSH and joincluster.  Pls review and provide your comments.
  
> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Zhihong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287526#comment-13287526
 ] 

Zhihong Yu commented on HBASE-6046:
---

I ran the new test and it passed.
{code}
   }
+  public void finishInitialization() {
+finishInitialization(false);
{code}
Please add javadoc for the above method. Leave one empty line between the 
previous method and finishInitialization().

In test code:
{code}
+  public static class MockLoadBalancer extends DefaultLoadBalancer {
{code}
The above class can be private.


> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the hmaster leads to bulk assignment even though all 
> the RSs are online.
> 2> While doing bulk assignment, if the master again goes down & restarts (or a 
> backup comes up), all the nodes created in ZK will now be tried for reassignment 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5360) [uberhbck] Add options for how to handle offline split parents.

2012-06-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-5360:
--

Assignee: Jimmy Xiang

> [uberhbck] Add options for how to handle offline split parents. 
> 
>
> Key: HBASE-5360
> URL: https://issues.apache.org/jira/browse/HBASE-5360
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Affects Versions: 0.90.7, 0.92.1, 0.94.0
>Reporter: Jonathan Hsieh
>Assignee: Jimmy Xiang
>
> In a recent case, we attempted to repair a cluster that suffered from 
> HBASE-4238 and had about 6-7 generations of "leftover" split data.  The hbck 
> repair options in a development version of HBASE-5128 treated HDFS as ground 
> truth but didn't check the SPLIT and OFFLINE flags only found in meta.  The net 
> effect was that it essentially attempted to merge many regions back into their 
> eldest generation's parent's range.  
> More safeguards to prevent "mega-merges" are being added in HBASE-5128.
> This issue would automate the handling of the "mega-merge" avoiding cases 
> such as "lingering grandparents".  The strategy here would be to add more 
> checks against .META., and perform part of the catalog janitor's 
> responsibilities for lingering grandparents.  This would potentially include 
> options to sideline regions, deleting grandparent regions, min size for 
> sidelining, and mechanisms for cleaning .META..  
> Note: There already exists a mechanism to reload these regions -- the bulk 
> load mechanism in LoadIncrementalHFiles can be used to re-add grandparents 
> (automatically splitting them if necessary) to HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287489#comment-13287489
 ] 

rajeshbabu commented on HBASE-6060:
---

This patch is also a backport of HBASE-5396.  But this is more exhaustive and 
also tries to address HBASE-5816.
HBASE-6147 has been raised to solve other assign-related issues that come from 
SSH and joincluster.  Pls review and provide your comments.

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-06-01 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6060:
--

Attachment: HBASE-6060-94.patch

> Regions's in OPENING state from failed regionservers takes a long time to 
> recover
> -
>
> Key: HBASE-6060
> URL: https://issues.apache.org/jira/browse/HBASE-6060
> Project: HBase
>  Issue Type: Bug
>  Components: master, regionserver
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Attachments: HBASE-6060-94.patch
>
>
> We have seen a pattern in tests where regions are stuck in the OPENING state 
> for a very long time when the region server that is opening the region fails. 
> My understanding of the process: 
>  
>  - master calls rs to open the region. If rs is offline, a new plan is 
> generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
> master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
> HMaster.assign()
>  - RegionServer, starts opening a region, changes the state in znode. But 
> that znode is not ephemeral. (see ZkAssign)
>  - Rs transitions zk node from OFFLINE to OPENING. See 
> OpenRegionHandler.process()
>  - rs then opens the region, and changes znode from OPENING to OPENED
>  - when rs is killed between OPENING and OPENED states, then zk shows OPENING 
> state, and the master just waits for rs to change the region state, but since 
> rs is down, that won't happen. 
>  - There is an AssignmentManager.TimeoutMonitor, which guards exactly against 
> these kinds of conditions. It periodically checks (every 10 sec by default) 
> the regions in transition to see whether they have timed out 
> (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 min, 
> which explains what you and I are seeing. 
>  - ServerShutdownHandler in Master does not reassign regions in OPENING 
> state, although it handles other states. 
> Lowering that threshold from the configuration is one option, but still I 
> think we can do better. 
> Will investigate more. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287476#comment-13287476
 ] 

Hadoop QA commented on HBASE-6046:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12530559/HBASE_6046_0.94_1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2084//console

This message is automatically generated.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287475#comment-13287475
 ] 

Ashutosh Jindal commented on HBASE-6046:


@Anoop 
Thanks for the review. Uploaded a patch addressing the comments.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-6046:
---

Hadoop Flags:   (was: Reviewed)
  Status: Patch Available  (was: Open)

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.92.1
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-6046:
---

Status: Open  (was: Patch Available)

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.92.1
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-6046:
---

Attachment: HBASE_6046_0.94_1.patch

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch, HBASE_6046_0.94_1.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287467#comment-13287467
 ] 

Ashutosh Jindal commented on HBASE-6046:


Please check the second test case added, 
testLogSplittingAfterMasterRecoveryDueToZKExpiry(). If the test case is run 
without the patch, a StackOverflowError is thrown.
{code}
java.lang.StackOverflowError
at java.lang.System.getProperty(System.java:647)
at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:67)
at sun.security.action.GetPropertyAction.run(GetPropertyAction.java:32)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(PrintWriter.java:78)
at java.io.PrintWriter.<init>(PrintWriter.java:62)
at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:485)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:623)
at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1274)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:626)
at org.apache.hadoop.hbase.master.SplitLogManager.access$17(SplitLogManager.java:620)
at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1104)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1274)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:626)
at org.apache.hadoop.hbase.master.SplitLogManager.access$17(SplitLogManager.java:620)
at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1104)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
{code}

This is coming because the SplitLogManager listener is not registered again after 
the master recovers from an expired ZK session.
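
As a purely illustrative sketch of the idea in the comment above — listeners have to be registered again once a fresh ZooKeeper connection is built after session expiry — here is a toy example with hypothetical types; it is not HBase's real watcher or SplitLogManager API.

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for a ZK watcher and its listeners.
interface ZkListener {
  void nodeChanged(String path);
}

class RecoverableZkWatcher {
  private final List<ZkListener> listeners = new CopyOnWriteArrayList<ZkListener>();

  void registerListener(ZkListener listener) {
    listeners.add(listener);
  }

  // Called after a new session replaces an expired one. If a component such
  // as the split-log manager is not registered again here, its ZK callbacks
  // are left dangling, which is the kind of failure the stack trace shows.
  void onSessionRecovered(List<ZkListener> componentsToReRegister) {
    listeners.clear();
    for (ZkListener listener : componentsToReRegister) {
      registerListener(listener);
    }
  }
}
{code}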

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287460#comment-13287460
 ] 

Anoop Sam John commented on HBASE-6046:
---

One immediate comment after seeing the patch:
{code}
+this.fileSystemManager = new MasterFileSystem(this, this, metrics, masterRecovery ? true
+    : false);
{code}
You can pass the boolean variable masterRecovery directly.
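
For illustration only, here is a tiny sketch of the simplification being suggested, using a hypothetical stand-in class rather than the real MasterFileSystem constructor:

{code}
// Hypothetical stand-in; the real constructor takes more collaborators.
class FileSystemManagerStub {
  private final boolean masterRecovery;

  FileSystemManagerStub(boolean masterRecovery) {
    this.masterRecovery = masterRecovery;
  }
}

class ReviewExample {
  FileSystemManagerStub before(boolean masterRecovery) {
    // As in the patch hunk above: a redundant ternary on a boolean.
    return new FileSystemManagerStub(masterRecovery ? true : false);
  }

  FileSystemManagerStub after(boolean masterRecovery) {
    // As suggested in the review: pass the boolean straight through.
    return new FileSystemManagerStub(masterRecovery);
  }
}
{code}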

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287453#comment-13287453
 ] 

Hadoop QA commented on HBASE-6046:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12530554/HBASE_6046_0.94.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2083//console

This message is automatically generated.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287451#comment-13287451
 ] 

Ashutosh Jindal commented on HBASE-6046:


Attached a patch for the 0.94 version. Please review and provide your 
suggestions/comments.

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-6046:
---

Fix Version/s: (was: 0.94.1)
   (was: 0.92.2)
 Hadoop Flags: Reviewed
   Status: Patch Available  (was: Open)

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.92.1
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

2012-06-01 Thread Ashutosh Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-6046:
---

Attachment: HBASE_6046_0.94.patch

> Master retry on ZK session expiry causes inconsistent region assignments.
> -
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE_6046_0.94.patch
>
>
> 1> ZK session timeout in the HMaster leads to bulk assignment even though all the 
> RSs are online.
> 2> While doing bulk assignment, if the master again goes down and restarts (or a 
> backup comes up), all the nodes created in ZK will now be reassigned 
> to the new RSs. This leads to double assignment.
> We had 2800 regions; among these, 1900 regions got double assignment, taking the 
> region count to 4700. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6138) HadoopQA not running findbugs [Trunk]

2012-06-01 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6138:
--

Description: 
HadoopQA shows like
 -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
But not able to see any reports link

When I checked the console output for the build I can see
{code}
[INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common ---
[INFO] Fork Value is true
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] HBase . SUCCESS [1.890s]
[INFO] HBase - Common  FAILURE [2.238s]
[INFO] HBase - Server  SKIPPED
[INFO] HBase - Assembly .. SKIPPED
[INFO] HBase - Site .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 4.856s
[INFO] Finished at: Thu May 31 03:35:35 UTC 2012
[INFO] Final Memory: 23M/154M
[INFO] 
[ERROR] Could not find resource 
'${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
[ERROR] 
{code}
Because of this error Findbugs is not getting run!



  was:
HadoopQA shows like
 -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
But not able to see any reports link
When I checked the console output for the build I can see
{code}
[INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common ---
[INFO] Fork Value is true
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] HBase . SUCCESS [1.890s]
[INFO] HBase - Common  FAILURE [2.238s]
[INFO] HBase - Server  SKIPPED
[INFO] HBase - Assembly .. SKIPPED
[INFO] HBase - Site .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 4.856s
[INFO] Finished at: Thu May 31 03:35:35 UTC 2012
[INFO] Final Memory: 23M/154M
[INFO] 
[ERROR] Could not find resource 
'${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
[ERROR] 
{code}



Summary: HadoopQA not running findbugs [Trunk]  (was: HadoopQA not 
showing the findbugs report[Trunk])

> HadoopQA not running findbugs [Trunk]
> -
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
> Fix For: 0.96.0
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}
> Because of this error Findbugs is not getting run!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6145) Fix site target post modularization

2012-06-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287427#comment-13287427
 ] 

stack commented on HBASE-6145:
--

Test failures probably not related.

I can't commit this yet until I fix assembly. Thinking on this more, the way 
I've done site up in the parent might not jibe well with doing assembly down in an 
assembly module (which I believe you cannot avoid).

I love maven!

> Fix site target post modularization
> ---
>
> Key: HBASE-6145
> URL: https://issues.apache.org/jira/browse/HBASE-6145
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Attachments: site.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6012) AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287416#comment-13287416
 ] 

ramkrishna.s.vasudevan commented on HBASE-6012:
---

@Chunhui
Please check the issue HBASE-6147. Along with the scenario mentioned there, if 
we take the current patch here, I feel the following problem can come up:
-> Because the current patch attached here forces the znode to the 
OFFLINE state, any assignment currently going on due to SSH in one 
of the RSs will get stopped because of a znode version mismatch.
Internally, that failure to OPEN the region will move the znode to FAILED_OPEN. Now, 
based on this, the master will again start a new assignment. So this, along with the 
issue in HBASE-6147, may lead to double assignment. What we felt here is that 
this patch should go along with some changes in HBASE-6147.
Please feel free to correct me if I am wrong.

> AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline
> 
>
> Key: HBASE-6012
> URL: https://issues.apache.org/jira/browse/HBASE-6012
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.96.0
>
> Attachments: HBASE-6012.patch
>
>
> As the javadoc of method and the log message
> {code}
> /**
>* Set region as OFFLINED up in zookeeper asynchronously.
>*/
> boolean asyncSetOfflineInZooKeeper(
> ...
> master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
> ...
> }
> {code}
> I think AssignmentManager#asyncSetOfflineInZooKeeper should also force node 
> offline, just like AssignmentManager#setOfflineInZooKeeper do. Otherwise, it 
> may cause bulk assign failed which called this method.
> Error log on the master caused by the issue
> 2012-05-12 01:40:09,437 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=writetest,1YTQDPGLXBTICHOPQ6IL,1336590857771.674da422fc7cb9a7d42c74499ace1d93.
>  state=PENDING_CLOSE, ts=1336757876856 
> 2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x23736bf74780082 Async create of unassigned node for 
> 674da422fc7cb9a7d42c74499ace1d93 with OFFLINE state 
> 2012-05-12 01:40:09,446 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
>  rc != 0 for /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93 -- 
> retryable connectionloss -- FIX see 
> http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2 
> 2012-05-12 01:40:09,447 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Connectionloss writing unassigned at 
> /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93, rc=-110 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6147) SSH and AM.joinCluster leads to region assignment inconsistency in many cases.

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287410#comment-13287410
 ] 

ramkrishna.s.vasudevan commented on HBASE-6147:
---

We got the following case:
-> Initially we had 2 RSs and 1 Master with a few regions.
-> Stopped the cluster and restarted the master and the 2 RSs.
-> One of the RS znodes was not yet deleted, but the master started coming up.
-> Here we now see that there is a server which is dead and not yet expired, 
so we call expireServer, which in turn calls SSH.
-> After this the master sees this as a clean cluster startup.
-> Now SSH triggers one assignment and master startup starts bulk assignment.
-> Now, when the znode is already present, the bulk assignment will make the 
master go down.
So we need to handle such cases. Solving this should help us solve most of 
the double assignment cases. There can be more such scenarios.

> SSH and AM.joinCluster leads to region assignment inconsistency in many cases.
> --
>
> Key: HBASE-6147
> URL: https://issues.apache.org/jira/browse/HBASE-6147
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
>
> We are facing a few issues when master restart and SSH run in parallel.
> Chunhui also suggested that we need to rework this part.  This JIRA is 
> aimed at solving all such possibilities of region assignment inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6147) SSH and AM.joinCluster leads to region assignment inconsistency in many cases.

2012-06-01 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-6147:
-

 Summary: SSH and AM.joinCluster leads to region assignment 
inconsistency in many cases.
 Key: HBASE-6147
 URL: https://issues.apache.org/jira/browse/HBASE-6147
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1


We are facing a few issues when master restart and SSH run in parallel.
Chunhui also suggested that we need to rework this part. This JIRA is aimed 
at solving all such possibilities of region assignment inconsistency.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6138) HadoopQA not showing the findbugs report[Trunk]

2012-06-01 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287375#comment-13287375
 ] 

Anoop Sam John commented on HBASE-6138:
---

I think the below change can fix the issue, 
in pom.xml:
{code}
-${parent.basedir}/dev-support/findbugs-exclude.xml
+${parent.basedir}/../dev-support/findbugs-exclude.xml
{code}

> HadoopQA not showing the findbugs report[Trunk]
> ---
>
> Key: HBASE-6138
> URL: https://issues.apache.org/jira/browse/HBASE-6138
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.96.0
>Reporter: Anoop Sam John
> Fix For: 0.96.0
>
>
> HadoopQA shows like
>  -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.
> But not able to see any reports link
> When I checked the console output for the build I can see
> {code}
> [INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase-common 
> ---
> [INFO] Fork Value is true
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] HBase . SUCCESS [1.890s]
> [INFO] HBase - Common  FAILURE [2.238s]
> [INFO] HBase - Server  SKIPPED
> [INFO] HBase - Assembly .. SKIPPED
> [INFO] HBase - Site .. SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 4.856s
> [INFO] Finished at: Thu May 31 03:35:35 UTC 2012
> [INFO] Final Memory: 23M/154M
> [INFO] 
> 
> [ERROR] Could not find resource 
> '${parent.basedir}/dev-support/findbugs-exclude.xml'. -> [Help 1]
> [ERROR] 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6146) Disabling of Catalog tables should not be allowed

2012-06-01 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287373#comment-13287373
 ] 

Anoop Sam John commented on HBASE-6146:
---

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplication
This does not seem related to this patch at all.

-1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
This is coming because of HBASE-6138


> Disabling of Catalog tables should not be allowed
> -
>
> Key: HBASE-6146
> URL: https://issues.apache.org/jira/browse/HBASE-6146
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6146_94.patch, HBASE-6146_Trunk.patch
>
>
> When HBaseAdmin#disableTable() is called with the META or ROOT table, it passes 
> the disable instruction to the Master and the table actually gets disabled. 
> Later this API call fails because of a call to 
> HBaseAdmin#isTableDisabled(), which has a check like 
> isLegalTableName(tableName). So this call leaves the catalog table in the 
> disabled state.
> We can have the same kind of isLegalTableName(tableName) checks in disableTable() 
> and enableTable() APIs.
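
As a purely illustrative sketch of the kind of check the description asks for — not the attached patches — a disable/enable entry point could reject the catalog table names up front:

{code}
// Illustrative only; names and structure are hypothetical.
class CatalogTableGuard {
  private static final String ROOT_TABLE = "-ROOT-";
  private static final String META_TABLE = ".META.";

  static void checkNotCatalogTable(String tableName) {
    if (ROOT_TABLE.equals(tableName) || META_TABLE.equals(tableName)) {
      throw new IllegalArgumentException(
          "Cannot disable/enable catalog table: " + tableName);
    }
  }

  public static void main(String[] args) {
    checkNotCatalogTable("usertable"); // passes silently
    checkNotCatalogTable(".META.");    // throws IllegalArgumentException
  }
}
{code}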

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6116) Allow parallel HDFS writes for HLogs.

2012-06-01 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6116:
-

Attachment: 6116-v1.txt

Initial patch, which includes HBASE-5954.
This also fixes building HBase trunk with Hadoop trunk (3.0.0-SNAPSHOT).

In order to test, HDFS-1783 needs to be applied to Hadoop (trunk) first.
Then build Hadoop with:
mvn -Pnative -Pdist -Dtar -DskipTests install
And then HBase with:
mvn -DskipTests -Dhadoop.profile=3.0 ...

Parallel writes can be enabled in hbase-site.xml with:
hbase.regionserver.wal.parallel.writes

Since this patch includes HBASE-5954, durable sync can also be enabled:
hbase.regionserver.wal.durable.sync
hbase.regionserver.hfile.durable.sync

(all options can be set to "true")
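
For completeness, the same three switches can also be set on a Configuration object while experimenting; a minimal sketch below, assuming only the property names listed above (how the patch itself reads them is unchanged):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EnableParallelWalWrites {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Property names are taken verbatim from the comment above.
    conf.setBoolean("hbase.regionserver.wal.parallel.writes", true);
    conf.setBoolean("hbase.regionserver.wal.durable.sync", true);
    conf.setBoolean("hbase.regionserver.hfile.durable.sync", true);

    System.out.println("parallel WAL writes: "
        + conf.getBoolean("hbase.regionserver.wal.parallel.writes", false));
  }
}
{code}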

@Andy: If your offer to do a quick test in EC2 still stands that'd be awesome!


> Allow parallel HDFS writes for HLogs.
> -
>
> Key: HBASE-6116
> URL: https://issues.apache.org/jira/browse/HBASE-6116
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 6116-v1.txt
>
>
> In HDFS-1783 I adapted Dhrubas changes to be used in Hadoop trunk.
> This issue will include the necessary reflection changes to optionally enable 
> this for the WALs in HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6146) Disabling of Catalog tables should not be allowed

2012-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287350#comment-13287350
 ] 

Hadoop QA commented on HBASE-6146:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12530537/HBASE-6146_Trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2082//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2082//console

This message is automatically generated.

> Disabling of Catalog tables should not be allowed
> -
>
> Key: HBASE-6146
> URL: https://issues.apache.org/jira/browse/HBASE-6146
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6146_94.patch, HBASE-6146_Trunk.patch
>
>
> When HBaseAdmin#disableTable() is called with the META or ROOT table, it passes 
> the disable instruction to the Master and the table actually gets disabled. 
> Later this API call fails because of a call to 
> HBaseAdmin#isTableDisabled(), which has a check like 
> isLegalTableName(tableName). So this call leaves the catalog table in the 
> disabled state.
> We can have the same kind of isLegalTableName(tableName) checks in disableTable() 
> and enableTable() APIs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6146) Disabling of Catalog tables should not be allowed

2012-06-01 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6146:
--

Attachment: HBASE-6146_Trunk.patch
HBASE-6146_94.patch

Patches for 0.94 and Trunk

> Disabling of Catalog tables should not be allowed
> -
>
> Key: HBASE-6146
> URL: https://issues.apache.org/jira/browse/HBASE-6146
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6146_94.patch, HBASE-6146_Trunk.patch
>
>
> When HBaseAdmin#disableTable() is called with the META or ROOT table, it passes 
> the disable instruction to the Master and the table actually gets disabled. 
> Later this API call fails because of a call to 
> HBaseAdmin#isTableDisabled(), which has a check like 
> isLegalTableName(tableName). So this call leaves the catalog table in the 
> disabled state.
> We can have the same kind of isLegalTableName(tableName) checks in disableTable() 
> and enableTable() APIs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6146) Disabling of Catalog tables should not be allowed

2012-06-01 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-6146:
--

Fix Version/s: 0.94.1
   0.96.0
   Status: Patch Available  (was: Open)

> Disabling of Catalog tables should not be allowed
> -
>
> Key: HBASE-6146
> URL: https://issues.apache.org/jira/browse/HBASE-6146
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-6146_94.patch, HBASE-6146_Trunk.patch
>
>
> When HBaseAdmin#disableTable() is called with the META or ROOT table, it passes 
> the disable instruction to the Master and the table actually gets disabled. 
> Later this API call fails because of a call to 
> HBaseAdmin#isTableDisabled(), which has a check like 
> isLegalTableName(tableName). So this call leaves the catalog table in the 
> disabled state.
> We can have the same kind of isLegalTableName(tableName) checks in disableTable() 
> and enableTable() APIs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



