[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server

2013-02-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7757:
---

Attachment: thrift-0.94.png

 Add web UI to REST server and Thrift server
 ---

 Key: HBASE-7757
 URL: https://issues.apache.org/jira/browse/HBASE-7757
 Project: HBase
  Issue Type: Improvement
  Components: REST, Thrift, UI
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, 
 thrift-0.94.png, thrift-0.96.png


 Add a Hadoop HttpServer (web UI) to the REST server and the Thrift server. The 
 Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which are useful 
 for monitoring the REST/Thrift servers.
 For the REST server, use a separate listener/context to avoid path-mapping 
 conflicts.
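
 The separate-listener idea can be sketched roughly as below. This is only an 
 illustration, not the attached patch: it uses the JDK's built-in 
 com.sun.net.httpserver as a stand-in for the Hadoop HttpServer, and the class 
 name InfoServerSketch, the method names, and the /stacks handler are 
 assumptions made for the example. The point is that the info UI listens on its 
 own port, so its paths cannot collide with REST resource paths.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch: an info UI on its own listener, separate from the REST port.
class InfoServerSketch {

  /** Starts a listener on the given port (0 = any free port) serving only /stacks. */
  static HttpServer startInfoServer(int port) {
    try {
      HttpServer info = HttpServer.create(new InetSocketAddress("localhost", port), 0);
      info.createContext("/stacks", exchange -> {
        byte[] body = dumpStacks().getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
          out.write(body);
        }
      });
      info.start();
      return info;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Builds a plain-text dump of all live threads and their stack frames. */
  static String dumpStacks() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      sb.append("Thread ").append(e.getKey().getName()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }
}
```

 The REST resources would keep their own listener and port; only the monitoring 
 pages live on the listener returned by startInfoServer.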

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server

2013-02-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7757:
---

Attachment: trunk-7757_v1.patch

 Add web UI to REST server and Thrift server
 ---

 Key: HBASE-7757
 URL: https://issues.apache.org/jira/browse/HBASE-7757
 Project: HBase
  Issue Type: Improvement
  Components: REST, Thrift, UI
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, 
 thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch


 Add a Hadoop HttpServer (web UI) to the REST server and the Thrift server. The 
 Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which are useful 
 for monitoring the REST/Thrift servers.
 For the REST server, use a separate listener/context to avoid path-mapping 
 conflicts.



[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server

2013-02-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7757:
---

Status: Patch Available  (was: Open)

 Add web UI to REST server and Thrift server
 ---

 Key: HBASE-7757
 URL: https://issues.apache.org/jira/browse/HBASE-7757
 Project: HBase
  Issue Type: Improvement
  Components: REST, Thrift, UI
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, 
 thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch


 Add a Hadoop HttpServer (web UI) to the REST server and the Thrift server. The 
 Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which are useful 
 for monitoring the REST/Thrift servers.
 For the REST server, use a separate listener/context to avoid path-mapping 
 conflicts.



[jira] [Created] (HBASE-7776) Use ErrorReporter instead of Log/System.out in hbck

2013-02-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-7776:
--

 Summary: Use ErrorReporter instead of Log/System.out in hbck
 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Hbck has a pluggable ErrorReporter. However, there are still lots of places 
that log messages with LOG/System.out.  We should use ErrorReporter instead, so 
that it can capture all information from hbck.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Description: There are lots of places that log messages with System.out.  We 
should use ErrorReporter or Log instead, which can be configured to catch what 
we want.  (was: Hbck has a pluggable ErrorReporter. However, there are still 
lots places to log messages with LOG/System.out.  We should use ErrorReporter 
instead, so that it can catch all information from hbck.)
Summary: Use ErrorReporter/Log instead of System.out in hbck  (was: Use 
ErrorReporter instead of Log/System.out in hbck)

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 There are lots of places that log messages with System.out.  We should use 
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Commented] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572533#comment-13572533
 ] 

Jimmy Xiang commented on HBASE-7776:


I noticed the limitation of ErrorReporter. It is probably intended to work this 
way. The ErrorReporter is pluggable.  If someone plugs in their own reporter 
but it gets only limited information, that is probably not good.

For the usages of System.out and System.err in hbck, we can either convert them 
to LOG, or somehow enhance ErrorReporter.

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 There are lots of places that log messages with System.out.  We should use 
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-7757) Add web UI to REST server and Thrift server

2013-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7757:
---

   Resolution: Fixed
Fix Version/s: 0.94.5
   0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into trunk and 0.94 (with minor fix). Thanks all for reviewing it.

 Add web UI to REST server and Thrift server
 ---

 Key: HBASE-7757
 URL: https://issues.apache.org/jira/browse/HBASE-7757
 Project: HBase
  Issue Type: Improvement
  Components: REST, Thrift, UI
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0, 0.94.5

 Attachments: 0.94_7757_v1.patch, rest-0.94.png, rest-0.96.png, 
 thrift-0.94.png, thrift-0.96.png, trunk-7757_v1.patch


 Add a Hadoop HttpServer (web UI) to the REST server and the Thrift server. The 
 Hadoop HttpServer supports metrics/jmx/conf/logLevel/stacks, which are useful 
 for monitoring the REST/Thrift servers.
 For the REST server, use a separate listener/context to avoid path-mapping 
 conflicts.



[jira] [Commented] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572597#comment-13572597
 ] 

Jimmy Xiang commented on HBASE-7776:


I'd like to encapsulate all of these (System.out/System.err) in ErrorReporter 
so that we don't use them directly.
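
A rough sketch of what that encapsulation could look like is below. The 
Reporter interface and the class names are hypothetical and much smaller than 
the real ErrorReporter in hbck; the sketch only shows the shape of the idea: 
the checker writes through the reporter, the default reporter forwards to the 
console, and a plugged-in reporter sees every message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified reporter interface; not the actual hbck ErrorReporter API. */
interface Reporter {
  void print(String message);
  void reportError(String message);
}

/** Default behaviour: the only place that touches System.out/System.err. */
class ConsoleReporter implements Reporter {
  @Override public void print(String message) { System.out.println(message); }
  @Override public void reportError(String message) { System.err.println("ERROR: " + message); }
}

/** A plugged-in reporter that captures everything, e.g. for a test or a UI. */
class CapturingReporter implements Reporter {
  final List<String> lines = new ArrayList<>();
  @Override public void print(String message) { lines.add(message); }
  @Override public void reportError(String message) { lines.add("ERROR: " + message); }
}

/** Stand-in for the checker: all of its output goes through the reporter. */
class FsckSketch {
  private final Reporter reporter;

  FsckSketch(Reporter reporter) { this.reporter = reporter; }

  void check(int inconsistencies) {
    reporter.print("Number of inconsistencies: " + inconsistencies);
    if (inconsistencies > 0) {
      reporter.reportError("inconsistencies detected");
    }
  }
}
```

With this shape, switching from console output to a custom sink is a matter of 
passing a different Reporter to the constructor.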

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 There are lots of places that log messages with System.out.  We should use 
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys

2013-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6782:
---

Assignee: Viji

 HBase shell's 'status 'detailed'' should escape the printed keys
 

 Key: HBASE-6782
 URL: https://issues.apache.org/jira/browse/HBASE-6782
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.1
Reporter: Viji
Assignee: Viji
Priority: Minor
 Attachments: HBASE-6782.patch


 Currently the HBase shell's status command prints unescaped keys on the 
 terminal causing the terminal to print garbage characters. We should escape 
 the printed keys.
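
 As an illustration of the kind of escaping meant here, the sketch below 
 renders printable ASCII as-is and everything else as \xNN. The class and 
 method names are made up for the example, and it is written in Java rather 
 than in the shell's Ruby; it is not taken from the attached patch. HBase's 
 Bytes.toStringBinary utility serves a similar purpose.

```java
// Hypothetical helper: makes a binary row key safe to print on a terminal.
class KeyEscapeSketch {

  /** Renders printable ASCII as-is; every other byte, and the backslash, as \xNN. */
  static String escape(byte[] key) {
    StringBuilder sb = new StringBuilder();
    for (byte b : key) {
      int ch = b & 0xFF; // treat the byte as unsigned
      if (ch >= 0x20 && ch < 0x7F && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch));
      }
    }
    return sb.toString();
  }
}
```

 Escaping the backslash itself keeps the output unambiguous, so a literal 
 "\x00" in a key cannot be confused with an escaped zero byte.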



[jira] [Commented] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572714#comment-13572714
 ] 

Jimmy Xiang commented on HBASE-6782:


+1

 HBase shell's 'status 'detailed'' should escape the printed keys
 

 Key: HBASE-6782
 URL: https://issues.apache.org/jira/browse/HBASE-6782
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.1
Reporter: Viji
Assignee: Viji
Priority: Minor
 Attachments: HBASE-6782.patch


 Currently the HBase shell's status command prints unescaped keys on the 
 terminal causing the terminal to print garbage characters. We should escape 
 the printed keys.



[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572736#comment-13572736
 ] 

Jimmy Xiang commented on HBASE-7698:


The code doesn't look efficient to me (an existing issue of course).  Not a big 
deal.  I was wondering if we can move the call to 
tryTransitionFromOpeningToFailedOpen(regionInfo) into the finally block, so 
that we don't need the local variable transitionToFailedOpen, and we can cover 
all scenarios as long as openSuccessful is not true. @Ram, what do you think?
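
A minimal sketch of the pattern I have in mind, with stand-in names 
(OpenRegionSketch, openRegion, and the events list are made up for 
illustration; only tryTransitionFromOpeningToFailedOpen and openSuccessful 
come from the handler under discussion):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the open-region handler; records what happened.
class OpenRegionSketch {
  final List<String> events = new ArrayList<>();

  /** Stand-in for the real open steps; throws to simulate e.g. "Filesystem closed". */
  boolean openRegion(boolean failWithException) {
    if (failWithException) {
      throw new IllegalStateException("Filesystem closed");
    }
    return true;
  }

  void tryTransitionFromOpeningToFailedOpen(String regionInfo) {
    events.add("FAILED_OPEN " + regionInfo);
  }

  void process(String regionInfo, boolean failWithException) {
    boolean openSuccessful = false;
    try {
      openSuccessful = openRegion(failWithException);
      if (openSuccessful) {
        events.add("OPENED " + regionInfo);
      }
    } finally {
      // Runs on every exit path, including unexpected exceptions, so the
      // region is never left in OPENING without a transition attempt.
      if (!openSuccessful) {
        tryTransitionFromOpeningToFailedOpen(regionInfo);
      }
    }
  }
}
```

Because the flag is only set after the open succeeds, no separate 
transitionToFailedOpen variable is needed.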

 race between RS shutdown thread and openregionhandler causes region to get 
 stuck
 

 Key: HBASE-7698
 URL: https://issues.apache.org/jira/browse/HBASE-7698
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, 
 HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, 
 HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch


 2013-01-22 17:59:03,237 INFO  [Shutdown of 
 org.apache.hadoop.hbase.fs.HFileSystem@5984cf08] 
 hbase.MiniHBaseCluster$SingleFileSystemShutdownThread(186): Hook closing 
 fs=org.apache.hadoop.hbase.fs.HFileSystem@5984cf08
 ...
 2013-01-22 17:59:03,411 DEBUG 
 [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1001): 
 Closing 
 IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.:
  disabling compactions & flushes
 2013-01-22 17:59:03,411 DEBUG 
 [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] regionserver.HRegion(1023): 
 Updates disabled for region 
 IntegrationTestRebalanceAndKillServersTargeted,6660,1358906196709.0200b366bc37c5afd1185f7d487c7dfb.
 2013-01-22 17:59:03,415 ERROR 
 [RS_OPEN_REGION-10.11.2.92,50661,1358906192942-0] executor.EventHandler(205): 
 Caught throwable while processing event M_RS_OPEN_REGION
 java.io.IOException: java.io.IOException: java.io.IOException: Filesystem 
 closed
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1058)
   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:974)
   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:945)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.cleanupFailedOpen(OpenRegionHandler.java:459)
   at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:143)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:202)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:680)
 tryTransitionFromOpeningToFailedOpen or transitionToOpened below is never 
 called and region can get stuck.
 As an added benefit, the meta is already written by that time.



[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572757#comment-13572757
 ] 

Jimmy Xiang commented on HBASE-7698:


Maybe we can move cleanupFailedOpen to the finally block as well.  Both are for 
cleanup after a failure, so it seems better to have them in the finally block, 
so that they always run if the region is not opened successfully.

We can address this in a separate jira if you prefer.

 race between RS shutdown thread and openregionhandler causes region to get 
 stuck
 

 Key: HBASE-7698
 URL: https://issues.apache.org/jira/browse/HBASE-7698
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, 
 HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, 
 HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch





[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572812#comment-13572812
 ] 

Jimmy Xiang commented on HBASE-7698:


Yes, that's right.

 race between RS shutdown thread and openregionhandler causes region to get 
 stuck
 

 Key: HBASE-7698
 URL: https://issues.apache.org/jira/browse/HBASE-7698
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, 
 HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, 
 HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch





[jira] [Commented] (HBASE-7678) make storefile management pluggable, together with compaction

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572909#comment-13572909
 ] 

Jimmy Xiang commented on HBASE-7678:


This is a big change. Do we have a design doc? It would be great if you could 
outline the purpose of the change, some design choices, the relationships among 
entities such as StoreEngine, StoreFileManager, Compactor, CompactionPolicy, 
HStore, etc., how this change will be used for stripe/level compaction, and so 
on. Good stuff.

 make storefile management pluggable, together with compaction
 -

 Key: HBASE-7678
 URL: https://issues.apache.org/jira/browse/HBASE-7678
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7678--and-7603.patch, HBASE-7678-v0.patch, 
 HBASE-7678-v1.patch






[jira] [Commented] (HBASE-7741) Don't use bulk assigner if assigning just several regions

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572942#comment-13572942
 ] 

Jimmy Xiang commented on HBASE-7741:


It is read from the configuration and passed to the bulk assigner from AM.  It 
won't wait forever if waitTillAllAssigned is set to true; it will time out 
after some time if some regions are stuck in transition. Since nobody depends 
on the regions being assigned, I think it is better to let the bulk assigner 
run and not wait for it. This will speed up startup a little. If several region 
servers crash at the same time, this can free up the SSH thread a little sooner 
too.
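
To make the wait/no-wait difference concrete, here is a small sketch. The 
class BulkAssignSketch, its signature, and the pool size are assumptions for 
illustration and do not mirror the actual bulk assigner; only the 
waitTillAllAssigned flag and the idea of a configured timeout come from the 
discussion above.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bulk assigner that can either wait (bounded) or not.
class BulkAssignSketch {

  /**
   * Submits one task per region. Returns true only if the caller waited and
   * all tasks finished within the timeout; false if it did not wait, timed
   * out, or was interrupted.
   */
  static boolean bulkAssign(List<Runnable> assignments, boolean waitTillAllAssigned,
      long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    CountDownLatch done = new CountDownLatch(assignments.size());
    for (Runnable assign : assignments) {
      pool.execute(() -> {
        try {
          assign.run();
        } finally {
          done.countDown();
        }
      });
    }
    pool.shutdown(); // no new work; already submitted tasks keep running
    if (!waitTillAllAssigned) {
      return false; // caller's thread is free immediately
    }
    try {
      return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

In both modes the assignments keep running in the background; the flag only 
decides whether the calling thread is held until they finish or the timeout 
expires.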

 Don't use bulk assigner if assigning just several regions
 -

 Key: HBASE-7741
 URL: https://issues.apache.org/jira/browse/HBASE-7741
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-7741.patch, trunk-7741_v2.patch, 
 trunk-7741_v3_1.patch, trunk-7741_v3.patch


 If assigning just one region, the bulk assigner may be slower.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Attachment: trunk-7776_v1.patch

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use 
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Status: Patch Available  (was: Open)

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use 
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck

2013-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573083#comment-13573083
 ] 

Jimmy Xiang commented on HBASE-7698:


It is ok with me.

 race between RS shutdown thread and openregionhandler causes region to get 
 stuck
 

 Key: HBASE-7698
 URL: https://issues.apache.org/jira/browse/HBASE-7698
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, 
 HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, 
 HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch





[jira] [Commented] (HBASE-7698) race between RS shutdown thread and openregionhandler causes region to get stuck

2013-02-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573683#comment-13573683
 ] 

Jimmy Xiang commented on HBASE-7698:


@Ram, sure, +1 for commit.

 race between RS shutdown thread and openregionhandler causes region to get 
 stuck
 

 Key: HBASE-7698
 URL: https://issues.apache.org/jira/browse/HBASE-7698
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.4
Reporter: Sergey Shelukhin
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7698_0.94.patch, HBASE-7698.patch, 
 HBASE-7698_trunk_final.patch, HBASE-7698_withtestcase_1.patch, 
 HBASE-7698_withtestcase_1.patch, HBASE-7698_withtestcase.patch





[jira] [Updated] (HBASE-7741) Don't use bulk assigner if assigning just several regions

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7741:
---

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into trunk. Thanks Stack and Ram for the review.

 Don't use bulk assigner if assigning just several regions
 -

 Key: HBASE-7741
 URL: https://issues.apache.org/jira/browse/HBASE-7741
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: trunk-7741.patch, trunk-7741_v2.patch, 
 trunk-7741_v3_1.patch, trunk-7741_v3.patch


 If we assign just one region, the bulk assigner may be slower.



[jira] [Updated] (HBASE-6782) HBase shell's 'status 'detailed'' should escape the printed keys

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6782:
---

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into trunk.  Thanks Viji for the patch.

 HBase shell's 'status 'detailed'' should escape the printed keys
 

 Key: HBASE-6782
 URL: https://issues.apache.org/jira/browse/HBASE-6782
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.1
Reporter: Viji
Assignee: Viji
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6782.patch


 Currently the HBase shell's status command prints unescaped keys on the 
 terminal causing the terminal to print garbage characters. We should escape 
 the printed keys.
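To illustrate what escaping the printed keys could look like, here is a minimal sketch. The class KeyEscaper and its escape method are hypothetical names invented for this example; they are not taken from the attached patch, and the output format (printable ASCII kept as-is, everything else as \xNN) is an assumption.

```java
// Hypothetical helper, not from HBASE-6782.patch: renders a raw key so that
// it is safe to print on a terminal.
final class KeyEscaper {
    private KeyEscaper() {}

    /** Keeps printable ASCII (except backslash) as-is; writes other bytes as \xNN. */
    static String escape(byte[] key) {
        StringBuilder sb = new StringBuilder(key.length);
        for (byte b : key) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }
}
```

The backslash itself is escaped so that the printed form stays unambiguous.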



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Status: Open  (was: Patch Available)

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use
 ErrorReporter or Log instead, which can be configured to catch what we want.
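The idea can be sketched roughly as follows. All names here (Reporter, LogReporter, CollectingReporter, ConsistencyCheck) are made up for illustration and are not the hbck classes; java.util.logging stands in for whatever logging framework the real code uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for an error reporter abstraction.
interface Reporter {
    void report(String message);
}

// Sends messages to a logger, so the destination is decided by configuration.
final class LogReporter implements Reporter {
    private static final Logger LOG = Logger.getLogger(LogReporter.class.getName());

    @Override
    public void report(String message) {
        LOG.info(message);
    }
}

// Test double that keeps messages in memory instead of printing them.
final class CollectingReporter implements Reporter {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void report(String message) {
        messages.add(message);
    }

    List<String> messages() {
        return messages;
    }
}

// Hypothetical tool code: it never touches System.out directly.
final class ConsistencyCheck {
    private final Reporter reporter;

    ConsistencyCheck(Reporter reporter) {
        this.reporter = reporter;
    }

    void run(int inconsistencies) {
        reporter.report("Number of inconsistencies: " + inconsistencies);
    }
}
```

With this shape, a test can pass in the collecting reporter and assert on the captured messages, while production code can pass in the logging one.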



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Attachment: 0.94-7776_v1.1.patch

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Attachment: trunk-7776_v1.1.patch

Minor change. I verified that nothing goes to console using a mock error 
reporter, and with log4j going to a file.

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, 
 trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

Fix Version/s: 0.94.5
   0.96.0
   Status: Patch Available  (was: Open)

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0, 0.94.5

 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, 
 trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Updated] (HBASE-7776) Use ErrorReporter/Log instead of System.out in hbck

2013-02-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7776:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into trunk and 0.94.  Thanks Jon for reviewing it.

 Use ErrorReporter/Log instead of System.out in hbck
 ---

 Key: HBASE-7776
 URL: https://issues.apache.org/jira/browse/HBASE-7776
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0, 0.94.5

 Attachments: 0.94-7776_v1.1.patch, trunk-7776_v1.1.patch, 
 trunk-7776_v1.patch


 There are lots of places that log messages with System.out.  We should use
 ErrorReporter or Log instead, which can be configured to catch what we want.



[jira] [Commented] (HBASE-7784) move the code related to selection that is specific to default compaction policy, into default compaction policy (from HStore)

2013-02-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574138#comment-13574138
 ] 

Jimmy Xiang commented on HBASE-7784:


Looks good to me. One question, why do we want to do that?
{code}
+  if (priority == Store.PRIORITY_USER) {
+    ++priority; // System compactions cannot have user priority, make less important.
+  }
{code}

If there are lots of files to compact, priority could be negative here, which 
is higher than PRIORITY_USER, right?
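To make the question concrete, here is a small sketch. It assumes that a lower number means a higher priority (as the comment above implies), that PRIORITY_USER is 1, and that the system priority is computed as a blocking file count minus the current store file count. The last two are assumptions for illustration, not facts taken from the patch, and the class and method names are hypothetical.

```java
// Illustration only: the constant value and the formula are assumed.
final class CompactionPriority {
    static final int PRIORITY_USER = 1; // assumed value

    private CompactionPriority() {}

    static int systemPriority(int blockingFileCount, int storeFileCount) {
        int priority = blockingFileCount - storeFileCount; // assumed formula
        if (priority == PRIORITY_USER) {
            ++priority; // same adjustment as in the quoted snippet
        }
        return priority;
    }
}
```

Under these assumptions, systemPriority(7, 6) is bumped from 1 to 2, but systemPriority(7, 10) is -3, which is numerically below PRIORITY_USER and so would still outrank a user compaction.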

 move the code related to selection that is specific to default compaction 
 policy, into default compaction policy (from HStore)
 --

 Key: HBASE-7784
 URL: https://issues.apache.org/jira/browse/HBASE-7784
 Project: HBase
  Issue Type: Sub-task
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7784-v0.patch


 There are some TODO [level compaction] markers in the HBASE-7603 patch; there
 may also be a few other similar places.



[jira] [Resolved] (HBASE-6976) Change assignment related logging to TRACE level

2012-10-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-6976.


Resolution: Won't Fix

Based on my testing on the latest trunk code, turning off the logging seems not 
to help the assignment performance very much.  This may be because the code has 
been changed a lot, compared to 0.94. We can look into this again if we have 
clear proof this will improve the performance.

 Change assignment related logging to TRACE level
 

 Key: HBASE-6976
 URL: https://issues.apache.org/jira/browse/HBASE-6976
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Priority: Minor
  Labels: noob

 As JD and Elliot mentioned, turning off ZKAssign logging can improve AM 
 performance a lot.  During the testing, I also noticed that: after all 
 regions are already opened, master UI still shows lots of regions in 
 transition. It is because AM hasn't finished the ZK event processing yet.
 Changing the logging level from debug to trace will improve AM performance. 
 With HBASE-6611, I think AM is getting stable and reliable. I hope we don't
 need to see this logging any more.
 The logging is still available after turning on the trace logging level for
 the AM and ZKAssign classes.



[jira] [Updated] (HBASE-6611) Forcing region state offline cause double assignment

2012-10-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6611:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Integrated into trunk.  TestHBaseFsck is fine locally.

 Forcing region state offline cause double assignment
 

 Key: HBASE-6611
 URL: https://issues.apache.org/jira/browse/HBASE-6611
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: trunk-6611_v2.patch, trunk-6611_v5.patch


 In assigning a region, the assignment manager forces the region state offline
 if it is not. This could cause double assignment: for example, if the region is
 already assigned and in the Open state, you should not just change its state
 to Offline and assign it again.
 I think this could be the root cause for all double assignments IF the region
 state is reliable.
 After this loophole is closed, TestHBaseFsck should come up with a different
 way to create some assignment inconsistencies, for example, calling the region
 server to open a region directly.



[jira] [Created] (HBASE-7022) Use multi to batch offline regions in zookeeper

2012-10-19 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-7022:
--

 Summary: Use multi to batch offline regions in zookeeper
 Key: HBASE-7022
 URL: https://issues.apache.org/jira/browse/HBASE-7022
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Bulk assigner needs to set regions offline in zookeeper one by one. I was 
wondering if we can have some performance improvement if we batch these 
operations using ZooKeeper#multi.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Priority: Minor  (was: Major)

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.
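The ordering idea can be sketched as below: every region name is mapped to one fixed single-threaded worker, so events for the same region run in submission order while different regions can proceed in parallel. RegionEventDispatcher and its methods are hypothetical names for this illustration and do not come from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one single-threaded executor per worker slot.
final class RegionEventDispatcher {
    private final List<ExecutorService> workers = new ArrayList<>();

    RegionEventDispatcher(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The same region name always maps to the same worker index. */
    int workerIndexFor(String regionName) {
        return Math.floorMod(regionName.hashCode(), workers.size());
    }

    /** Queues an event on the worker that owns the given region. */
    void submit(String regionName, Runnable event) {
        workers.get(workerIndexFor(regionName)).execute(event);
    }

    /** Stops accepting events and waits for queued ones; false on timeout or interrupt. */
    boolean shutdownAndWait() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
        boolean allDone = true;
        try {
            for (ExecutorService worker : workers) {
                allDone &= worker.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return allDone;
    }
}
```

Hashing by region name is only one possible choice. It keeps per-region order without any shared lock, at the cost of uneven load if a few regions produce most of the events.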



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Attachment: trunk-6977_v1.patch

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-19 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Status: Patch Available  (was: Open)

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Commented] (HBASE-6977) Multithread processing ZK assignment events

2012-10-19 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480634#comment-13480634
 ] 

Jimmy Xiang commented on HBASE-6977:


The patch is posted on RB: https://reviews.apache.org/r/7682/ for easy review.

In the meantime, I was thinking about HBASE-7022, using multi to batch those zk 
operations in bulk assigner.
I think it will improve the performance a little.  I need to play with it and 
find out.

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-20 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Status: Open  (was: Patch Available)

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Commented] (HBASE-6977) Multithread processing ZK assignment events

2012-10-22 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481815#comment-13481815
 ] 

Jimmy Xiang commented on HBASE-6977:


Thanks Ted for the review.  I posted the second patch to RB: 
https://reviews.apache.org/r/7682/

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Resolved] (HBASE-6744) Per table balancing could cause regions unbalanced overall

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-6744.


Resolution: Won't Fix

 Per table balancing could cause regions unbalanced overall
 --

 Key: HBASE-6744
 URL: https://issues.apache.org/jira/browse/HBASE-6744
 Project: HBase
  Issue Type: Improvement
Reporter: Jimmy Xiang

 Per table balancing just balances regions based on tables.  However, overall, 
 regions could be seriously unbalanced.
 For example, if you shut down almost all region servers in a cluster, then
 create tons of new tables (no region pre-split), then start up all region
 servers, you will see the regions won't move to other region servers since
 they are balanced per table (only one region for a table at this moment).
 If we can make the balance algorithm sophisticated enough, we don't need the 
 configuration hbase.master.loadbalance.bytable.  We can do the regular and 
 bytable balancing at the same time.
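The scenario can be checked with a toy calculation. The balance criterion used here (no server holds more than the ceiling of the average) is a deliberate simplification chosen for this sketch, not the actual HBase balancer logic, and BalanceCheck is a hypothetical name.

```java
// Toy criterion, assumed for illustration: a distribution counts as balanced
// when no server holds more than ceil(totalRegions / serverCount) regions.
final class BalanceCheck {
    private BalanceCheck() {}

    static boolean isBalanced(int[] regionsPerServer) {
        int total = 0;
        for (int n : regionsPerServer) {
            total += n;
        }
        int servers = regionsPerServer.length;
        int ceiling = (total + servers - 1) / servers; // integer ceil of the average
        for (int n : regionsPerServer) {
            if (n > ceiling) {
                return false;
            }
        }
        return true;
    }
}
```

With four servers, a single-region table placed on the first server gives {1, 0, 0, 0}, which passes this check. One hundred such tables on the same server give an overall load of {100, 0, 0, 0}, which fails it, even though every table looks fine on its own.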



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Priority: Minor  (was: Major)
Assignee: Jimmy Xiang
 Summary: sync bulk and regular assigment handling if possible  (was: sync 
bulk and regular assigment handling for RegionAlreadyInTransitionException)

 sync bulk and regular assigment handling if possible
 

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 Currently, for bulk assignment, when a region is already in transition, it 
 thinks the region is opened. However, in regular assignment, it throws 
 RegionAlreadyInTransitionException.
 In regular assignment, in case of RegionAlreadyInTransitionException, it
 tries to call openRegion again and again, without changing the region plan or
 the ZK offline node, until the region is out of transition.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling if possible

2012-10-22 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481886#comment-13481886
 ] 

Jimmy Xiang commented on HBASE-6896:


Thought about this and I think the existing RegionAlreadyInTransitionException
handling in the bulk assigner is fine.  Instead, there is probably no need to
retry in the regular assignment.  So I changed the title of the jira.

I will sync up the socket.timeout exception handling instead.

 sync bulk and regular assigment handling if possible
 

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor

 Currently, for bulk assignment, when a region is already in transition, it 
 thinks the region is opened. However, in regular assignment, it throws 
 RegionAlreadyInTransitionException.
 In regular assignment, in case of RegionAlreadyInTransitionException, it
 tries to call openRegion again and again, without changing the region plan or
 the ZK offline node, until the region is out of transition.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Attachment: trunk-6896.patch

 sync bulk and regular assigment handling if possible
 

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 Currently, for bulk assignment, when a region is already in transition, it 
 thinks the region is opened. However, in regular assignment, it throws 
 RegionAlreadyInTransitionException.
 In regular assignment, in case of RegionAlreadyInTransitionException, it
 tries to call openRegion again and again, without changing the region plan or
 the ZK offline node, until the region is out of transition.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling if possible

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Status: Patch Available  (was: Open)

 sync bulk and regular assigment handling if possible
 

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 Currently, for bulk assignment, when a region is already in transition, it 
 thinks the region is opened. However, in regular assignment, it throws 
 RegionAlreadyInTransitionException.
 In regular assignment, in case of RegionAlreadyInTransitionException, it
 tries to call openRegion again and again, without changing the region plan or
 the ZK offline node, until the region is out of transition.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Commented] (HBASE-7022) Use multi to batch offline regions in zookeeper

2012-10-22 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481969#comment-13481969
 ] 

Jimmy Xiang commented on HBASE-7022:


Currently, I think ZooKeeper#multi doesn’t help very much since it's
synchronous, transactional, and doesn’t support something like upsert in SQL
(i.e. create a node if it doesn’t exist, otherwise update its data). It
supports batches of less than 1MB in size. If we can have an asynchronous,
non-transactional multi zookeeper client function, which also supports upsert,
it will really help.
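The difference between a transactional create batch and an upsert batch can be shown with a toy in-memory model. ToyNodeStore is a hypothetical class written only for this illustration; it does not use or mirror the ZooKeeper client API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory node store, assumed for illustration; not the ZooKeeper API.
final class ToyNodeStore {
    private final Map<String, String> nodes = new HashMap<>();

    /** All-or-nothing creates: if any path already exists, nothing is applied. */
    boolean createAll(Map<String, String> batch) {
        for (String path : batch.keySet()) {
            if (nodes.containsKey(path)) {
                return false; // one existing node rejects the whole batch
            }
        }
        nodes.putAll(batch);
        return true;
    }

    /** Upsert: creates missing nodes and overwrites the data of existing ones. */
    void upsertAll(Map<String, String> batch) {
        nodes.putAll(batch);
    }

    String get(String path) {
        return nodes.get(path);
    }
}
```

In this model, a batch that sets two regions offline fails as a whole when one of the nodes is already present, while the upsert variant succeeds for both. That is the behavior the comment above is asking for.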


 Use multi to batch offline regions in zookeeper
 ---

 Key: HBASE-7022
 URL: https://issues.apache.org/jira/browse/HBASE-7022
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 Bulk assigner needs to set regions offline in zookeeper one by one. I was 
 wondering if we can have some performance improvement if we batch these 
 operations using ZooKeeper#multi.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Attachment: trunk-6977_v2-1.patch

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck
 for assignments, since there is only one ZK event thread.  If we can use
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we
 always pass all events related to one region to the same worker thread, the
 order should be kept.
 We need to play with it and find out how much performance improvement we can
 get.



[jira] [Assigned] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-5119:
--

Assignee: (was: Jimmy Xiang)

Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can 
easily open 10+ regions on 4 nodes in around 4 minutes.  I think we are fine to 
set the timeout back to 5 minutes.

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0


 The TimeoutMonitor used to be extremely racy and caused more trouble than it
 fixed, but I believe most of this has been fixed in the context of 0.92, so I
 think we should set it back down to a useful level. Currently it's 30
 minutes; what should the new value be?
 I think 5 minutes should be good, will do some testing.



[jira] [Assigned] (HBASE-7022) Use multi to batch offline regions in zookeeper

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-7022:
--

Assignee: (was: Jimmy Xiang)

 Use multi to batch offline regions in zookeeper
 ---

 Key: HBASE-7022
 URL: https://issues.apache.org/jira/browse/HBASE-7022
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang

 Bulk assigner needs to set regions offline in zookeeper one by one. I was 
 wondering if we can have some performance improvement if we batch these 
 operations using ZooKeeper#multi.



[jira] [Assigned] (HBASE-6880) Failure in assigning root causes system hang

2012-10-22 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-6880:
--

Assignee: Jimmy Xiang

 Failure in assigning root causes system hang
 

 Key: HBASE-6880
 URL: https://issues.apache.org/jira/browse/HBASE-6880
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 In looking into a TestReplication failure, I found out sometimes assignRoot 
 could fail, for example, RS is not serving traffic yet.  In this case, the 
 master will keep waiting for root to be available, which could never happen.
  
 Need to gracefully terminate master if root is not assigned properly.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Description: 
In regular assignment, in case of a socket timeout, it calls openRegion again 
and again without changing the region plan or the ZK offline node, until the 
region is out of transition, as long as the region server is still up.

We may need to sync them up and make sure bulk assignment does the same in this 
case.

  was:
Currently, for bulk assignment, when a region is already in transition, it 
thinks the region is opened. However, in regular assignment, it throws 
RegionAlreadyInTransitionException.

In regular assignment, in case of RegionAlreadyInTransitionException, it calls 
openRegion again and again without changing the region plan or the ZK offline 
node, until the region is out of transition.

We may need to sync them up and make sure bulk assignment does the same in this 
case.

Summary: sync bulk and regular assigment handling socket timeout 
exception  (was: sync bulk and regular assigment handling if possible)

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.
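
The retry behavior described above can be sketched roughly like this. It is an illustration under stated assumptions, not the AssignmentManager code: the RegionOpener interface, the two BooleanSupplier checks, and the maxAttempts cap are hypothetical names introduced here (the description itself says only "again and again"). The point is that on a socket timeout the same request is re-sent unchanged, and the loop stops once the region leaves transition or the region server goes down.

```java
import java.net.SocketTimeoutException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of retrying an open request on socket timeout. */
public class OpenRegionRetry {
  /** Stand-in for the RPC that asks a region server to open a region. */
  public interface RegionOpener {
    void openRegion(String regionName) throws SocketTimeoutException;
  }

  /**
   * Re-sends the same open request after a socket timeout while the region
   * is still in transition and the server is still up.
   * Returns the number of attempts made.
   */
  public static int openWithRetry(String regionName, RegionOpener opener,
      BooleanSupplier regionInTransition, BooleanSupplier serverUp,
      int maxAttempts) {
    int attempts = 0;
    while (attempts < maxAttempts) {
      attempts++;
      try {
        opener.openRegion(regionName);
        return attempts; // the request went through
      } catch (SocketTimeoutException e) {
        // Keep the same plan; give up only if the situation has changed.
        if (!regionInTransition.getAsBoolean() || !serverUp.getAsBoolean()) {
          return attempts;
        }
      }
    }
    return attempts; // assumed cap reached
  }
}
```

Sharing one helper like this between the regular and bulk paths would be one way to keep the two in sync, although a later comment in the thread suggests the check may be too simple to need a separate method.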



[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482527#comment-13482527
 ] 

Jimmy Xiang commented on HBASE-6896:


Thanks a lot for the review.  I updated the title and description of the bug, 
since we are not going to sync up the processing of the already-in-transition 
exception.

I will change "isn't gone" to "didn't go".

As for including the region names: this is bulk assignment, so there may be a 
lot of regions.



 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Resolved] (HBASE-6880) Failure in assigning root causes system hang

2012-10-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-6880.


Resolution: Won't Fix

I thought about this issue again.  It is probably right to keep waiting and let 
the timeout monitor deal with it.

With HBASE-5119, the timeout period will be dropped from 20 minutes to 5 
minutes, which helps a little.

The issue here is that we can't tell that the assignment of root has completely 
failed. At least it is put in transition.  TM will fix it.  The same assign 
call is used for user regions as well.  Changing it to return true/false may be 
confusing.

 Failure in assigning root causes system hang
 

 Key: HBASE-6880
 URL: https://issues.apache.org/jira/browse/HBASE-6880
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 In looking into a TestReplication failure, I found out sometimes assignRoot 
 could fail, for example, RS is not serving traffic yet.  In this case, the 
 master will keep waiting for root to be available, which could never happen.
  
 Need to gracefully terminate master if root is not assigned properly.



[jira] [Commented] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482742#comment-13482742
 ] 

Jimmy Xiang commented on HBASE-6896:


Thanks for the review. Yes, handling is the same in case of socket timeout.  It 
is just condition checking, probably too simple to have a separate method.

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Status: Open  (was: Patch Available)

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Attachment: trunk-6896_v2.patch

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch, trunk-6896_v2.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-23 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch, trunk-6896_v2.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Commented] (HBASE-5162) Basic client pushback mechanism

2012-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482767#comment-13482767
 ] 

Jimmy Xiang commented on HBASE-5162:


Isn't the exception approach much cleaner and simpler?  I think the exception 
approach is greedy, and the hbase client code needs minimal change. Based on 
the retry count, it can adjust the delay time in the middle.

For the load monitoring, we assume the load trend remains the same, which may 
not actually be the case. The client side has to track the regions whose 
memstore is under pressure.  Every client needs to do the same tracking.

 Basic client pushback mechanism
 ---

 Key: HBASE-5162
 URL: https://issues.apache.org/jira/browse/HBASE-5162
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0

 Attachments: java_HBASE-5162.patch


 The current blocking we do when we are close to some limits (memstores over 
 the multiplier factor, too many store files, global memstore memory) is bad, 
 too coarse and confusing. After hitting HBASE-5161, it really becomes obvious 
 that we need something better.
 I did a little brainstorm with Stack, we came up quickly with two solutions:
  - Send some exception to the client, like OverloadedException, that's thrown 
 when some situation happens like getting past the low memory barrier. It 
 would be thrown when the client gets a handler and does some check while 
 putting or deleting. The client would treat this as a retryable exception but 
 ideally wouldn't check .META. for a new location. It could be fancy and have 
 multiple levels of pushback, like send the exception to 25% of the clients, 
 and then go up if the situation persists. Should be easy to implement but 
 we'll be using a lot more IO to send the payload over and over again (but at 
 least it wouldn't sit in the RS's memory).
  - Send a message alongside a successful put or delete to tell the client to 
 slow down a little, this way we don't have to do back and forth with the 
 payload between the client and the server. It's a cleaner (I think) but more 
 involved solution.
 In every case the RS should do very obvious things to notify the operators of 
 this situation, through logs, web UI, metrics, etc.
 Other ideas?



[jira] [Commented] (HBASE-5162) Basic client pushback mechanism

2012-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482827#comment-13482827
 ] 

Jimmy Xiang commented on HBASE-5162:


I was thinking RegionTooBusyException, so close. As long as the server releases 
the IPC handler in such a scenario, access to other regions should not be 
blocked.  The point is that we don't want a busy region to block a whole region 
server.

As for the old clients, right, the behavior is a little different.  But they 
should not fail, as they should already expect exceptions and handle them 
properly, for example by retrying, although that may not be as efficient as 
delaying more when the retry count is bigger.

As for measuring the load, I think it is a good idea.  I just have some concern 
about spending too much effort on it without first trying the simple one, which 
is known to work.

 Basic client pushback mechanism
 ---

 Key: HBASE-5162
 URL: https://issues.apache.org/jira/browse/HBASE-5162
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0

 Attachments: java_HBASE-5162.patch


 The current blocking we do when we are close to some limits (memstores over 
 the multiplier factor, too many store files, global memstore memory) is bad, 
 too coarse and confusing. After hitting HBASE-5161, it really becomes obvious 
 that we need something better.
 I did a little brainstorm with Stack, we came up quickly with two solutions:
  - Send some exception to the client, like OverloadedException, that's thrown 
 when some situation happens like getting past the low memory barrier. It 
 would be thrown when the client gets a handler and does some check while 
 putting or deleting. The client would treat this as a retryable exception but 
 ideally wouldn't check .META. for a new location. It could be fancy and have 
 multiple levels of pushback, like send the exception to 25% of the clients, 
 and then go up if the situation persists. Should be easy to implement but 
 we'll be using a lot more IO to send the payload over and over again (but at 
 least it wouldn't sit in the RS's memory).
  - Send a message alongside a successful put or delete to tell the client to 
 slow down a little, this way we don't have to do back and forth with the 
 payload between the client and the server. It's a cleaner (I think) but more 
 involved solution.
 In every case the RS should do very obvious things to notify the operators of 
 this situation, through logs, web UI, metrics, etc.
 Other ideas?



[jira] [Commented] (HBASE-5162) Basic client pushback mechanism

2012-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482829#comment-13482829
 ] 

Jimmy Xiang commented on HBASE-5162:


For the extra load due to retry, it is mostly on network, not very much on the 
server.  The retry time is configurable based on the tries:

ConnectionUtils.getPauseTime(pause, tries)

As long as the retry is gradually slowing down, is it acceptable?
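
To make "gradually slowing down" concrete, here is a minimal sketch of a retry pause that grows with the number of tries. It only mimics the idea behind the ConnectionUtils.getPauseTime(pause, tries) call mentioned above; the class name RetryPause, the method name pauseFor, and the multiplier table are assumptions for illustration, and the values HBase actually uses may differ.

```java
/** Hypothetical backoff helper; not the HBase ConnectionUtils implementation. */
public class RetryPause {
  // Assumed multipliers, chosen for illustration only.
  private static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

  /**
   * Returns the pause in milliseconds before retry number tries (0-based).
   * Tries beyond the end of the table reuse the last multiplier.
   */
  public static long pauseFor(long basePauseMs, int tries) {
    int index = Math.max(0, Math.min(tries, BACKOFF.length - 1));
    return basePauseMs * BACKOFF[index];
  }
}
```

With a table like this, the first few retries come back quickly and later ones back off, so a client that keeps hitting a busy region sends its payload less and less often.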


 Basic client pushback mechanism
 ---

 Key: HBASE-5162
 URL: https://issues.apache.org/jira/browse/HBASE-5162
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0

 Attachments: java_HBASE-5162.patch


 The current blocking we do when we are close to some limits (memstores over 
 the multiplier factor, too many store files, global memstore memory) is bad, 
 too coarse and confusing. After hitting HBASE-5161, it really becomes obvious 
 that we need something better.
 I did a little brainstorm with Stack, we came up quickly with two solutions:
  - Send some exception to the client, like OverloadedException, that's thrown 
 when some situation happens like getting past the low memory barrier. It 
 would be thrown when the client gets a handler and does some check while 
 putting or deleting. The client would treat this as a retryable exception but 
 ideally wouldn't check .META. for a new location. It could be fancy and have 
 multiple levels of pushback, like send the exception to 25% of the clients, 
 and then go up if the situation persists. Should be easy to implement but 
 we'll be using a lot more IO to send the payload over and over again (but at 
 least it wouldn't sit in the RS's memory).
  - Send a message alongside a successful put or delete to tell the client to 
 slow down a little, this way we don't have to do back and forth with the 
 payload between the client and the server. It's a cleaner (I think) but more 
 involved solution.
 In every case the RS should do very obvious things to notify the operators of 
 this situation, through logs, web UI, metrics, etc.
 Other ideas?



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-24 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Status: Open  (was: Patch Available)

Will address Stack's comments and upload a new patch.

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck 
 for assignments, since there is only one ZK event thread.  If we can use 
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we 
 always pass all events related to one region to the same worker thread, the 
 order should be kept.
 We need to play with it and find out how much performance improvement we can 
 get.



[jira] [Updated] (HBASE-6896) sync bulk and regular assigment handling socket timeout exception

2012-10-24 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6896:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Integrated into trunk.  Thanks all for the review.

 sync bulk and regular assigment handling socket timeout exception
 -

 Key: HBASE-6896
 URL: https://issues.apache.org/jira/browse/HBASE-6896
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6896.patch, trunk-6896_v2.patch


 In regular assignment, in case of a socket timeout, it calls openRegion 
 again and again without changing the region plan or the ZK offline node, 
 until the region is out of transition, as long as the region server is still 
 up.
 We may need to sync them up and make sure bulk assignment does the same in 
 this case.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Attachment: trunk-6977_v3.patch

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, 
 trunk-6977_v3.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck 
 for assignments, since there is only one ZK event thread.  If we can use 
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we 
 always pass all events related to one region to the same worker thread, the 
 order should be kept.
 We need to play with it and find out how much performance improvement we can 
 get.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

Status: Patch Available  (was: Open)

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, 
 trunk-6977_v3.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck 
 for assignments, since there is only one ZK event thread.  If we can use 
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we 
 always pass all events related to one region to the same worker thread, the 
 order should be kept.
 We need to play with it and find out how much performance improvement we can 
 get.



[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5179:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
 region to be assigned before log splitting is completed, causing data loss
 

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.92.3

 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 
 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 
 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 
 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
 Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
 hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
 hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch


 If the master's failover processing and ServerShutdownHandler's processing 
 happen concurrently, the following case may occur:
 1. The master completes splitLogAfterStartup().
 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
 dead server.
 4. The master starts to assign the regions of RegionserverA because it is a 
 dead server as of step 3.
 However, while step 4 (assigning regions) is running, ServerShutdownHandler 
 may still be splitting logs, so this may cause data loss.



[jira] [Updated] (HBASE-6977) Multithread processing ZK assignment events

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6977:
---

   Resolution: Fixed
Fix Version/s: 0.96.0
   Status: Resolved  (was: Patch Available)

Integrated into trunk.  Thanks all for reviewing it.

 Multithread processing ZK assignment events
 ---

 Key: HBASE-6977
 URL: https://issues.apache.org/jira/browse/HBASE-6977
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk-6977_v1.patch, trunk-6977_v2-1.patch, 
 trunk-6977_v3.patch


 Related to HBASE-6976 and HBASE-6611.  ZK event processing is a bottleneck 
 for assignments, since there is only one ZK event thread.  If we can use 
 multiple threads, it should be better.
 With multiple threads, the order of events could be messed up. However, if we 
 always pass all events related to one region to the same worker thread, the 
 order should be kept.
 We need to play with it and find out how much performance improvement we can 
 get.



[jira] [Updated] (HBASE-6381) AssignmentManager should use the same logic for clean startup and failover

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6381:
---

Fix Version/s: 0.96.0

 AssignmentManager should use the same logic for clean startup and failover
 --

 Key: HBASE-6381
 URL: https://issues.apache.org/jira/browse/HBASE-6381
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-6381-notes.pdf, hbase-6381.pdf, 
 trunk-6381_v5.patch, trunk-6381_v7.patch, trunk-6381_v8.patch, 
 trunk-6381_v9.patch


 Currently AssignmentManager handles clean startup and failover very 
 differently. Different logic is mingled together, so it is hard to find out 
 which is for which.
 We should clean it up and share the same logic so that AssignmentManager 
 handles both cases the same way.  This way, the code will be much easier to 
 understand and maintain.



[jira] [Assigned] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-6223:
--

Assignee: Jimmy Xiang

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6223.patch


 We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360.
 We should document them. Especially, for HBASE-5360, it's something
 one normally doesn't do.



[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6223:
---

Attachment: trunk-6223.patch

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6223.patch


 We had a couple hbck improvements recently: HBASE-6173 and HBASE-5360.
 We should document them. Especially, for HBASE-5360, it's something
 one normally doesn't do.



[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6223:
---

Status: Patch Available  (was: Open)

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6223.patch




[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-10-25 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484537#comment-13484537
 ] 

Jimmy Xiang commented on HBASE-5136:


@Ted, were you referring to 5136.txt?  That's your patch.  My patch is 
5136-trunk.patch, which is already committed in HBASE-6357.
I think we can close this one as a duplicate.


 Redundant MonitoredTask instances in case of distributed log splitting retry
 

 Key: HBASE-5136
 URL: https://issues.apache.org/jira/browse/HBASE-5136
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 5136-trunk.patch, 5136.txt


 In case of log splitting retry, the following code would be executed multiple 
 times:
 {code}
   public long splitLogDistributed(final List<Path> logDirs) throws IOException {
     MonitoredTask status = TaskMonitor.get().createStatus(
       "Doing distributed log split in " + logDirs);
 {code}
 leading to multiple MonitoredTask instances.
 Users may be confused by multiple distributed log splitting entries for the 
 same region server on the master UI.
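
 The effect described above can be pictured with a minimal, self-contained
 sketch. TaskRegistry and both methods below are hypothetical stand-ins made up
 for illustration; they are not the HBase TaskMonitor API, and the second
 method is only one possible remedy, not necessarily what the committed patch
 does. Creating the status inside the retried body registers one entry per
 attempt; creating it once and reusing it registers a single entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a task monitor; not the HBase TaskMonitor API.
class TaskRegistry {
    private final List<String> tasks = new ArrayList<>();

    // Registers a new status entry and returns its index.
    int createStatus(String description) {
        tasks.add(description);
        return tasks.size() - 1;
    }

    int size() {
        return tasks.size();
    }
}

class LogSplitRetrySketch {
    // Status is created inside the retried body: one entry per attempt.
    static int statusPerAttempt(TaskRegistry registry, int attempts) {
        for (int i = 0; i < attempts; i++) {
            registry.createStatus("Doing distributed log split (attempt " + (i + 1) + ")");
        }
        return registry.size();
    }

    // Status is created once, before the retry loop: a single entry.
    static int statusCreatedOnce(TaskRegistry registry, int attempts) {
        registry.createStatus("Doing distributed log split");
        for (int i = 0; i < attempts; i++) {
            // Each retry would update the existing status here instead of adding one.
        }
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("per attempt:  " + statusPerAttempt(new TaskRegistry(), 3));
        System.out.println("created once: " + statusCreatedOnce(new TaskRegistry(), 3));
    }
}
```

 With three attempts, the first variant leaves three entries in the registry
 and the second leaves one, which is the difference a user would see on the UI.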



[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6223:
---

Attachment: trunk-6223_v2.patch

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6223.patch, trunk-6223_v2.patch




[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-26 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6223:
---

Attachment: trunk-6223_v3.patch

Thanks for the review.  I updated the patch a little bit. Now the sentence is 
short and easy to parse. :)

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6223.patch, trunk-6223_v2.patch, 
 trunk-6223_v3.patch




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485092#comment-13485092
 ] 

Jimmy Xiang commented on HBASE-6060:


I think we can let it go for 0.94, since the timeout monitor can handle it and 
there is no better way to fix it: the region state in 0.94 is not that reliable.

For 0.96, this case is not covered yet; it still relies on the timeout monitor. 
Let me cook up a patch for 0.96 now.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch


 We have seen a pattern in tests where regions are stuck in OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - The master calls the rs to open the region. If the rs is offline, a new plan 
 is generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory; zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign().
  - The RegionServer starts opening the region and changes the state in the 
 znode. But that znode is not ephemeral. (See ZkAssign.)
  - The rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process().
  - The rs then opens the region and changes the znode from OPENING to OPENED.
  - When the rs is killed between the OPENING and OPENED states, zk shows 
 OPENING, and the master just waits for the rs to change the region state, but 
 since the rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 this kind of condition. It periodically checks (every 10 sec by default) the 
 regions in transition to see whether they have timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). The default timeout is 30 
 min, which explains what you and I are seeing. 
  - ServerShutdownHandler in the master does not reassign regions in OPENING 
 state, although it handles other states. 
 Lowering that threshold in the configuration is one option, but I still 
 think we can do better. 
 Will investigate more. 
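
 The steps above can be sketched as a small model of how long a region stuck
 in OPENING waits before a periodic check notices it. This is only an
 illustration under stated assumptions: the class, the enum, and both methods
 are hypothetical names, not HBase code, and the clock is a plain millisecond
 counter starting at zero. The 10 second period and 30 minute timeout are the
 defaults quoted in the description.

```java
import java.util.concurrent.TimeUnit;

class OpeningTimeoutSketch {
    // States named in the description above.
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    // Hypothetical model of one region in transition: its state and when it entered it.
    static final class RegionInTransition {
        final State state;
        final long stampMs;

        RegionInTransition(State state, long stampMs) {
            this.state = state;
            this.stampMs = stampMs;
        }
    }

    // A region is timed out once it has stayed in a non-final state for at least timeoutMs.
    static boolean isTimedOut(RegionInTransition rit, long nowMs, long timeoutMs) {
        return rit.state != State.OPENED && nowMs - rit.stampMs >= timeoutMs;
    }

    // Time of the first periodic check (at 0, periodMs, 2 * periodMs, ...) that sees the timeout.
    static long firstDetectionMs(RegionInTransition rit, long periodMs, long timeoutMs) {
        if (rit.state == State.OPENED || periodMs <= 0) {
            throw new IllegalArgumentException("need a stuck region and a positive period");
        }
        long nowMs = 0;
        while (!isTimedOut(rit, nowMs, timeoutMs)) {
            nowMs += periodMs;
        }
        return nowMs;
    }

    public static void main(String[] args) {
        long periodMs = TimeUnit.SECONDS.toMillis(10);
        long timeoutMs = TimeUnit.MINUTES.toMillis(30);
        // The rs dies right after moving the znode to OPENING at t = 5 s.
        RegionInTransition stuck = new RegionInTransition(State.OPENING, 5_000L);
        System.out.println("detected after " + firstDetectionMs(stuck, periodMs, timeoutMs) + " ms");
    }
}
```

 In this model the check period adds at most one period to the wait, so the
 timeout value dominates how long the region stays stuck.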



[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-26 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6060:
---

Attachment: trunk-6060.patch

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485173#comment-13485173
 ] 

Jimmy Xiang commented on HBASE-6060:


I uploaded a simple patch for 0.96: trunk-6060.patch. Could you please review?

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch




[jira] [Updated] (HBASE-6223) Document hbck improvements: HBASE-6173, HBASE-5360

2012-10-26 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6223:
---

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Document  hbck improvements: HBASE-6173, HBASE-5360
 ---

 Key: HBASE-6223
 URL: https://issues.apache.org/jira/browse/HBASE-6223
 Project: HBase
  Issue Type: Task
  Components: documentation, hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: trunk-6223.patch, trunk-6223_v2.patch, 
 trunk-6223_v3.patch




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485209#comment-13485209
 ] 

Jimmy Xiang commented on HBASE-6060:


@Stack, here is my understanding of the problem. The master calls an rs to open 
a region. Now, in master memory, the region is in pending_open state with this 
rs' server name. Then the rs dies.  When SSH starts, it goes to meta to find all 
the regions on this rs, minus those regions already in transition, then assigns 
the remaining regions. If the pending_open region (it could be opening too, 
depending on timing) was on this region server before, SSH will take care of 
it.  Otherwise, if it was on a different region server, SSH will not pick it 
up.  In this patch, I just time out the region transition so that the timeout 
monitor can change the state and re-assign it, instead of waiting for a long 
time (now, 20 minutes by default).

I'd like to make sure the region states in master memory are reliable.  
Otherwise, they are not of much use. So I think the master always has region 
control. In 0.96, I think the region states are very reliable now. Of course, 
there could be bugs I am not aware of yet.

@Ted, good point.  I will include the test. For EnvironmentEdgeManager, I will 
leave it to another jira.
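
The gap described in this comment can be sketched as follows. This is a 
simplified model, not the ServerShutdownHandler code: the class, the method 
names, and the two maps (region to last server recorded in meta, region to the 
server the open request was sent to) are assumptions made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ShutdownHandlerGapSketch {
    // Regions that meta records as hosted on the dead server.
    static Set<String> regionsInMetaOn(Map<String, String> metaLocations, String deadServer) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : metaLocations.entrySet()) {
            if (deadServer.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Regions whose open request went to the dead server, but which meta places elsewhere.
    // A scan of meta for the dead server never sees these.
    static Set<String> missedByMetaScan(Map<String, String> metaLocations,
                                        Map<String, String> openTargets,
                                        String deadServer) {
        Set<String> missed = new TreeSet<>();
        for (Map.Entry<String, String> e : openTargets.entrySet()) {
            boolean sentToDead = deadServer.equals(e.getValue());
            boolean metaSaysDead = deadServer.equals(metaLocations.get(e.getKey()));
            if (sentToDead && !metaSaysDead) {
                missed.add(e.getKey());
            }
        }
        return missed;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new TreeMap<>();
        meta.put("r1", "rsA"); // r1 was last open on rsA
        meta.put("r2", "rsB"); // r2 was last open on rsB
        Map<String, String> openTargets = new TreeMap<>();
        openTargets.put("r2", "rsA"); // master asked rsA to open r2, then rsA died

        System.out.println("found via meta: " + regionsInMetaOn(meta, "rsA"));
        System.out.println("missed:         " + missedByMetaScan(meta, openTargets, "rsA"));
    }
}
```

In this model, r2 is the region that would be left to the timeout monitor, 
which is the case the patch shortens by timing out the transition.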


 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch




[jira] [Updated] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-27 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6060:
---

Attachment: trunk-6060_v2.patch

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch, trunk-6060_v2.patch




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485515#comment-13485515
 ] 

Jimmy Xiang commented on HBASE-6060:


Added a test, and uploaded v2: trunk-6060_v2.patch.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch, trunk-6060_v2.patch




[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-29 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5119:
---

Status: Patch Available  (was: Open)

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: trunk_5119.patch


 The TimeoutMonitor used to be extremely racy and caused more trouble than it 
 fixed, but I believe most of this has been fixed in the context of 0.92, so I 
 think we should set it back down to a useful level. Currently it's 30 
 minutes; what should the new value be?
 I think 5 minutes should be good, will do some testing.



[jira] [Comment Edited] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481975#comment-13481975
 ] 

Jimmy Xiang edited comment on HBASE-5119 at 10/29/12 4:45 PM:
--

Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can 
easily open 10+k regions on 4 nodes in around 4 minutes.  I think we are fine 
to set the timeout back to 5 minutes.

  was (Author: jxiang):
Based on my testing for HBASE-6611/HBASE-6977, in my testing cluster, I can 
easily open 10+ regions on 4 nodes in around 4 minutes.  I think we are fine to 
set the timeout back to 5 minutes.
  
 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: trunk_5119.patch




[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-29 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5119:
---

Attachment: trunk_5119.patch

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: trunk_5119.patch




[jira] [Commented] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486137#comment-13486137
 ] 

Jimmy Xiang commented on HBASE-5119:


I uploaded a patch, which simply sets the timeout to 10 minutes and the checking 
period to 30 seconds.
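
As a rough illustration of those two values, the sketch below writes them into 
a plain java.util.Properties object. The timeout key is the one quoted in 
HBASE-6060. The period key name and the use of milliseconds as the unit are 
assumptions on my part and should be checked against the HBase version in use. 
The class itself is hypothetical and does not use the HBase Configuration API.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

class TimeoutMonitorSettingsSketch {
    // Key quoted in the HBASE-6060 description.
    static final String TIMEOUT_KEY = "hbase.master.assignment.timeoutmonitor.timeout";
    // Assumed name for the check-interval key; verify against your HBase version.
    static final String PERIOD_KEY = "hbase.master.assignment.timeoutmonitor.period";

    // Builds the two settings from the comment above, assuming values are in milliseconds.
    static Properties patchedValues() {
        Properties props = new Properties();
        props.setProperty(TIMEOUT_KEY, Long.toString(TimeUnit.MINUTES.toMillis(10)));
        props.setProperty(PERIOD_KEY, Long.toString(TimeUnit.SECONDS.toMillis(30)));
        return props;
    }

    public static void main(String[] args) {
        Properties props = patchedValues();
        System.out.println(TIMEOUT_KEY + " = " + props.getProperty(TIMEOUT_KEY));
        System.out.println(PERIOD_KEY + " = " + props.getProperty(PERIOD_KEY));
    }
}
```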

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: trunk_5119.patch




[jira] [Commented] (HBASE-6060) Regions's in OPENING state from failed regionservers takes a long time to recover

2012-10-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486333#comment-13486333
 ] 

Jimmy Xiang commented on HBASE-6060:


Good review.  Thanks.  I posted a new patch on RB: 
https://reviews.apache.org/r/7767/

In the new patch, I re-organized the SSH code a little bit, and handled regions 
in OPEN state too.
It also handles regions that are in OPENING/PENDING_OPEN/OFFLINE state but were 
open on this region server before that.  This could happen when the cluster 
starts up.

 Regions's in OPENING state from failed regionservers takes a long time to 
 recover
 -

 Key: HBASE-6060
 URL: https://issues.apache.org/jira/browse/HBASE-6060
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Reporter: Enis Soztutar
Assignee: rajeshbabu
 Fix For: 0.92.3, 0.94.3, 0.96.0

 Attachments: 6060-94-v3.patch, 6060-94-v4_1.patch, 
 6060-94-v4_1.patch, 6060-94-v4.patch, 6060_alternative_suggestion.txt, 
 6060_suggestion2_based_off_v3.patch, 6060_suggestion_based_off_v3.patch, 
 6060_suggestion_toassign_rs_wentdown_beforerequest.patch, 6060-trunk_2.patch, 
 6060-trunk_3.patch, 6060-trunk.patch, 6060-trunk.patch, HBASE-6060-92.patch, 
 HBASE-6060-94.patch, HBASE-6060_latest.patch, HBASE-6060_latest.patch, 
 HBASE-6060_latest.patch, HBASE-6060-trunk_4.patch, HBASE-6060_trunk_5.patch, 
 trunk-6060.patch, trunk-6060_v2.patch


 We have seen a pattern in tests where regions are stuck in OPENING state 
 for a very long time when the region server that is opening the region fails. 
 My understanding of the process: 
  
  - master calls rs to open the region. If rs is offline, a new plan is 
 generated (a new rs is chosen). RegionState is set to PENDING_OPEN (only in 
 master memory, zk still shows OFFLINE). See HRegionServer.openRegion(), 
 HMaster.assign()
  - RegionServer starts opening a region and changes the state in the znode. But 
 that znode is not ephemeral. (see ZKAssign)
  - Rs transitions the zk node from OFFLINE to OPENING. See 
 OpenRegionHandler.process()
  - rs then opens the region, and changes the znode from OPENING to OPENED
  - when rs is killed between the OPENING and OPENED states, zk shows OPENING 
 state, and the master just waits for rs to change the region state, but since 
 rs is down, that won't happen. 
  - There is an AssignmentManager.TimeoutMonitor, which guards against exactly 
 this kind of condition. It periodically checks (every 10 sec by 
 default) the regions in transition to see whether they have timed out 
 (hbase.master.assignment.timeoutmonitor.timeout). Default timeout is 30 min, 
 which explains what you and I are seeing. 
  - ServerShutdownHandler in Master does not reassign regions in OPENING 
 state, although it handles other states. 
 Lowering that threshold from the configuration is one option, but still I 
 think we can do better. 
 Will investigate more. 
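
 The stuck-in-OPENING scenario in the description can be sketched with a small 
 stand-alone model. This is not HBase code: TimeoutMonitorSketch, RegionState, 
 RegionInTransition and findTimedOut are hypothetical names, and the only value 
 taken from the description is the 30-minute default timeout.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a master-side timeout check; all names here are hypothetical. */
final class TimeoutMonitorSketch {
    enum RegionState { OFFLINE, PENDING_OPEN, OPENING, OPENED }

    /** A region in transition together with the time its state last changed. */
    static final class RegionInTransition {
        final String name;
        final RegionState state;
        final long stampMillis;

        RegionInTransition(String name, RegionState state, long stampMillis) {
            this.name = name;
            this.state = state;
            this.stampMillis = stampMillis;
        }
    }

    /** Returns the names of unopened regions whose state is older than timeoutMillis. */
    static List<String> findTimedOut(List<RegionInTransition> rit,
                                     long nowMillis, long timeoutMillis) {
        List<String> timedOut = new ArrayList<>();
        for (RegionInTransition r : rit) {
            if (r.state != RegionState.OPENED && nowMillis - r.stampMillis > timeoutMillis) {
                timedOut.add(r.name);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        long timeout = 30L * 60 * 1000; // 30-minute default quoted in the description
        List<RegionInTransition> rit = new ArrayList<>();
        // The region server died right after moving the znode to OPENING at t=0.
        rit.add(new RegionInTransition("region-a", RegionState.OPENING, 0L));
        // Ten minutes later the check finds nothing, so the region stays stuck.
        System.out.println(findTimedOut(rit, 10L * 60 * 1000, timeout));
        // Only once the full timeout has passed is the region reported.
        System.out.println(findTimedOut(rit, 31L * 60 * 1000, timeout));
    }
}
```

 In this model the check interval does not matter much: however often the 
 monitor runs, a region whose server died is only noticed once the timeout 
 itself has elapsed.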



[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-10-29 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5119:
---

  Resolution: Fixed
Release Note: The region assignment timeout time is reduced to 10 minutes. 
The timeout check interval is reduced to 30 seconds from 60 seconds.
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Set the TimeoutMonitor's timeout back down
 --

 Key: HBASE-5119
 URL: https://issues.apache.org/jira/browse/HBASE-5119
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.96.0

 Attachments: trunk_5119.patch


 The TimeoutMonitor used to be extremely racy and caused more trouble than it 
 fixed, but I believe most of this has been fixed in the context of 0.92, so I 
 think we should set it back down to a useful level. Currently it's 30 
 minutes; what should the new value be?
 I think 5 minutes should be good, will do some testing.



[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk

2012-10-30 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487214#comment-13487214
 ] 

Jimmy Xiang commented on HBASE-7055:


This issue doesn't have a component. Do we need two +1s to commit it?

It also introduces a new configuration file. It seems to be a big patch to me.


 port HBASE-6371 tier-based compaction from 0.89-fb to trunk
 ---

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch


 There's divergence in the code :(
 See HBASE-6371 for details.



[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk

2012-10-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487855#comment-13487855
 ] 

Jimmy Xiang commented on HBASE-7055:


I agree with Enis.  If each component has its own config file, we will have 
tons of them, and they will be hard to manage.  If the same name-value pair is 
defined in different files, which one should we use?

 port HBASE-6371 tier-based compaction from 0.89-fb to trunk
 ---

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
  Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.96.0

 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch


 There's divergence in the code :(
 See HBASE-6371 for details.



[jira] [Created] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError

2012-10-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-7080:
--

 Summary: TestHFileOutputFormat.testColumnFamilyCompression failing 
due to UnsatisfiedLinkError
 Key: HBASE-7080
 URL: https://issues.apache.org/jira/browse/HBASE-7080
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Due to HADOOP-8756, this test fails

{noformat}
java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
Method)
at 
org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
at 
org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
at 
org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
at 
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649)
at 
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571)
{noformat}
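
The trace shows the native check itself failing to link, rather than returning 
false. One way a caller could tolerate that is sketched below. This is only an 
illustration and not necessarily what the attached patch does: NativeProbeSketch 
and isNativeCodecUsable are hypothetical names, and the native call is simulated 
with a lambda so the example runs without Hadoop.

```java
import java.util.function.Supplier;

/** Sketch of probing for native codec support without letting a link error escape. */
final class NativeProbeSketch {
    /** Returns false instead of propagating when the native check cannot be linked. */
    static boolean isNativeCodecUsable(Supplier<Boolean> nativeCheck) {
        try {
            return nativeCheck.get();
        } catch (UnsatisfiedLinkError e) {
            // The native library, or this particular native method, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates a native method that is declared but absent from the loaded library.
        Supplier<Boolean> missing = () -> {
            throw new UnsatisfiedLinkError("buildSupportsSnappy()Z");
        };
        System.out.println(isNativeCodecUsable(missing));
        System.out.println(isNativeCodecUsable(() -> true));
    }
}
```

A test that enumerates supported compression algorithms could use a guard of 
this kind to skip a codec instead of failing outright.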



[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7080:
---

Attachment: trunk_7080.patch

 TestHFileOutputFormat.testColumnFamilyCompression failing due to 
 UnsatisfiedLinkError
 -

 Key: HBASE-7080
 URL: https://issues.apache.org/jira/browse/HBASE-7080
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk_7080.patch


 Due to HADOOP-8756, this test fails
 {noformat}
 java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
   at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
 Method)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
   at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571)
 {noformat}



[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7080:
---

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 TestHFileOutputFormat.testColumnFamilyCompression failing due to 
 UnsatisfiedLinkError
 -

 Key: HBASE-7080
 URL: https://issues.apache.org/jira/browse/HBASE-7080
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk_7080.patch


 Due to HADOOP-8756, this test fails
 {noformat}
 java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
   at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
 Method)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
   at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571)
 {noformat}



[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488032#comment-13488032
 ] 

Jimmy Xiang commented on HBASE-6305:


If the scheme is not null, the path will have the scheme twice.
The failure seems to be because of the null scheme.  Is there a better way to 
fix it?

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.94.3

 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, 
 HBASE-6305-v1.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec   ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}



[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488053#comment-13488053
 ] 

Jimmy Xiang commented on HBASE-6305:


Path.toString() will add the scheme if it's not null.  We add it one more time 
here, so it is added twice.
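
A small stand-alone illustration of the problem, using java.net.URI as a 
stand-in for Hadoop's Path so that it runs without Hadoop on the classpath. 
DoubleSchemeSketch, prependScheme and qualify are hypothetical names, not code 
from the patch.

```java
import java.net.URI;

/** Shows the double-scheme problem with java.net.URI standing in for Hadoop's Path. */
final class DoubleSchemeSketch {
    /** Naive version: always prepends the scheme, even if the location already has one. */
    static String prependScheme(String scheme, URI location) {
        return scheme + "://" + location.toString();
    }

    /** Safer version: adds a scheme only when the location does not carry one. */
    static String qualify(String defaultScheme, URI location) {
        if (location.getScheme() != null) {
            return location.toString();
        }
        return defaultScheme + "://" + location.toString();
    }

    public static void main(String[] args) {
        URI withScheme = URI.create("file:///tmp/hbase");
        URI withoutScheme = URI.create("/tmp/hbase");
        System.out.println(prependScheme("file", withScheme)); // scheme appears twice
        System.out.println(qualify("file", withScheme));       // left unchanged
        System.out.println(qualify("file", withoutScheme));    // scheme added once
    }
}
```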

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.94.3

 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, 
 HBASE-6305-v1.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec   ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}



[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488086#comment-13488086
 ] 

Jimmy Xiang commented on HBASE-6305:


Can you do something like below?

conf.set(HConstants.HBASE_DIR,
TEST_UTIL.getDataTestDir("hbase.rootdir").makeQualified(TEST_UTIL.getTestFileSystem()).toString());

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.94.3

 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, 
 HBASE-6305-v1.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec   ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}



[jira] [Updated] (HBASE-7080) TestHFileOutputFormat.testColumnFamilyCompression failing due to UnsatisfiedLinkError

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7080:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into trunk.  Thanks Greg for reviewing it.

 TestHFileOutputFormat.testColumnFamilyCompression failing due to 
 UnsatisfiedLinkError
 -

 Key: HBASE-7080
 URL: https://issues.apache.org/jira/browse/HBASE-7080
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk_7080.patch


 Due to HADOOP-8756, this test fails
 {noformat}
 java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
   at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
 Method)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
   at 
 org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:127)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:104)
   at 
 org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:118)
   at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.getSupportedCompressionAlgorithms(TestHFileOutputFormat.java:649)
   at 
 org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testColumnFamilyCompression(TestHFileOutputFormat.java:571)
 {noformat}



[jira] [Created] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset

2012-10-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-7082:
--

 Summary: TestHFileCleaner#testHFileCleaning fails due to cleaner 
is reset
 Key: HBASE-7082
 URL: https://issues.apache.org/jira/browse/HBASE-7082
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Attachments: trunk-7082.patch

TestHFileCleaner#testHFileCleaning fails if it runs after 
testRemovesEmptyDirectories, which resets the cleaner.
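
The ordering problem can be reproduced in miniature. The sketch below is a 
generic model of tests sharing mutable state. It does not reflect how 
TestHFileCleaner is actually written or how the attached patch fixes it, and 
SharedCleanerSketch and its methods are hypothetical names.

```java
/** Sketch of a test-order dependency caused by shared state; names are hypothetical. */
final class SharedCleanerSketch {
    /** Stand-in for a cleaner object shared by all tests in a class. */
    static final class Cleaner {
        private boolean configured;
        void configure() { configured = true; }
        void reset() { configured = false; }
        boolean clean() { return configured; } // cleaning only works when configured
    }

    static final Cleaner shared = new Cleaner();

    /** Mimics a test that resets the shared cleaner as part of its work. */
    static void testThatResets() { shared.reset(); }

    /** Mimics a test that assumes the cleaner is still configured from class setup. */
    static boolean testThatAssumesSetup() { return shared.clean(); }

    /** Same test, but it prepares the state it needs, so ordering no longer matters. */
    static boolean testWithOwnSetup() {
        shared.configure();
        return shared.clean();
    }

    public static void main(String[] args) {
        shared.configure();                         // one-time class-level setup
        System.out.println(testThatAssumesSetup()); // passes when run first
        testThatResets();
        System.out.println(testThatAssumesSetup()); // fails when run after the reset
        System.out.println(testWithOwnSetup());     // passes in any order
    }
}
```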



[jira] [Updated] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7082:
---

Attachment: trunk-7082.patch

 TestHFileCleaner#testHFileCleaning fails due to cleaner is reset
 

 Key: HBASE-7082
 URL: https://issues.apache.org/jira/browse/HBASE-7082
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Attachments: trunk-7082.patch


 TestHFileCleaner#testHFileCleaning fails if it runs after 
 testRemovesEmptyDirectories, which resets the cleaner.



[jira] [Updated] (HBASE-7082) TestHFileCleaner#testHFileCleaning fails due to cleaner is reset

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7082:
---

Status: Patch Available  (was: Open)

 TestHFileCleaner#testHFileCleaning fails due to cleaner is reset
 

 Key: HBASE-7082
 URL: https://issues.apache.org/jira/browse/HBASE-7082
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Attachments: trunk-7082.patch


 TestHFileCleaner#testHFileCleaning fails if it runs after 
 testRemovesEmptyDirectories, which resets the cleaner.



[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488280#comment-13488280
 ] 

Jimmy Xiang commented on HBASE-6305:


+1

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.94.3

 Attachments: hbase-6305-94.patch, HBASE-6305-94-v2.patch, 
 HBASE-6305-94-v2.patch, HBASE-6305-v1.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec   ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}



[jira] [Created] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter

2012-10-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-7083:
--

 Summary: SSH#fixupDaughter should force re-assign missing daughter
 Key: HBASE-7083
 URL: https://issues.apache.org/jira/browse/HBASE-7083
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


In looking into flaky test 
TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a 
missing daughter is not assigned by SSH properly.  It could be open on the dead 
server.  We need to force re-assign it.
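
A toy model of why a plain assign may not be enough here. It assumes, as a 
simplification, that a non-forced assign skips any region already recorded as 
open. ForceAssignSketch and assign are hypothetical names and do not mirror 
the AssignmentManager API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a stale "open" record blocking re-assignment; names are hypothetical. */
final class ForceAssignSketch {
    /**
     * Assigns region to target. Without force, a region already recorded as open
     * somewhere is left alone, even if that record points at a dead server.
     */
    static boolean assign(Map<String, String> regionToServer, String region,
                          String target, boolean force) {
        if (!force && regionToServer.containsKey(region)) {
            return false; // skipped: the record says the region is already open
        }
        regionToServer.put(region, target);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> regionToServer = new HashMap<>();
        regionToServer.put("daughter-a", "dead-server"); // stale record left by the crash
        System.out.println(assign(regionToServer, "daughter-a", "live-server", false));
        System.out.println(assign(regionToServer, "daughter-a", "live-server", true));
        System.out.println(regionToServer.get("daughter-a"));
    }
}
```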



[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7083:
---

Fix Version/s: 0.96.0
   Status: Patch Available  (was: Open)

 SSH#fixupDaughter should force re-assign missing daughter
 -

 Key: HBASE-7083
 URL: https://issues.apache.org/jira/browse/HBASE-7083
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk-7083.patch


 In looking into flaky test 
 TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a 
 missing daughter is not assigned by SSH properly.  It could be open on the 
 dead server.  We need to force re-assign it.



[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter

2012-10-31 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7083:
---

Attachment: trunk-7083.patch

 SSH#fixupDaughter should force re-assign missing daughter
 -

 Key: HBASE-7083
 URL: https://issues.apache.org/jira/browse/HBASE-7083
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk-7083.patch


 In looking into flaky test 
 TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a 
 missing daughter is not assigned by SSH properly.  It could be open on the 
 dead server.  We need to force re-assign it.



[jira] [Updated] (HBASE-7083) SSH#fixupDaughter should force re-assign missing daughter

2012-11-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-7083:
---

Status: Open  (was: Patch Available)

 SSH#fixupDaughter should force re-assign missing daughter
 -

 Key: HBASE-7083
 URL: https://issues.apache.org/jira/browse/HBASE-7083
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.96.0

 Attachments: trunk-7083.patch


 In looking into flaky test 
 TestSplitTransactionOnCluster#testShutdownSimpleFixup, I found out that a 
 missing daughter is not assigned by SSH properly.  It could be open on the 
 dead server.  We need to force re-assign it.


