[jira] Resolved: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-3115.


   Resolution: Fixed
Fix Version/s: 0.90.0
 Hadoop Flags: [Reviewed]

OK, I'll go with your logic. Thanks for the good find, Benoit.

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 0.90.0
>
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).
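The patch is described in the comments as changing the code to call out.write() only once; a self-contained sketch of that length-prefix buffering idea (illustrative class and method names, not the actual HBaseClient code):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FramedWrite {

    // Build the 4-byte length prefix and the payload into one in-memory buffer
    // so the caller can hand the socket a single write() call (and hence,
    // typically, a single TCP packet) instead of two.
    static byte[] frame(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(4 + data.length);
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(data.length);       // first put the data length (big-endian)
        out.write(data, 0, data.length); // then the data itself
        out.flush();
        return buf.toByteArray();        // one array -> one socket write
    }

    public static void main(String[] args) throws IOException {
        byte[] rpc = "get-row-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(frame(rpc).length); // 4 + 10 = 14
    }
}
```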

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3114) Test up on hudson are leaking zookeeper ensembles

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3114:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving.  It's been holding steady since this patch went in.  Hopefully the next 
restart of hudson will clean up those that are sticking around.  Here is output from 
build #1565:

{code}
2010-10-19 02:24:46,170 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
Failed binding ZK Server to client port: 21818
2010-10-19 02:24:46,170 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
Failed binding ZK Server to client port: 21819
2010-10-19 02:24:46,170 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
Failed binding ZK Server to client port: 21820
2010-10-19 02:24:46,171 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
Failed binding ZK Server to client port: 21821
2010-10-19 02:24:46,210 INFO  [main] zookeeper.MiniZooKeeperCluster(125): 
Started MiniZK Server on client port: 21822
{code}

> Test up on hudson are leaking zookeeper ensembles
> -
>
> Key: HBASE-3114
> URL: https://issues.apache.org/jira/browse/HBASE-3114
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 0.90.0
>
> Attachments: 3114-v2.txt, 3114.txt
>
>
> Here is from a recent run up on Hudson:
> {code}
> 2010-10-14 23:31:56,482 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
> Failed binding ZK Server to client port: 21818
> 2010-10-14 23:31:56,483 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
> Failed binding ZK Server to client port: 21819
> 2010-10-14 23:31:56,483 INFO  [main] zookeeper.MiniZooKeeperCluster(111): 
> Failed binding ZK Server to client port: 21820
> 2010-10-14 23:31:56,522 INFO  [main] zookeeper.MiniZooKeeperCluster(125): 
> Started MiniZK Server on client port: 21821
> {code}
> See how we start trying to bind to 21818 but we don't get a free port till we 
> get to 21821?
> Some test or tests are not cleaning up after themselves, leaving a running zk 
> cluster or two about.
> TestReplication looks to be suspect.  Here is its @AfterClass method:
> {code}
>   /**
>* @throws java.lang.Exception
>*/
>   @AfterClass
>   public static void tearDownAfterClass() throws Exception {
> /* REENABLE
> utility2.shutdownMiniCluster();
> utility1.shutdownMiniCluster();
> */
>   }
> {code}
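The repeated "Failed binding" lines above come from the mini ZK cluster probing successive client ports until it finds a free one; a leaked ensemble from an earlier test makes the probe skip past the port it still holds. A rough, stdlib-only sketch of that retry pattern (method name and log text are illustrative, not the actual MiniZooKeeperCluster code):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRetry {

    // Probe ports starting at startPort; if a port is already held (say, by a
    // leaked ZK ensemble from an earlier test), log and try the next one.
    static ServerSocket bindFirstFree(int startPort, int maxTries) throws IOException {
        for (int i = 0; i < maxTries; i++) {
            int port = startPort + i;
            try {
                return new ServerSocket(port); // throws if the port is taken
            } catch (IOException e) {
                System.out.println("Failed binding ZK Server to client port: " + port);
            }
        }
        throw new IOException("no free port after " + maxTries + " tries");
    }

    public static void main(String[] args) throws IOException {
        ServerSocket ss = bindFirstFree(21818, 5);
        System.out.println("Started on client port: " + ss.getLocalPort());
        ss.close();
    }
}
```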

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2658) REST (stargate) TableRegionModel Regions need to be updated to work w/ new region naming convention from HBASE-2531

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922445#action_12922445
 ] 

stack commented on HBASE-2658:
--

Is this needed for 0.90, Andrew?  I'd think so.

> REST (stargate) TableRegionModel Regions need to be updated to work w/ new 
> region naming convention from HBASE-2531
> ---
>
> Key: HBASE-2658
> URL: https://issues.apache.org/jira/browse/HBASE-2658
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Andrew Purtell
> Fix For: 0.90.0
>
>
> One reason TestTableResource was failing was because comparing region names 
> as strings was failing because the two below no longer matched.  My guess is 
> that the rest stuff is not using the new means of constructing region names.  
> See HBASE-2531
> TestTableResource,,1275503739792.30a45563321be3ec11841b0f1e79d687.
> TestTableResource,,1275503739792

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3122) NPE in master.AssignmentManager if all region servers shut down

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922440#action_12922440
 ] 

stack commented on HBASE-3122:
--

This issue is a little more involved.  The kernel of the issue is what to do when 
there are no servers in the cluster while a ShutdownServerHandler is running.  
ShutdownServerHandler can't complete unless root and meta are assigned.  I'm 
thinking that SSH should just hang around.  Off in assign, we should check the 
online servers and, if there are none, hold (unless the master has been asked to 
shut down).  The hold would happen after the server's logs had been split.
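The NPE in the quoted stack trace fires while logging a RegionPlan whose destination was never set because random assignment found no servers. A minimal, self-contained sketch of a null-tolerant toString (class and method names are simplified stand-ins, not the real LoadBalancer code):

```java
public class RegionPlanSketch {

    // Simplified stand-in for the server name the real plan holds.
    static class ServerName {
        private final String name;
        ServerName(String name) { this.name = name; }
        String getServerName() { return name; }
    }

    private final String regionName;
    private final ServerName destination; // null when the balancer had no servers

    RegionPlanSketch(String regionName, ServerName destination) {
        this.regionName = regionName;
        this.destination = destination;
    }

    @Override
    public String toString() {
        // Dereferencing destination unconditionally is what throws when random
        // assignment returned no server; guard before calling through it.
        String dest = (destination == null) ? "<no server>" : destination.getServerName();
        return "region=" + regionName + ", dest=" + dest;
    }

    public static void main(String[] args) {
        System.out.println(new RegionPlanSketch("70236052", null));
    }
}
```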

> NPE in master.AssignmentManager if all region servers shut down
> ---
>
> Key: HBASE-3122
> URL: https://issues.apache.org/jira/browse/HBASE-3122
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Andrew Purtell
>Assignee: stack
>Priority: Minor
> Fix For: 0.90.0
>
>
> 10/10/18 16:26:44 INFO catalog.CatalogTracker: acer,60020,1287443908850 
> carrying .META.; unsetting .META. location
> 10/10/18 16:26:44 INFO catalog.CatalogTracker: Current cached META location 
> is not valid, resetting
> 10/10/18 16:26:44 INFO handler.ServerShutdownHandler: Splitting logs for 
> acer,60020,1287443908850
> 10/10/18 16:26:44 INFO zookeeper.ZKUtil: hconnection-0x12bc1a2f0a60001 Set 
> watcher on existing znode /hbase/root-region-server
> 10/10/18 16:26:44 INFO catalog.RootLocationEditor: Unsetting ROOT region 
> location in ZooKeeper
> 10/10/18 16:26:44 DEBUG zookeeper.ZKAssign: master:6-0x12bc1a2f0a6 
> Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 10/10/18 16:26:44 WARN master.LoadBalancer: Wanted to do random assignment 
> but no servers to assign to
> 10/10/18 16:26:44 ERROR executor.EventHandler: Caught throwable while 
> processing event M_SERVER_SHUTDOWN
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.LoadBalancer$RegionPlan.toString(LoadBalancer.java:595)
>   at java.lang.String.valueOf(String.java:2826)
>   at java.lang.StringBuilder.append(StringBuilder.java:115)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:803)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:777)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:720)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:640)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:922)
>   at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:97)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1845) MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922437#action_12922437
 ] 

stack commented on HBASE-1845:
--

@Ryan Want to open new issue to address?

> MultiGet, MultiDelete, and MultiPut - batched to the appropriate region 
> servers
> ---
>
> Key: HBASE-1845
> URL: https://issues.apache.org/jira/browse/HBASE-1845
> Project: HBase
>  Issue Type: New Feature
>Reporter: Erik Holstad
>Assignee: Marc Limotte
> Fix For: 0.90.0
>
> Attachments: batch.patch, hbase-1845-trunk.patch, 
> hbase-1845_0.20.3.patch, hbase-1845_0.20.5.patch, multi-v1.patch
>
>
> I've started to create a general interface for doing these batch/multi calls 
> and would like to get some input and thoughts about how we should handle this 
> and what the protocol should
> look like. 
> First naive patch, coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2485) Persist Master in-memory state so on restart or failover, new instance can pick up where the old left off

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922436#action_12922436
 ] 

stack commented on HBASE-2485:
--

Just to say that the attached doc is a good description of how things used to 
work.  It also puts up a few simple axioms on how things are to be in the new 
master, with listings of general transition flows.  The hbase 'book' has the 
committed versions of these flows.  I also took from the doc the description of 
how splits now work in the new master.

> Persist Master in-memory state so on restart or failover, new instance can 
> pick up where the old left off
> -
>
> Key: HBASE-2485
> URL: https://issues.apache.org/jira/browse/HBASE-2485
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBase-State-Transitions.docx
>
>
> Today there was some good stuff up on IRC on how transitions won't always 
> make it across Master failovers in multi-master deploy because transitions 
> are kept in in-memory structure up in the Master and so on master crash, the 
> new master will be missing state  on startup (Todd was main promulgator of 
> this observation and of the opinion that while  master rewrite is scheduled 
> for 0.21, some part needs to be done for 0.20.5).  A few suggestions were 
> made: transitions should be file-backed somehow, etc.  Let this issue be 
> about the subset we want to do for 0.20.5.
> Of the in-memory state queues, there is at least the master tasks queue -- 
> process region opens, closes, regionserver crashes, etc. -- where tasks must 
> be done in order and IIRC, tasks are fairly idempotent (at least in the 
> server crash case, it's multi-step and we'll put the crash event back on the 
> queue if we cannot do all steps in the one go).  Perhaps this queue could be 
> done using the new queue facility in zk 3.3.0 (I haven't looked to check if 
> possible, just suggesting).  Another suggestion was a file to which we'd 
> append queue items, requeueing, and marking the file with task complete, etc. 
>  On Master restart or fail-over, we'd replay the queue log.
> There is also the Map of regions-in-transition.  Yesterday we learned that 
> there is a bug where server shutdown processing does not iterate the Map of 
> regions-in-transition.  This Map may hold regions that are in "opening" or 
> "opened" state but haven't yet had the fact added to .META. by master.  
> Meantime the hosting server can crash.  Regions that were opening will stay 
> in the regions-in-transition and those in opened-but-not-yet-added-to-meta 
> will go ahead and add a crashed server to .META. (Currently 
> regions-in-transition does not record server the region opening/open is 
> happening on so it doesn't have enough info to be processed as part of server 
> shutdown).
> Regions-in-transition also needs to be persistent.  On startup, 
> regions-in-transition can get kinda hectic on a big cluster.  Ordering is not 
> so important here I believe.  A directory in zk might work (For 1M regions in 
> a big cluster, that'd be about 2M creates and 2M deletes during startup -- 
> that's too much?).  Or we could write a WAL-like log again of region  
> transitions (We'd have to develop a little vocabulary) that got reread by a 
> new master.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922435#action_12922435
 ] 

Benoit Sigoure commented on HBASE-3115:
---

No, I haven't tried it either.  I don't use this HBase client anymore; I use my 
asynchronous client instead.

But given that the patch changes the code to call out.write() only once, I 
can't imagine how this can be split in 2 write() system calls that would result 
in 2 packets.

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922431#action_12922431
 ] 

stack commented on HBASE-2753:
--

Yeah... a few days ago I enabled assertions when tests run.  We should try 
enabling assertions when we run too... but yeah, this is looking like it's fixed 
in 0.90... just make it the last thing we close out.

> Remove sorted() methods from Result now that Gets are Scans
> ---
>
> Key: HBASE-2753
> URL: https://issues.apache.org/jira/browse/HBASE-2753
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: ryan rawson
> Fix For: 0.90.0
>
>
> With the old Get codepath, we used to sometimes get results sent to the 
> client that weren't fully sorted.  Now that Gets are Scans, results should 
> always be sorted.
> Confirm that we always get back sorted results and if so drop the 
> Result.sorted() method and update javadoc accordingly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922430#action_12922430
 ] 

stack commented on HBASE-3115:
--

Not me (Pinging Benoit).

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3126) Force use of 'mv -f' when moving aside hbase logfiles

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922428#action_12922428
 ] 

stack commented on HBASE-3126:
--

We should kill all inheritance in our scripts (except JAVA_HOME?)

> Force use of 'mv -f' when moving aside hbase logfiles
> -
>
> Key: HBASE-3126
> URL: https://issues.apache.org/jira/browse/HBASE-3126
> Project: HBase
>  Issue Type: Improvement
>  Components: scripts
>Affects Versions: 0.89.20100924
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Minor
> Fix For: 0.90.0
>
>
> We saw a case where the hbase startup script wouldn't finish because it 
> couldn't move logfiles out of the way, and would throw the session into 
> interactive mode.
> The problem is caused because the script invokes plain 'mv', and so it can pick 
> up any shell alias defined for 'mv' (such as 'mv -i', which prompts).  This 
> changes 'mv' to 'mv -f'.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3127) TestHTablePool failing after recent commit of HConnection changes

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922427#action_12922427
 ] 

stack commented on HBASE-3127:
--

Sorry, 2669, not 2997.  Let me edit the svn commit message.

> TestHTablePool failing after recent commit of HConnection changes
> -
>
> Key: HBASE-3127
> URL: https://issues.apache.org/jira/browse/HBASE-3127
> Project: HBase
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBASE-3127-v1.patch
>
>
> TestHTablePool passes a null Configuration to HTablePool causing a new NPE.  
> It should just reuse the TEST_UTIL.conf.
> Could also add a check to HTablePool that conf is not null, but it seems 
> reasonable that you just can't pass it null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3127) TestHTablePool failing after recent commit of HConnection changes

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3127:
-

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Sorry Jon, I fixed it under the 2985 banner! (Meant to do it under the 2997 
banner.)  Yeah, I can't figure out how this test passed previously -- how it found 
a cluster and how it didn't create the tables it's testing against.

> TestHTablePool failing after recent commit of HConnection changes
> -
>
> Key: HBASE-3127
> URL: https://issues.apache.org/jira/browse/HBASE-3127
> Project: HBase
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBASE-3127-v1.patch
>
>
> TestHTablePool passes a null Configuration to HTablePool causing a new NPE.  
> It should just reuse the TEST_UTIL.conf.
> Could also add a check to HTablePool that conf is not null, but it seems 
> reasonable that you just can't pass it null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3127) TestHTablePool failing after recent commit of HConnection changes

2010-10-18 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3127:
-

Status: Patch Available  (was: Open)

> TestHTablePool failing after recent commit of HConnection changes
> -
>
> Key: HBASE-3127
> URL: https://issues.apache.org/jira/browse/HBASE-3127
> Project: HBase
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBASE-3127-v1.patch
>
>
> TestHTablePool passes a null Configuration to HTablePool causing a new NPE.  
> It should just reuse the TEST_UTIL.conf.
> Could also add a check to HTablePool that conf is not null, but it seems 
> reasonable that you just can't pass it null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3127) TestHTablePool failing after recent commit of HConnection changes

2010-10-18 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3127:
-

Attachment: HBASE-3127-v1.patch

This passes the TEST_UTIL configuration when instantiating HTablePools rather 
than null.

Also adds table creation to beforeClass() and removes some weird code that could 
not have been working right before (deleting a table without disabling it if the 
table already existed).

Somehow these tests were passing against non-existent tables before?

Tests passing now.

> TestHTablePool failing after recent commit of HConnection changes
> -
>
> Key: HBASE-3127
> URL: https://issues.apache.org/jira/browse/HBASE-3127
> Project: HBase
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBASE-3127-v1.patch
>
>
> TestHTablePool passes a null Configuration to HTablePool causing a new NPE.  
> It should just reuse the TEST_UTIL.conf.
> Could also add a check to HTablePool that conf is not null, but it seems 
> reasonable that you just can't pass it null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3127) TestHTablePool failing after recent commit of HConnection changes

2010-10-18 Thread Jonathan Gray (JIRA)
TestHTablePool failing after recent commit of HConnection changes
-

 Key: HBASE-3127
 URL: https://issues.apache.org/jira/browse/HBASE-3127
 Project: HBase
  Issue Type: Bug
  Components: client, test
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.0


TestHTablePool passes a null Configuration to HTablePool causing a new NPE.  It 
should just reuse the TEST_UTIL.conf.

Could also add a check to HTablePool that conf is not null, but it seems 
reasonable that you just can't pass it null.
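The description floats adding a non-null check as an alternative to fixing the caller; a minimal stand-in sketch of that fail-fast option (class and names here are illustrative, not the real HTablePool or Hadoop Configuration API):

```java
public class PoolSketch {

    // Stand-in for org.apache.hadoop.conf.Configuration.
    static class Configuration { }

    private final Configuration conf;

    // Fail fast with a clear message rather than letting a null conf surface
    // later as an NPE deep inside connection setup.
    PoolSketch(Configuration conf) {
        if (conf == null) {
            throw new IllegalArgumentException("conf must not be null");
        }
        this.conf = conf;
    }

    Configuration getConfiguration() { return conf; }

    public static void main(String[] args) {
        try {
            new PoolSketch(null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```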

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3126) Force use of 'mv -f' when moving aside hbase logfiles

2010-10-18 Thread Jonathan Gray (JIRA)
Force use of 'mv -f' when moving aside hbase logfiles
-

 Key: HBASE-3126
 URL: https://issues.apache.org/jira/browse/HBASE-3126
 Project: HBase
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.89.20100924
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.90.0


We saw a case where the hbase startup script wouldn't finish because it 
couldn't move logfiles out of the way, and would throw the session into 
interactive mode.

The problem is caused because the script invokes plain 'mv', and so it can pick up 
any shell alias defined for 'mv' (such as 'mv -i', which prompts).  This changes 
'mv' to 'mv -f'.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3125) Allow HFile's bytes per checksum to be configured

2010-10-18 Thread Jonathan Gray (JIRA)
Allow HFile's bytes per checksum to be configured
-

 Key: HBASE-3125
 URL: https://issues.apache.org/jira/browse/HBASE-3125
 Project: HBase
  Issue Type: New Feature
  Components: io
Reporter: Jonathan Gray
Priority: Minor


It could be desirable to have a non-default bytes per checksum at the HDFS 
level.  This change requires some HDFS-side changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3096) TestCompaction times out in latest release

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-3096.


  Resolution: Duplicate
Assignee: stack  (was: Todd Lipcon)
Hadoop Flags: [Reviewed]

Went to commit this, but it looks like stack already fixed it in r999017.

> TestCompaction times out in latest release
> --
>
> Key: HBASE-3096
> URL: https://issues.apache.org/jira/browse/HBASE-3096
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.89.20100924
>Reporter: Todd Lipcon
>Assignee: stack
> Attachments: hbase-3096.txt
>
>
> TestCompaction is timing out in 0.89.20100924. It's using HRegion directly 
> and writing too much data, so the writes start blocking forever.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-2985) HRegionServer.multi() no longer calls HRegion.put(List) when possible

2010-10-18 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson resolved HBASE-2985.


Resolution: Fixed

> HRegionServer.multi() no longer calls HRegion.put(List) when possible
> -
>
> Key: HBASE-2985
> URL: https://issues.apache.org/jira/browse/HBASE-2985
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: ryan rawson
> Fix For: 0.90.0
>
>
> This should result in reduced performance of puts in batched mode

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922391#action_12922391
 ] 

Todd Lipcon commented on HBASE-3115:


FYI, I didn't verify with tshark or anything that this actually reduced the 
number of packets; I was just kind of assuming.  Did either of you guys try it out?

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-18 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922387#action_12922387
 ] 

ryan rawson commented on HBASE-2753:


It's not done until it's done, and the sort is still in the code base.  Let's 
wait until later this week for some more test runs.

> Remove sorted() methods from Result now that Gets are Scans
> ---
>
> Key: HBASE-2753
> URL: https://issues.apache.org/jira/browse/HBASE-2753
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: ryan rawson
> Fix For: 0.90.0
>
>
> With the old Get codepath, we used to sometimes get results sent to the 
> client that weren't fully sorted.  Now that Gets are Scans, results should 
> always be sorted.
> Confirm that we always get back sorted results and if so drop the 
> Result.sorted() method and update javadoc accordingly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1845) MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers

2010-10-18 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922385#action_12922385
 ] 

ryan rawson commented on HBASE-1845:


An addition to this issue: the as-is unit test was testing the order of 
mixed operations, in that they were applied in the order they were provided.  
That is not realistic, especially if we want to implement HBASE-2985, which 
does the Puts last in one batch.  The API should not constrain the 
regionserver away from the most optimal order of operations.
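The "Puts last in one batch" reordering mentioned above (HBASE-2985) can be sketched as a stable partition of a mixed batch. The `Action`/`Put`/`Delete` types here are invented stand-ins, not the real HBase client classes:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReorder {
    // Hypothetical stand-ins for client batch operations.
    interface Action {}
    static class Put implements Action {}
    static class Delete implements Action {}

    // Reorder a mixed batch so all Puts run last as one group, preserving
    // relative order within each group (a stable partition).
    static List<Action> putsLast(List<Action> batch) {
        List<Action> others = new ArrayList<>();
        List<Action> puts = new ArrayList<>();
        for (Action a : batch) {
            (a instanceof Put ? puts : others).add(a);
        }
        others.addAll(puts);
        return others;
    }

    public static void main(String[] args) {
        List<Action> batch = List.of(new Put(), new Delete(), new Put());
        List<Action> ordered = putsLast(batch);
        System.out.println(ordered.get(0) instanceof Delete); // prints true
    }
}
```

A test that asserts the exact submitted order of mixed operations would forbid this kind of server-side optimization, which is the point Ryan makes above.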

> MultiGet, MultiDelete, and MultiPut - batched to the appropriate region 
> servers
> ---
>
> Key: HBASE-1845
> URL: https://issues.apache.org/jira/browse/HBASE-1845
> Project: HBase
>  Issue Type: New Feature
>Reporter: Erik Holstad
>Assignee: Marc Limotte
> Fix For: 0.90.0
>
> Attachments: batch.patch, hbase-1845-trunk.patch, 
> hbase-1845_0.20.3.patch, hbase-1845_0.20.5.patch, multi-v1.patch
>
>
> I've started to create a general interface for doing these batch/multi calls 
> and would like to get some input and thoughts about how we should handle this 
> and what the protocol should
> look like. 
> First naive patch, coming soon.




[jira] Commented: (HBASE-2753) Remove sorted() methods from Result now that Gets are Scans

2010-10-18 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922384#action_12922384
 ] 

Jonathan Gray commented on HBASE-2753:
--

Should we move this to 0.92 now?  sorted() is already deprecated on trunk as of 
now.

> Remove sorted() methods from Result now that Gets are Scans
> ---
>
> Key: HBASE-2753
> URL: https://issues.apache.org/jira/browse/HBASE-2753
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Jonathan Gray
>Assignee: ryan rawson
> Fix For: 0.90.0
>
>
> With the old Get codepath, we used to sometimes get results sent to the 
> client that weren't fully sorted.  Now that Gets are Scans, results should 
> always be sorted.
> Confirm that we always get back sorted results and if so drop the 
> Result.sorted() method and update javadoc accordingly.




[jira] Commented: (HBASE-3124) TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occasionally fails with NPE

2010-10-18 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922381#action_12922381
 ] 

Hairong Kuang commented on HBASE-3124:
--

It seems to me this unit test has a race condition.  
ZombieNewLogWriterRegionServer creates a file after detecting that the t1 split 
is done.  However, there is no synchronization to guarantee that the file is 
created before the logs are archived.

The logs also show the problem:
2010-10-18 13:58:46,249 INFO  [main] util.FSUtils(646): Finished lease recover 
attempt for hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9
2010-10-18 13:58:46,255 DEBUG [main] wal.HLog(1574): Pushed=20 entries from 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9
2010-10-18 13:58:46,256 DEBUG [SplitWriter-0] wal.HLog$1(1597): Split writer 
thread for region bbb got 10 to process
2010-10-18 13:58:46,256 DEBUG [SplitWriter-0] wal.HLog$1(1622): Split writer 
thread for region bbb Applied 10 total edits to bbb in 0ms
2010-10-18 13:58:46,256 DEBUG [SplitWriter-1] wal.HLog$1(1597): Split writer 
thread for region ccc got 10 to process
2010-10-18 13:58:46,257 DEBUG [SplitWriter-1] wal.HLog$1(1622): Split writer 
thread for region ccc Applied 10 total edits to ccc in 0ms
2010-10-18 13:58:46,267 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.0 to 
/hbase/hlog.old/hlog.dat.0
2010-10-18 13:58:46,269 INFO  [ZombieNewLogWriterRegionServer] 
wal.SequenceFileLogWriter(105): Using syncFs -- HDFS-200
2010-10-18 13:58:46,271 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.1 to 
/hbase/hlog.old/hlog.dat.1
2010-10-18 13:58:46,274 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.2 to 
/hbase/hlog.old/hlog.dat.2
2010-10-18 13:58:46,276 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.3 to 
/hbase/hlog.old/hlog.dat.3
2010-10-18 13:58:46,279 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.4 to 
/hbase/hlog.old/hlog.dat.4
2010-10-18 13:58:46,281 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.5 to 
/hbase/hlog.old/hlog.dat.5
2010-10-18 13:58:46,284 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.6 to 
/hbase/hlog.old/hlog.dat.6
2010-10-18 13:58:46,287 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.7 to 
/hbase/hlog.old/hlog.dat.7
2010-10-18 13:58:46,290 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.8 to 
/hbase/hlog.old/hlog.dat.8
2010-10-18 13:58:46,292 INFO  [main] wal.HLog(1666): Archived processed log 
hdfs://localhost.localdomain:54844/hbase/hlog/hlog.dat.9 to 
/hbase/hlog.old/hlog.dat.9
Juliet file creator: created file /hbase/hlog/hlog.dat..juliet
2010-10-18 13:58:46,314 DEBUG [main] wal.HLog(1348): Closed 
/hbase/t1/1183284561/recovered.edits/001
2010-10-18 13:58:46,336 DEBUG [main] wal.HLog(1348): Closed 
/hbase/t1/1386060762/recovered.edits/001
2010-10-18 13:58:46,337 INFO  [main] wal.HLog(1203): Moving 
/hbase/hlog/hlog.dat..juliet to /hbase/hlog.old/hlog.dat..juliet
2010-10-18 13:58:46,339 DEBUG [main] wal.HLog(1207): Moved 1 log files to 
/hbase/hlog.old
2010-10-18 13:58:46,339 WARN  [main] hdfs.DFSClient(320): File /hbase/hlog is 
being deleted only through Trash org.apache.hadoop.fs.FsShell.delete because all 
deletes must go through Trash.
Deleted /hbase/hlog
2010-10-18 13:58:46,343 INFO  [main] wal.HLog(1217): hlog file splitting 
completed in 184 millis for /hbase/hlog
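The missing synchronization could look like the following sketch, where the zombie writer waits on a latch that the splitter releases only after archiving. This is a standalone illustration of the idea, not a patch to the actual test:

```java
import java.util.concurrent.CountDownLatch;

public class ZombieSync {
    // Sketch: the zombie writer must not create its new log file until
    // the splitter has finished archiving the old logs.
    static String run() {
        CountDownLatch logsArchived = new CountDownLatch(1);
        StringBuilder order = new StringBuilder();

        Thread zombie = new Thread(() -> {
            try {
                logsArchived.await();          // wait for archiving to finish
                order.append("create-juliet"); // now safe to create the file
            } catch (InterruptedException ignored) { }
        });
        zombie.start();

        order.append("archive-logs;");         // splitter archives old hlogs
        logsArchived.countDown();              // signal the zombie writer
        try {
            zombie.join();
        } catch (InterruptedException ignored) { }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints archive-logs;create-juliet
    }
}
```

The latch's happens-before guarantee makes the ordering deterministic, which is exactly what the test currently lacks.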

> TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occasionally fails with 
> NPE
> ---
>
> Key: HBASE-3124
> URL: https://issues.apache.org/jira/browse/HBASE-3124
> Project: HBase
>  Issue Type: Bug
>Reporter: Hairong Kuang
>
> I am working on an HDFS-side patch to reduce the cost of file recovery in 
> HLog#splitLog.  When I run the HBase unit tests, I sometimes see 
> TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets fail with the following 
> error.
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit.testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted(TestHLogSplit.java:462)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.ref

[jira] Created: (HBASE-3124) TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occasionally fails with NPE

2010-10-18 Thread Hairong Kuang (JIRA)
TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets occasionally fails with NPE
---

 Key: HBASE-3124
 URL: https://issues.apache.org/jira/browse/HBASE-3124
 Project: HBase
  Issue Type: Bug
Reporter: Hairong Kuang


I am working on an HDFS-side patch to reduce the cost of file recovery in 
HLog#splitLog.  When I run the HBase unit tests, I sometimes see 
TestHLogSplit#testSplitWillNotTouchLogsIfNewHLogGets fail with the following 
error.
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit.testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted(TestHLogSplit.java:462)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)






[jira] Resolved: (HBASE-2485) Persist Master in-memory state so on restart or failover, new instance can pick up where the old left off

2010-10-18 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-2485.
--

Resolution: Fixed

Newly committed master failover unit tests all passing on hudson.  Resolving!

> Persist Master in-memory state so on restart or failover, new instance can 
> pick up where the old left off
> -
>
> Key: HBASE-2485
> URL: https://issues.apache.org/jira/browse/HBASE-2485
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Jonathan Gray
> Fix For: 0.90.0
>
> Attachments: HBase-State-Transitions.docx
>
>
> Today there was some good stuff up on IRC on how transitions won't always 
> make it across Master failovers in multi-master deploy because transitions 
> are kept in in-memory structure up in the Master and so on master crash, the 
> new master will be missing state  on startup (Todd was main promulgator of 
> this observation and of the opinion that while  master rewrite is scheduled 
> for 0.21, some part needs to be done for 0.20.5).  A few suggestions were 
> made: transitions should be file-backed somehow, etc.  Let this issue be 
> about the subset we want to do for 0.20.5.
> Of the in-memory state queues, there is at least the master tasks queue -- 
> process region opens, closes, regionserver crashes, etc. -- where tasks must 
> be done in order and IIRC, tasks are fairly idempotent (at least in the 
> server crash case, it's multi-step and we'll put the crash event back on the 
> queue if we cannot do all steps in the one go).  Perhaps this queue could be 
> done using the new queue facility in zk 3.3.0 (I haven't looked to check if 
> possible, just suggesting).  Another suggestion was a file to which we'd 
> append queue items, requeueing, and marking the file with task complete, etc. 
>  On Master restart or fail-over, we'd replay the queue log.
> There is also the Map of regions-in-transition.  Yesterday we learned that 
> there is a bug where server shutdown processing does not iterate the Map of 
> regions-in-transition.  This Map may hold regions that are in "opening" or 
> "opened" state but haven't yet had the fact added to .META. by master.  
> Meantime the hosting server can crash.  Regions that were opening will stay 
> in the regions-in-transition and those in opened-but-not-yet-added-to-meta 
> will go ahead and add a crashed server to .META. (Currently 
> regions-in-transition does not record server the region opening/open is 
> happening on so it doesn't have enough info to be processed as part of server 
> shutdown).
> Regions-in-transition also needs to be persistant.  On startup, 
> regions-in-transition can get kinda hectic on a big cluster.  Ordering is not 
> so important here I believe.  A directory in zk might work (For 1M regions in 
> a big cluster, that'd be about 2M creates and 2M deletes during startup -- 
> that's too much?).  Or we could again write a WAL-like log of region 
> transitions (we'd have to develop a little vocabulary) that got reread by a 
> new master.
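The WAL-like log alternative described above could work along these lines: append one transition record per line and replay the whole file on master failover. The record format and class names here are invented for illustration, not taken from HBase:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class TransitionLog {
    // Sketch of a file-backed region-transition log: append-only records,
    // replayed by a new master to rebuild regions-in-transition.
    private final Path log;

    TransitionLog(Path log) { this.log = log; }

    // Append one "region<TAB>state" record per transition.
    void record(String region, String state) throws IOException {
        Files.writeString(log, region + "\t" + state + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // A new master replays the log to recover in-flight transitions.
    List<String> replay() throws IOException {
        return Files.exists(log) ? Files.readAllLines(log) : List.of();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("rit", ".log");
        Files.delete(tmp);
        TransitionLog wal = new TransitionLog(tmp);
        wal.record("region-1", "OPENING");
        wal.record("region-1", "OPENED");
        System.out.println(wal.replay().size()); // prints 2
    }
}
```

A real implementation would also need checkpointing/truncation so the log does not grow without bound, and durable fsync semantics, which this sketch omits.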




[jira] Resolved: (HBASE-2700) Handle master failover for regions in transition

2010-10-18 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-2700.
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed.  Master failover unit tests all passing.

> Handle master failover for regions in transition
> 
>
> Key: HBASE-2700
> URL: https://issues.apache.org/jira/browse/HBASE-2700
> Project: HBase
>  Issue Type: Sub-task
>  Components: master, zookeeper
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: HBASE-2700-test-v6.patch
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in 
> transition into ZK, but we have not fully handled the master failover case.  
> This is to deal with that and to write tests for it.




[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922372#action_12922372
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Jonathan Gray" 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 217
bq.  > 
bq.  >
bq.  > this looks clever, is it more generically useful to other parts of 
hbase?
bq.  
bq.  Andrew Purtell wrote:
bq.  Other parts of HBase use ZKW methods to do this. I brought this in 
here to do the same without pulling in all of the behavior of ZKW I didn't want.
bq.  
bq.  Jonathan Gray wrote:
bq.  Which behaviors of ZKW are you referring to?  Hopefully this component 
is generally reusable (the new ZooKeeperWatcher) and could be used even in 
limited contexts.  Using it as the primary watcher and registering with it also 
helps when writing unit tests.  You'd then use ZKUtil methods for this kind of 
stuff and inherit work done there.
bq.  
bq.  We are going to need one more level underneath ZKUtil or underlying 
ZKUtil that manages retry policies and such.  I'm going to target that for 
0.92.  And if all our code uses these APIs then it will be easier to be 
consistent.
bq.  
bq.  The patch looks fine to me though so we can work at unifying later and 
not blocking you on this.
bq.  
bq.  Jonathan Gray wrote:
bq.  It also forces you into good behavior, for example, by needing to pass 
it an Abortable on construction.
bq.  
bq.  Andrew Purtell wrote:
bq.  One issue is the constructor creates or checks znodes that the REST 
interface should not care about. (I'm thinking ahead to when ZK ACLs are in use 
a bit maybe.) Also I wanted automatic retry behavior for setData but that is 
something for which a wrapper around ZKUtil method calls would work. 
bq.  
bq.  Unifying later should not be a big deal.

Good point, totally agree.  That definitely does not belong in the ZKW 
constructor.  The first master should do it and that's it.  Filed HBASE-3123.


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.20.7, 0.92.0
>
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 
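The JSON payload described above can be assembled as in the following sketch. The field names follow the example in the description, but the class and method are hypothetical, and the real implementation presumably uses a JSON serializer rather than string concatenation:

```java
public class RestStatusPayload {
    // Build the JSON published into the ephemeral znode, using the field
    // names shown in the issue description ("connector", "statistics").
    static String toJson(String host, int port, long requests) {
        return "{\"connector\":{\"host\":\"" + host + "\",\"port\":\"" + port
             + "\"},\"statistics\":{\"requests\":\"" + requests + "\"}}";
    }

    public static void main(String[] args) {
        // Mirrors the example payload from the description.
        System.out.println(toJson("restserver.example.com", 8080, 13));
    }
}
```

Each Stargate instance would write a payload like this as the data of its ephemeral child znode, so clients can discover live endpoints and their request rates by listing `/hbase/rest/status`.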




[jira] Resolved: (HBASE-2952) HConnectionManager's shutdown hook interferes with client's operations

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-2952.
--

Resolution: Fixed

I'm going to resolve this issue as fixed by HBASE-2669.  We are not doing 
reference counting.  We are just explicitly cleaning up all connections we use 
and punting to the client the need to clean up HConnection instances they make 
(added javadoc to explain the mechanics of HConnection and how it interacts 
with Configuration). 

Let's open a new issue if we want to go the reference-counting way.

> HConnectionManager's shutdown hook interferes with client's operations
> --
>
> Key: HBASE-2952
> URL: https://issues.apache.org/jira/browse/HBASE-2952
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.0
>Reporter: Prakash Khemani
>
> My HBase client calls incrementColValue() in pairs. If someone kills the 
> client (SIGINT or SIGTERM) I want my client's increment threads to gracefully 
> exit. If a thread has already done one of the incrementColValue() then I want 
> that thread to complete the other incrementColValue() and then exit.
> For this purpose I installed my own shutdownHook().  My shutdownHook() thread 
> 'signals' all the threads in my process that it is time to exit and then 
> waits for them to complete.
> The problem is that HConnectionManager's shutdownHook thread also runs and 
> shuts down all connections and IPC threads.
> My increment thread keeps waiting to increment and then times out after 240s. 
> Two problems with this: the incrementColValue() didn't go through, which 
> will increase the chances of inconsistency in my HBase data, and it took 
> 240s to exit.  I am pasting some of the messages that the client thread 
> outputs while it tries to contact the HBase server.
> Signalled. Exiting ...
> 2010-09-01 12:11:14,769 DEBUG [HCM.shutdownHook] 
> zookeeper.ZooKeeperWrapper(787): 
> Closed 
> connection with ZooKeeper; /hbase/root-region-server
> flushing after 7899
> 2010-09-01 12:11:19,669 DEBUG [Line Processing Thread 0] 
> client.HConnectionManager$TableServers(903): Cache hit for row <> in 
> tableName .META.: location server hadoop2205.snc3.facebook.com:60020, 
> location region name .META.,,1.1028785192
> 2010-09-01 12:11:19,671 INFO  [Line Processing Thread 0] 
> zookeeper.ZooKeeperWrapper(206): Reconnecting to zookeeper
> 2010-09-01 12:11:19,671 DEBUG [Line Processing Thread 0] 
> zookeeper.ZooKeeperWrapper(212): 
> Connected 
> to zookeeper again
> 2010-09-01 12:11:24,679 DEBUG [Line Processing Thread 0] 
> client.HConnectionManager$TableServers(964): Removed .META.,,1.1028785192 for 
> tableName=.META. from cache because of content_action_url_metrics,\x080r& 
> B\xF7\x81_T\x07\x08\x16uOrcom.gigya 429934274290948,99
> 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
> client.HConnectionManager$TableServers(857): locateRegionInMeta attempt 0 of 
> 4 failed; retrying after sleep of 5000 because: The client is stopped
> 2010-09-01 12:11:24,680 DEBUG [Line Processing Thread 0] 
> zookeeper.ZooKeeperWrapper(470): 
> Trying to 
> read /hbase/root-region-server
> 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
> zookeeper.ZooKeeperWrapper(489): 
> Read 
> ZNode /hbase/root-region-server got 10.26.119.190:60020
> 2010-09-01 12:11:24,681 DEBUG [Line Processing Thread 0] 
> client.HConnectionManager$TableServers(1116): Root region location changed. 
> Sleeping.
> ===
> It might be a good idea to only run the HCM shutdown code when all the 
> HTables referring to it have been closed. That way the client can control 
> when the shutdown actually happens.
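The reference-counting idea in the last paragraph might look like this sketch: each open HTable retains the shared connection, and the real teardown runs only when the last one releases it. Class and method names are invented; this is not the HConnectionManager API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedConnection {
    // Sketch: each open HTable bumps the count; shutdown logic only
    // tears down IPC once the last table has been closed.
    private final AtomicInteger refs = new AtomicInteger();
    private volatile boolean closed = false;

    void retain() {                 // called when an HTable is opened
        refs.incrementAndGet();
    }

    void release() {                // called when an HTable is closed
        if (refs.decrementAndGet() == 0) {
            closed = true;          // safe to close sockets / ZK session
        }
    }

    boolean isClosed() { return closed; }

    public static void main(String[] args) {
        RefCountedConnection conn = new RefCountedConnection();
        conn.retain(); conn.retain();
        conn.release();
        System.out.println(conn.isClosed()); // prints false: one table open
        conn.release();
        System.out.println(conn.isClosed()); // prints true: last table closed
    }
}
```

With this scheme, a client shutdown hook that still holds an open HTable would keep the connection alive until its own flush finished, which addresses the 240s timeout Prakash reports.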




[jira] Created: (HBASE-3123) ZKW constructor should not always create nodes and should be more amenable to general usage

2010-10-18 Thread Jonathan Gray (JIRA)
ZKW constructor should not always create nodes and should be more amenable to 
general usage
---

 Key: HBASE-3123
 URL: https://issues.apache.org/jira/browse/HBASE-3123
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0


The constructor in ZooKeeperWatcher does too much.  It does things like create 
the layout in ZK.  It's probably not harmful but it's unnecessary.  The 
constructor should only ensure that the base node is created.  Other stuff 
should at least be moved into an initializeNodes() method or the like and only 
called by the first active master on cluster startup.

Other components in HBase that want to also access ZK, things like rest, should 
be able to reuse ZKW and ZKUtil easily.
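The proposed split might look like the following sketch, with the constructor ensuring only the base node and initializeNodes() creating the rest. A plain Set stands in for ZooKeeper here, and the child node names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

public class ZKWSketch {
    // Sketch of the refactor: constructor touches only the base znode;
    // the full layout moves to initializeNodes(), called once by the
    // first active master. The Set is a stand-in for a ZK ensemble.
    private final Set<String> znodes;

    ZKWSketch(Set<String> znodes, String baseNode) {
        this.znodes = znodes;
        znodes.add(baseNode);            // constructor: base node only
    }

    void initializeNodes(String baseNode) {
        // Only the first active master runs this on cluster startup.
        znodes.add(baseNode + "/rs");
        znodes.add(baseNode + "/unassigned");
        znodes.add(baseNode + "/table");
    }

    public static void main(String[] args) {
        Set<String> zk = new HashSet<>();
        new ZKWSketch(zk, "/hbase");
        System.out.println(zk.size());   // prints 1: only /hbase created
    }
}
```

A lightweight component like the REST server could then construct the watcher without creating or checking master-owned znodes it does not care about.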




[jira] Updated: (HBASE-2669) HCM.shutdownHook causes data loss with hbase.client.write.buffer != 0

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2669:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed.  Thanks for the review Jon.  Did all you suggested on commit (did 
not add back your RIT logging -- you can do that if you need it).

> HCM.shutdownHook causes data loss with hbase.client.write.buffer != 0
> -
>
> Key: HBASE-2669
> URL: https://issues.apache.org/jira/browse/HBASE-2669
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Benoit Sigoure
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2669-v2.txt, 2669.txt
>
>
> In my application I set {{hbase.client.write.buffer}} to a reasonably small 
> value (roughly 64 edits) in order to try to batch a few {{Put}} together 
> before talking to HBase.  When my application does a graceful shutdown, I 
> call {{HTable#flushCommits}} in order to flush any pending change to HBase.  
> I want to do the same thing when I get a {{SIGTERM}} by using 
> {{Runtime#addShutdownHook}} but this is impossible since 
> {{HConnectionManager}} already registers a shutdown hook that invokes 
> {{HConnectionManager#deleteAllConnections}}.  This static method closes all 
> the connections to HBase and then all connections to ZooKeeper.  Because all 
> shutdown hooks run in parallel, my hook will attempt to flush edits while 
> connections are getting closed.
> There is no way to guarantee the order in which the hooks will execute, so I 
> propose that we remove the hook in the HCM altogether and provide some 
> user-visible API they call in their own hook after they're done flushing 
> their stuff, if they really want to do a graceful shutdown.  I expect that a 
> lot of users won't use a hook though, otherwise this issue would have cropped 
> up already.  For those users, connections won't get "gracefully" terminated, 
> but I don't think that would be a problem since the underlying TCP socket 
> will get closed by the OS anyway, so things like ZooKeeper and such should 
> realize that the connection has been terminated and assume the client is 
> gone, and do the necessary clean-up on their side.
> An alternate fix would be to leave the hook in place by default but keep a 
> reference to it and add a user-visible API to be able to un-register the 
> hook.  I find this ugly.
> Thoughts?




[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922364#action_12922364
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Andrew Purtell" 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 217
bq.  > 
bq.  >
bq.  > this looks clever, is it more generically useful to other parts of 
hbase?
bq.  
bq.  Andrew Purtell wrote:
bq.  Other parts of HBase use ZKW methods to do this. I brought this in 
here to do the same without pulling in all of the behavior of ZKW I didn't want.
bq.  
bq.  Jonathan Gray wrote:
bq.  Which behaviors of ZKW are you referring to?  Hopefully this component 
is generally reusable (the new ZooKeeperWatcher) and could be used even in 
limited contexts.  Using it as the primary watcher and registering with it also 
helps when writing unit tests.  You'd then use ZKUtil methods for this kind of 
stuff and inherit work done there.
bq.  
bq.  We are going to need one more level underneath ZKUtil or underlying 
ZKUtil that manages retry policies and such.  I'm going to target that for 
0.92.  And if all our code uses these APIs then it will be easier to be 
consistent.
bq.  
bq.  The patch looks fine to me though so we can work at unifying later and 
not blocking you on this.
bq.  
bq.  Jonathan Gray wrote:
bq.  It also forces you into good behavior, for example, by needing to pass 
it an Abortable on construction.

One issue is the constructor creates or checks znodes that the REST interface 
should not care about. (I'm thinking ahead to when ZK ACLs are in use a bit 
maybe.) Also I wanted automatic retry behavior for setData but that is 
something for which a wrapper around ZKUtil method calls would work. 

Unifying later should not be a big deal.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.20.7, 0.92.0
>
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 




[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922360#action_12922360
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Jonathan Gray" 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 217
bq.  > 
bq.  >
bq.  > this looks clever, is it more generically useful to other parts of 
hbase?
bq.  
bq.  Andrew Purtell wrote:
bq.  Other parts of HBase use ZKW methods to do this. I brought this in 
here to do the same without pulling in all of the behavior of ZKW I didn't want.
bq.  
bq.  Jonathan Gray wrote:
bq.  Which behaviors of ZKW are you referring to?  Hopefully this component 
is generally reusable (the new ZooKeeperWatcher) and could be used even in 
limited contexts.  Using it as the primary watcher and registering with it also 
helps when writing unit tests.  You'd then use ZKUtil methods for this kind of 
stuff and inherit work done there.
bq.  
bq.  We are going to need one more level underneath ZKUtil or underlying 
ZKUtil that manages retry policies and such.  I'm going to target that for 
0.92.  And if all our code uses these APIs then it will be easier to be 
consistent.
bq.  
bq.  The patch looks fine to me though so we can work at unifying later and 
not blocking you on this.

It also forces you into good behavior, for example, by needing to pass it an 
Abortable on construction.


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.20.7, 0.92.0
>
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 




[jira] Assigned: (HBASE-3122) NPE in master.AssignmentManager if all region servers shut down

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-3122:


Assignee: stack

> NPE in master.AssignmentManager if all region servers shut down
> ---
>
> Key: HBASE-3122
> URL: https://issues.apache.org/jira/browse/HBASE-3122
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Andrew Purtell
>Assignee: stack
>Priority: Minor
> Fix For: 0.90.0
>
>
> 10/10/18 16:26:44 INFO catalog.CatalogTracker: acer,60020,1287443908850 
> carrying .META.; unsetting .META. location
> 10/10/18 16:26:44 INFO catalog.CatalogTracker: Current cached META location 
> is not valid, resetting
> 10/10/18 16:26:44 INFO handler.ServerShutdownHandler: Splitting logs for 
> acer,60020,1287443908850
> 10/10/18 16:26:44 INFO zookeeper.ZKUtil: hconnection-0x12bc1a2f0a60001 Set 
> watcher on existing znode /hbase/root-region-server
> 10/10/18 16:26:44 INFO catalog.RootLocationEditor: Unsetting ROOT region 
> location in ZooKeeper
> 10/10/18 16:26:44 DEBUG zookeeper.ZKAssign: master:6-0x12bc1a2f0a6 
> Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 10/10/18 16:26:44 WARN master.LoadBalancer: Wanted to do random assignment 
> but no servers to assign to
> 10/10/18 16:26:44 ERROR executor.EventHandler: Caught throwable while 
> processing event M_SERVER_SHUTDOWN
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.LoadBalancer$RegionPlan.toString(LoadBalancer.java:595)
>   at java.lang.String.valueOf(String.java:2826)
>   at java.lang.StringBuilder.append(StringBuilder.java:115)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:803)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:777)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:720)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:640)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:922)
>   at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:97)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922358#action_12922358
 ] 

stack commented on HBASE-3115:
--

I'm +1 too

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).
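
The general fix for this pattern (the attached hbase-3115.txt may differ in detail) is to coalesce the length prefix and the payload into one contiguous buffer so the socket sees a single write() syscall, and hence a single TCP segment. A minimal sketch, with an illustrative class name:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FramedWriter {
    // writeInt() followed by write() on a socket stream can issue two
    // write() syscalls -- and with TCP_NODELAY, two packets. Building the
    // frame in memory first means one syscall, one packet.
    public static byte[] frame(byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(4 + data.length);
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(data.length);      // 4-byte big-endian length prefix
            out.write(data, 0, data.length);
            out.flush();
            return buf.toByteArray();       // write this to the socket in one call
        } catch (IOException e) {
            throw new RuntimeException(e);  // cannot happen with an in-memory stream
        }
    }
}
```

Note that flush() on a raw socket stream cannot merge two writes that already happened; only buffering before the write can.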

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3122) NPE in master.AssignmentManager if all region servers shut down

2010-10-18 Thread Andrew Purtell (JIRA)
NPE in master.AssignmentManager if all region servers shut down
---

 Key: HBASE-3122
 URL: https://issues.apache.org/jira/browse/HBASE-3122
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Andrew Purtell
Priority: Minor
 Fix For: 0.90.0


10/10/18 16:26:44 INFO catalog.CatalogTracker: acer,60020,1287443908850 
carrying .META.; unsetting .META. location
10/10/18 16:26:44 INFO catalog.CatalogTracker: Current cached META location is 
not valid, resetting
10/10/18 16:26:44 INFO handler.ServerShutdownHandler: Splitting logs for 
acer,60020,1287443908850
10/10/18 16:26:44 INFO zookeeper.ZKUtil: hconnection-0x12bc1a2f0a60001 Set 
watcher on existing znode /hbase/root-region-server
10/10/18 16:26:44 INFO catalog.RootLocationEditor: Unsetting ROOT region 
location in ZooKeeper
10/10/18 16:26:44 DEBUG zookeeper.ZKAssign: master:6-0x12bc1a2f0a6 
Creating (or updating) unassigned node for 70236052 with OFFLINE state
10/10/18 16:26:44 WARN master.LoadBalancer: Wanted to do random assignment but 
no servers to assign to
10/10/18 16:26:44 ERROR executor.EventHandler: Caught throwable while 
processing event M_SERVER_SHUTDOWN
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.LoadBalancer$RegionPlan.toString(LoadBalancer.java:595)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:803)
at 
org.apache.hadoop.hbase.master.AssignmentManager.getRegionPlan(AssignmentManager.java:777)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:720)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:640)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:922)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:97)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
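
The trace shows the NPE originating in RegionPlan.toString() while getRegionPlan() builds a log message: with no region servers available there is no destination, and toString() dereferences it. One defensive fix is a null-tolerant toString(); this is only a sketch of that idea with invented field names, not the actual HBase class:

```java
// Sketch: a plan's toString() should survive "no server could be chosen",
// since that is exactly the moment it gets logged. Field names illustrative.
public class RegionPlan {
    private final String regionName;
    private final String destination;  // null when no server is available

    public RegionPlan(String regionName, String destination) {
        this.regionName = regionName;
        this.destination = destination;
    }

    @Override
    public String toString() {
        // Guard the nullable field instead of letting String.valueOf()
        // reach into it and throw from inside a log statement.
        return "region=" + regionName + ", dest="
             + (destination == null ? "none" : destination);
    }
}
```

This keeps "all region servers down" an expected condition (WARN from the balancer) rather than an ERROR from the event handler.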


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922352#action_12922352
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Jonathan Gray" 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 217
bq.  > 
bq.  >
bq.  > this looks clever, is it more generically useful to other parts of 
hbase?
bq.  
bq.  Andrew Purtell wrote:
bq.  Other parts of HBase use ZKW methods to do this. I brought this in 
here to do the same without pulling in all of the behavior of ZKW I didn't want.

Which behaviors of ZKW are you referring to?  Hopefully this component is 
generally reusable (the new ZooKeeperWatcher) and could be used even in limited 
contexts.  Using it as the primary watcher and registering with it also helps 
when writing unit tests.  You'd then use ZKUtil methods for this kind of stuff 
and inherit work done there.

We are going to need one more level underneath ZKUtil or underlying ZKUtil that 
manages retry policies and such.  I'm going to target that for 0.92.  And if 
all our code uses these APIs then it will be easier to be consistent.

The patch looks fine to me though so we can work at unifying later and not 
blocking you on this.


- Jonathan


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.20.7, 0.92.0
>
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2985) HRegionServer.multi() no longer calls HRegion.put(List) when possible

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922351#action_12922351
 ] 

HBase Review Board commented on HBASE-2985:
---

Message from: "Ryan Rawson" 


bq.  On 2010-10-18 16:15:13, stack wrote:
bq.  > 
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java, 
line 2357
bq.  > 
bq.  >
bq.  > Remove this message on commit
bq.  >

I think it's a legit comment; people might need that printf debugging in prod 
:-)


- Ryan


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1038/#review1566
---





> HRegionServer.multi() no longer calls HRegion.put(List) when possible
> -
>
> Key: HBASE-2985
> URL: https://issues.apache.org/jira/browse/HBASE-2985
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.89.20100621
>Reporter: ryan rawson
>Assignee: ryan rawson
> Fix For: 0.90.0
>
>
> This results in reduced performance of puts in batched mode
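
The performance cost is about per-call overhead: HRegion.put(List) pays fixed region-level costs (locking, WAL sync) once per batch, while falling back to one put() per edit pays them once per edit. A toy model with made-up cost numbers, purely to illustrate the shape of the regression:

```java
import java.util.List;

// Toy cost model, not real HBase timings: PER_CALL_OVERHEAD stands in for
// the fixed work (lock acquisition, WAL sync) each HRegion.put() call does.
public class BatchCost {
    static final int PER_CALL_OVERHEAD = 10;

    // What HRegionServer.multi() effectively does when it stops batching:
    // the fixed overhead is paid once per edit.
    static int oneByOne(List<Integer> editCosts) {
        int cost = 0;
        for (int e : editCosts) cost += PER_CALL_OVERHEAD + e;
        return cost;
    }

    // HRegion.put(List): the fixed overhead is paid once for the whole batch.
    static int batched(List<Integer> editCosts) {
        int cost = PER_CALL_OVERHEAD;
        for (int e : editCosts) cost += e;
        return cost;
    }
}
```

For N edits the gap is roughly (N - 1) times the fixed overhead, which is why the regression shows up specifically in batched workloads.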

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2514) RegionServer should refuse to be assigned a region that use LZO when LZO isn't available

2010-10-18 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922346#action_12922346
 ] 

ryan rawson commented on HBASE-2514:


I'm thinking we might want to take a different approach, since checking the 
schema isn't comprehensive enough, as JD pointed out above.  Instead, if we had 
a list of "required codecs" we could abort the regionserver during startup.  
This would allow the admin to take action to correct the deployment.

I'm not sure there is a good reason to have only some of your region servers 
able to open LZO regions while others fail, with the master bouncing regions 
around until they stick.

> RegionServer should refuse to be assigned a region that use LZO when LZO 
> isn't available
> 
>
> Key: HBASE-2514
> URL: https://issues.apache.org/jira/browse/HBASE-2514
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: ryan rawson
>Priority: Critical
> Fix For: 0.90.0
>
>
> If a RegionServer is assigned a region that uses LZO but the required 
> libraries aren't installed on that RegionServer, the server will fail 
> unexpectedly after throwing a {{java.lang.ClassNotFoundException: 
> com.hadoop.compression.lzo.LzoCodec}}
> {code}
> 2010-05-04 16:57:27,258 FATAL 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Replay of hlog 
> required. Forcing server shutdown
> org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1273011287339
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:994)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:887)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:255)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:142)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> com.hadoop.compression.lzo.LzoCodec
> at 
> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:91)
> at 
> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:196)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:388)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:374)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:345)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:517)
> at 
> org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:482)
> at 
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:558)
> at 
> org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:522)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:979)
> ... 3 more
> Caused by: java.lang.ClassNotFoundException: 
> com.hadoop.compression.lzo.LzoCodec
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:315)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:250)
> at 
> org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:87)
> ... 12 more
> {code}
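
The "required codecs" startup check proposed in the comment above could look roughly like this. This is a sketch of the idea only, with a hypothetical class and method name; the real check would read the codec list from configuration and call the server's abort path:

```java
// Sketch: fail fast at region server startup if a required codec class
// (e.g. com.hadoop.compression.lzo.LzoCodec) is not on the classpath,
// instead of dying later with a DroppedSnapshotException during a flush.
public class CodecCheck {
    // Returns the first missing codec class name, or null if all load.
    public static String findMissing(String... codecClassNames) {
        for (String name : codecClassNames) {
            try {
                Class.forName(name);
            } catch (ClassNotFoundException e) {
                return name;  // caller aborts startup with a clear message
            }
        }
        return null;
    }
}
```

Aborting at startup turns a late, hard-to-diagnose flush failure into an immediate, actionable deployment error, and avoids the master bouncing LZO regions between servers that can and cannot open them.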

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3119:
--

Fix Version/s: 0.92.0
   0.20.7

For > 0.90.

> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.20.7, 0.92.0
>
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3121) [rest] Do not perform cache control when returning results

2010-10-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-3121.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to trunk and 0.20 branch. Thanks for the review Ryan.

> [rest] Do not perform cache control when returning results
> --
>
> Key: HBASE-3121
> URL: https://issues.apache.org/jira/browse/HBASE-3121
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.20.7, 0.90.0
>
> Attachments: HBASE-3121.patch
>
>
> The REST interface currently provides MaxAge hints to HTTP cache servers when 
> returning results, but does not do so in a way that makes much sense. For 
> some other responses such as scanner results or schema, the REST interface 
> provides a NoCache hint. That seems appropriate. Otherwise, especially given 
> the rich configuration languages of caching servers such as Varnish, it is 
> probably not appropriate to manage cache policy in the REST interface. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922340#action_12922340
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Ryan Rawson" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1565
---

Ship it!


- Ryan





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922339#action_12922339
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Andrew Purtell" 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/HConstants.java, line 139
bq.  > 
bq.  >
bq.  > do we want to migrate class-specific static config to those classes?

There are statics for other daemon ports nearby. I was following this 
convention.


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 217
bq.  > 
bq.  >
bq.  > this looks clever, is it more generically useful to other parts of 
hbase?

Other parts of HBase use ZKW methods to do this. I brought this in here to do 
the same without pulling in all of the behavior of ZKW I didn't want. 


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 226
bq.  > 
bq.  >
bq.  > redundant assignment?

No.


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 362
bq.  > 
bq.  >
bq.  > is this better called startZooKeeperClient?

Ok, can do on commit.


bq.  On 2010-10-18 15:34:45, Ryan Rawson wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java, line 381
bq.  > 
bq.  >
bq.  > does this mean that stats are only updated every minute by default?

Yes.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3121) [rest] Do not perform cache control when returning results

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922336#action_12922336
 ] 

HBase Review Board commented on HBASE-3121:
---

Message from: "Ryan Rawson" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1041/#review1562
---

Ship it!


- Ryan





> [rest] Do not perform cache control when returning results
> --
>
> Key: HBASE-3121
> URL: https://issues.apache.org/jira/browse/HBASE-3121
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.20.7, 0.90.0
>
> Attachments: HBASE-3121.patch
>
>
> The REST interface currently provides MaxAge hints to HTTP cache servers when 
> returning results, but does not do so in a way that makes much sense. For 
> some other responses such as scanner results or schema, the REST interface 
> provides a NoCache hint. That seems appropriate. Otherwise, especially given 
> the rich configuration languages of caching servers such as Varnish, it is 
> probably not appropriate to manage cache policy in the REST interface. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3121) [rest] Do not perform cache control when returning results

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922335#action_12922335
 ] 

HBase Review Board commented on HBASE-3121:
---

Message from: "Andrew Purtell" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1041/
---

Review request for hbase.


Summary
---

The REST interface currently provides MaxAge hints to HTTP cache servers when 
returning results, but does not do so in a way that makes much sense. For some 
other responses such as scanner results or schema, the REST interface provides 
a NoCache hint. That seems appropriate. Otherwise, especially given the rich 
configuration languages of caching servers such as Varnish, it is probably not 
appropriate to manage cache policy in the REST interface. 


This addresses bug HBASE-3121.
http://issues.apache.org/jira/browse/HBASE-3121


Diffs
-

  src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java ed92857 
  src/main/java/org/apache/hadoop/hbase/rest/RowResource.java a4cdd1c 
  src/main/java/org/apache/hadoop/hbase/rest/SchemaResource.java 0c134aa 

Diff: http://review.cloudera.org/r/1041/diff


Testing
---


Thanks,

Andrew




> [rest] Do not perform cache control when returning results
> --
>
> Key: HBASE-3121
> URL: https://issues.apache.org/jira/browse/HBASE-3121
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.20.7, 0.90.0
>
> Attachments: HBASE-3121.patch
>
>
> The REST interface currently provides MaxAge hints to HTTP cache servers when 
> returning results, but does not do so in a way that makes much sense. For 
> some other responses such as scanner results or schema, the REST interface 
> provides a NoCache hint. That seems appropriate. Otherwise, especially given 
> the rich configuration languages of caching servers such as Varnish, it is 
> probably not appropriate to manage cache policy in the REST interface. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3121) [rest] Do not perform cache control when returning results

2010-10-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3121:
--

Attachment: HBASE-3121.patch

> [rest] Do not perform cache control when returning results
> --
>
> Key: HBASE-3121
> URL: https://issues.apache.org/jira/browse/HBASE-3121
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.20.7, 0.90.0
>
> Attachments: HBASE-3121.patch
>
>
> The REST interface currently provides MaxAge hints to HTTP cache servers when 
> returning results, but does not do so in a way that makes much sense. For 
> some other responses such as scanner results or schema, the REST interface 
> provides a NoCache hint. That seems appropriate. Otherwise, especially given 
> the rich configuration languages of caching servers such as Varnish, it is 
> probably not appropriate to manage cache policy in the REST interface. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3121) [rest] Do not perform cache control when returning results

2010-10-18 Thread Andrew Purtell (JIRA)
[rest] Do not perform cache control when returning results
--

 Key: HBASE-3121
 URL: https://issues.apache.org/jira/browse/HBASE-3121
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.20.7, 0.90.0
 Attachments: HBASE-3121.patch

The REST interface currently provides MaxAge hints to HTTP cache servers when 
returning results, but does not do so in a way that makes much sense. For some 
other responses such as scanner results or schema, the REST interface provides 
a NoCache hint. That seems appropriate. Otherwise, especially given the rich 
configuration languages of caching servers such as Varnish, it is probably not 
appropriate to manage cache policy in the REST interface. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922333#action_12922333
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Ryan Rawson" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/#review1560
---



src/main/java/org/apache/hadoop/hbase/HConstants.java


do we want to migrate class-specific static config to those classes?



src/main/java/org/apache/hadoop/hbase/rest/Main.java


where did DEFAULT_LISTEN_PORT go to? i am not seeing it removed?



src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java


this looks clever, is it more generically useful to other parts of hbase?



src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java


redundant assignment?



src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java


ditto on this one. i wonder if there might be an elegant way to retry or 
redo zk operations, but there might not be.



src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java


is this better called startZooKeeperClient?



src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java


does this mean that stats are only updated every minute by default?


- Ryan





> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-2669) HCM.shutdownHook causes data loss with hbase.client.write.buffer != 0

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2669:
-

Attachment: 2669-v2.txt

Version 2.  In this version, we add more explicit closeups of HConnection and 
then add a bunch of javadoc explaining how HConnection works and how its 
cleanup is done.

{code}
A set of changes that allow doing away with shutdown hook in client.

M src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
  Removed unused import and changed message from info to debug
  -- when info it shows in shell whenever we run a command.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
  Changed message from info to debug -- when info it shows in shell
  whenever we run a command.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Changed order; online region before we tell everyone where the
  region is (I changed this order recently but reviewing comments
  and issues I can't figure why I did it -- I think there was a
  reason but can't recall so just put this back until we trip
  over the issue again.  My change made it so that we had
  strange issue where we'd get a NSRE though the region was coming
  up here on this server... rather than do retries of NSREs,
  put it into online servers before updating zk... again).
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Removed noisy message
M src/main/java/org/apache/hadoop/hbase/master/LogCleanerDelegate.java
  Have this interface implement Stoppable... Some implementations
  need their stop called.
M src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
  Startup won't work w/o this change.
M src/main/java/org/apache/hadoop/hbase/master/LogCleaner.java
  Wrap the chore run so we can call the stop on all log cleaners.
M src/main/java/org/apache/hadoop/hbase/master/TimeToLiveLogCleaner.java
M src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
  Implement Stoppable
M src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
  Cleanup all connections on way out.
M src/main/java/org/apache/hadoop/hbase/util/HMerge.java
  Cleanup proxies was false.
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
M src/main/java/org/apache/hadoop/hbase/client/HConnection.java
  Javadoc explaining how HConnections work.
M src/main/java/org/apache/hadoop/hbase/client/HTablePool.java
  Make it so we create a new Configuration and then do our own
  cleanup when the pool is shut down.
M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Use the alternate method, now that the one that takes a Connection is removed.
M src/main/java/org/apache/hadoop/hbase/client/HTable.java
  More javadoc on how HConnection works.
{code}

> HCM.shutdownHook causes data loss with hbase.client.write.buffer != 0
> -
>
> Key: HBASE-2669
> URL: https://issues.apache.org/jira/browse/HBASE-2669
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Reporter: Benoit Sigoure
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2669-v2.txt, 2669.txt
>
>
> In my application I set {{hbase.client.write.buffer}} to a reasonably small 
> value (roughly 64 edits) in order to try to batch a few {{Put}}s together 
> before talking to HBase.  When my application does a graceful shutdown, I 
> call {{HTable#flushCommits}} in order to flush any pending change to HBase.  
> I want to do the same thing when I get a {{SIGTERM}} by using 
> {{Runtime#addShutdownHook}} but this is impossible since 
> {{HConnectionManager}} already registers a shutdown hook that invokes 
> {{HConnectionManager#deleteAllConnections}}.  This static method closes all 
> the connections to HBase and then all connections to ZooKeeper.  Because all 
> shutdown hooks run in parallel, my hook will attempt to flush edits while 
> connections are getting closed.
> There is no way to guarantee the order in which the hooks will execute, so I 
> propose that we remove the hook in the HCM altogether and provide some 
> user-visible API they call in their own hook after they're done flushing 
> their stuff, if they really want to do a graceful shutdown.  I expect that a 
> lot of users won't use a hook though, otherwise this issue would have cropped 
> up already.  For those users, connections won't get "gracefully" terminated, 
> but I don't think that would be a problem since the underlying TCP socket 
> will get closed by the OS anyway, so things like ZooKeeper and such should 
> realize that the connection has been terminated and assume the client is 
> gone, and do the necessary clean-up on their side.
> An alternate fix would be to leave the hook in place by default but keep a 
> reference to it and a
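The ordering problem Benoit describes maps onto the pattern he proposes: a single application-owned shutdown hook that flushes pending edits first and only then tears down connections. Below is a minimal, self-contained sketch of that pattern; `flushCommits` and `deleteAllConnections` here are stand-ins for the real `HTable`/`HConnectionManager` calls, not the actual HBase API.

```java
public class Main {
    // Stand-ins for HTable#flushCommits and
    // HConnectionManager#deleteAllConnections (illustrative only).
    static void flushCommits()         { System.out.println("flushed edits"); }
    static void deleteAllConnections() { System.out.println("closed connections"); }

    public static void main(String[] args) {
        // With no library-installed hook, the application registers one hook
        // that sequences the two steps itself, so they can no longer race.
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                flushCommits();          // 1. push buffered Puts out first
                deleteAllConnections();  // 2. then close HBase/ZK connections
            }
        }));
        System.out.println("working");
    }
}
```

Because both steps live in one hook, their relative order is guaranteed, which `Runtime#addShutdownHook` cannot promise for two independent hooks.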

[jira] Created: (HBASE-3120) [rest] content transcoding

2010-10-18 Thread Andrew Purtell (JIRA)
[rest] content transcoding
--

 Key: HBASE-3120
 URL: https://issues.apache.org/jira/browse/HBASE-3120
 Project: HBase
  Issue Type: Improvement
  Components: rest
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor


We have a reasonable user request for support for decoding of base64 encoded 
values into raw/binary when servicing GET requests with an Accept header of 
{{application/octet-stream}}. 

We can introduce a table schema attribute for column families that instructs 
the REST gateway to perform input and/or output transcoding, with 
base64->binary (for GET) and vice versa (for PUT or POST) as the first 
supported option. 
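A minimal sketch of the proposed transcoding pair, using the modern JDK's `java.util.Base64` for illustration (the gateway of that era would likely use commons-codec; the method names here are illustrative, not the REST gateway's actual code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Main {
    // Output transcoding for a GET: the stored cell value is base64 text and
    // the client sent Accept: application/octet-stream, so decode to raw bytes.
    static byte[] toOctetStream(String storedBase64) {
        return Base64.getDecoder().decode(storedBase64);
    }

    // Input transcoding for a PUT/POST: raw bytes arrive, store as base64 text.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        byte[] raw = toOctetStream("aGVsbG8=");
        System.out.println(new String(raw, StandardCharsets.UTF_8));
        System.out.println(toBase64(raw));
    }
}
```

The schema attribute on the column family would simply select whether these conversions run on the request/response path.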


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922287#action_12922287
 ] 

HBase Review Board commented on HBASE-3119:
---

Message from: "Andrew Purtell" 

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1039/
---

Review request for hbase.


Summary
---

This change allows the REST interface to publish its endpoint and metrics, 
currently only requests/sec, into ZooKeeper. By default a permanent znode tree 
is created as needed at /hbase/rest/status and Stargate instances create 
ephemeral children of this with names in the format :. The 
ephemeral znodes contain JSON serialized information about the instance, e.g.

  
{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"}}

The function of Stargate itself is not affected, except for one significant 
change: now if the ZooKeeper service is lost, the Stargate instances will abort 
along with the rest of HBase.
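The status payload itself is small enough to assemble by hand. A sketch of building the JSON document shown above (field names follow the example payload; the ZooKeeper side -- creating the ephemeral child znode -- is omitted here):

```java
public class Main {
    // Assemble the JSON status document published into the ephemeral znode.
    // Plain String.format keeps the sketch dependency-free; the real code
    // may use a JSON library. Registration with CreateMode.EPHEMERAL is
    // not shown.
    static String statusJson(String host, int port, long requests) {
        return String.format(
            "{\"connector\":{\"host\":\"%s\",\"port\":\"%d\"}," +
            "\"statistics\":{\"requests\":\"%d\"}}",
            host, port, requests);
    }

    public static void main(String[] args) {
        System.out.println(statusJson("restserver.example.com", 8080, 13));
    }
}
```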


This addresses bug HBASE-3119.
http://issues.apache.org/jira/browse/HBASE-3119


Diffs
-

  src/main/java/org/apache/hadoop/hbase/HConstants.java 71c3e7b 
  src/main/java/org/apache/hadoop/hbase/rest/Main.java 368b4b4 
  src/main/java/org/apache/hadoop/hbase/rest/RESTServlet.java ed92857 
  src/main/java/org/apache/hadoop/hbase/rest/metrics/RESTMetrics.java 284bbc5 

Diff: http://review.cloudera.org/r/1039/diff


Testing
---


Thanks,

Andrew




> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"}}}}
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 




[jira] Updated: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3119:
--

Priority: Minor  (was: Major)

> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"}}}}
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 




[jira] Updated: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3119:
--

Attachment: HBASE-3119.patch

> [rest] publish endpoint and statistics into ZooKeeper
> -
>
> Key: HBASE-3119
> URL: https://issues.apache.org/jira/browse/HBASE-3119
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: HBASE-3119.patch
>
>
> This change allows the REST interface to publish its endpoint and metrics, 
> currently only requests/sec, into ZooKeeper. By default a permanent znode 
> tree is created as needed at {{/hbase/rest/status}} and Stargate instances 
> create ephemeral children of this with names in the format {{:}}. 
> The ephemeral znodes contain JSON serialized information about the instance, 
> e.g.
> 
> {{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"}}}}
> The function of Stargate itself is not affected, except for one significant 
> change: now if the ZooKeeper service is lost, the Stargate instances will 
> abort along with the rest of HBase. 




[jira] Created: (HBASE-3119) [rest] publish endpoint and statistics into ZooKeeper

2010-10-18 Thread Andrew Purtell (JIRA)
[rest] publish endpoint and statistics into ZooKeeper
-

 Key: HBASE-3119
 URL: https://issues.apache.org/jira/browse/HBASE-3119
 Project: HBase
  Issue Type: Improvement
  Components: rest
Reporter: Andrew Purtell
Assignee: Andrew Purtell


This change allows the REST interface to publish its endpoint and metrics, 
currently only requests/sec, into ZooKeeper. By default a permanent znode tree 
is created as needed at {{/hbase/rest/status}} and Stargate instances create 
ephemeral children of this with names in the format {{:}}. The 
ephemeral znodes contain JSON serialized information about the instance, e.g.


{{{"connector":{"host":"restserver.example.com","port":"8080"},"statistics":{"requests":"13"}}}}

The function of Stargate itself is not affected, except for one significant 
change: now if the ZooKeeper service is lost, the Stargate instances will abort 
along with the rest of HBase. 




[jira] Commented: (HBASE-2751) Consider closing StoreFiles sometimes

2010-10-18 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922258#action_12922258
 ] 

ryan rawson commented on HBASE-2751:


We need to move to a DFSClient that doesn't keep sockets open when a file is 
opened.  We should only have one socket open against the current block's 
datanode, opening and closing them for pread() calls as they happen.

jbooth's hdfs patch should probably be mentioned and revived at some point.

> Consider closing StoreFiles sometimes
> -
>
> Key: HBASE-2751
> URL: https://issues.apache.org/jira/browse/HBASE-2751
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Priority: Minor
>
> Having a lot of regions per region server could be considered harmless if 
> most of them aren't used, but that's not really true at the moment. We keep 
> all files opened all the time (except for rolled HLogs). I'm thinking of 2 
> solutions
>  # Lazy open the store files, or at least close them down after we read the 
> file info. Or we could do this for every file except the most recent one.
>  # Close files when they're not in use. We need some heuristic to determine 
> when is the best moment to declare that a file can be closed. 
> Both solutions go hand in hand, and I think it would be a huge gain in order 
> to lower the ulimit and xceivers-related issues.




[jira] Commented: (HBASE-2751) Consider closing StoreFiles sometimes

2010-10-18 Thread Daniel Einspanjer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922251#action_12922251
 ] 

Daniel Einspanjer commented on HBASE-2751:
--

I don't believe this should be a minor issue.  It seems to be at the heart of 
Mozilla's recent cluster instability: over time, each server in our 19-node 
cluster opens more and more connections to all the others, and eventually, 
when we reach around 10k connections, we have to restart the cluster due to 
client lag.

If there were a configuration for the number of open connections that triggers 
cleanup, and a threshold for how many to close on each cleanup, we could use an 
LRU to close the oldest connections. This would impose an access penalty on 
those unopened regions (which we currently pay the first time they are 
accessed anyway) but would prevent connection overload.

> Consider closing StoreFiles sometimes
> -
>
> Key: HBASE-2751
> URL: https://issues.apache.org/jira/browse/HBASE-2751
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jean-Daniel Cryans
>Priority: Minor
>
> Having a lot of regions per region server could be considered harmless if 
> most of them aren't used, but that's not really true at the moment. We keep 
> all files opened all the time (except for rolled HLogs). I'm thinking of 2 
> solutions
>  # Lazy open the store files, or at least close them down after we read the 
> file info. Or we could do this for every file except the most recent one.
>  # Close files when they're not in use. We need some heuristic to determine 
> when is the best moment to declare that a file can be closed. 
> Both solutions go hand in hand, and I think it would be a huge gain in order 
> to lower the ulimit and xceivers-related issues.




[jira] Commented: (HBASE-3118) HBaseAdmin.createTable and modifyTable should have a family check

2010-10-18 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922234#action_12922234
 ] 

Jonathan Gray commented on HBASE-3118:
--

@Kannan, Yeah HBASE-2984 is currently a blocker against 0.90 to do just that

> HBaseAdmin.createTable and modifyTable should have a family check
> -
>
> Key: HBASE-3118
> URL: https://issues.apache.org/jira/browse/HBASE-3118
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.20.6
> Environment: 0.20.6, Linux, Java 1.6+
>Reporter: Vaibhav Puranik
>
> Currently HBaseAdmin.modifyTable does not check whether any family is 
> supplied. I can execute this method without supplying any families, and 
> that puts the table in an inconsistent state.
> In my opinion, modifyTable should let you modify a table without 
> overwriting it, but it looks like it doesn't do that. This can be very 
> dangerous, as you can end up overwriting a table in production and lose 
> data. If we are going to keep this behavior, then at least some checks 
> should be added to make it safer.




[jira] Updated: (HBASE-1015) pure C and C++ client libraries

2010-10-18 Thread Benoit Sigoure (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-1015:
--

Status: Open  (was: Patch Available)

> pure C and C++ client libraries
> ---
>
> Key: HBASE-1015
> URL: https://issues.apache.org/jira/browse/HBASE-1015
> Project: HBase
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 0.20.6
>Reporter: Andrew Purtell
>Priority: Minor
> Fix For: 0.92.0
>
>
> If via HBASE-794 first class support for talking via Thrift directly to 
> HMaster and HRS is available, then pure C and C++ client libraries are 
> possible. 
> The C client library would wrap a Thrift core. 
> The C++ client library can provide a class hierarchy quite close to 
> o.a.h.h.client and, ideally, identical semantics. It should be just a 
> wrapper around the C API, for economy.
> Internally at my employer there is a lot of resistance to HBase because many 
> dev teams have a strong C/C++ bias. The real issue, however, is client-side 
> integration, not a fundamental objection. (What runs server side and how it 
> is managed is a secondary consideration.)




[jira] Commented: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293
 ] 

Benoit Sigoure commented on HBASE-3115:
---

+1 Todd, thanks a bunch.

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}} but this approach is ineffective on sockets in Java 
> (as far as I empirically noticed over the past few months).
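The committed fix amounts to assembling the length prefix and the payload into one buffer so the socket sees a single write. A generic, self-contained sketch of that framing (not the actual HBaseClient code; the class and method names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class Main {
    // Build the 4-byte length prefix and the payload into one byte array,
    // then the caller hands the combined bytes to the socket with a single
    // write() -- one system call, and one TCP segment for small RPCs.
    static byte[] frame(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(4 + data.length);
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(data.length);  // length prefix
        out.write(data);            // RPC payload
        out.flush();
        return buf.toByteArray();   // caller: socketOut.write(frame) once
    }

    public static void main(String[] args) throws IOException {
        byte[] rpc = "get:row1".getBytes(StandardCharsets.UTF_8);  // 8 bytes
        System.out.println(frame(rpc).length);  // 4-byte prefix + payload
    }
}
```

Framing in memory, rather than relying on flush() to coalesce two writes, is what removes the extra packet.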




[jira] Updated: (HBASE-2700) Handle master failover for regions in transition

2010-10-18 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2700:
-

Attachment: HBASE-2700-test-v6.patch

Patch +1'd by Stack on rb.  Thanks for review.

This adds three unit tests of master failover.  The first is a basic test that 
backup masters will take over.  The second is a test of master failover 
happening concurrently with regions in transition.  The third is a test of 
concurrent RIT as well as concurrent RS failure.  Tables/regions are used from 
both enabled and disabled tables, as this is one of the trickier parts: 
ensuring disabled tables stay offline properly regardless of what state they 
were in during master failover.

Tests are passing locally consistently.  Committing this patch, let's see what 
hudson says.

> Handle master failover for regions in transition
> 
>
> Key: HBASE-2700
> URL: https://issues.apache.org/jira/browse/HBASE-2700
> Project: HBase
>  Issue Type: Sub-task
>  Components: master, zookeeper
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: HBASE-2700-test-v6.patch
>
>
> To this point in HBASE-2692 tasks we have moved everything for regions in 
> transition into ZK, but we have not fully handled the master failover case.  
> This is to deal with that and to write tests for it.




[jira] Commented: (HBASE-3118) HBaseAdmin.createTable and modifyTable should have a family check

2010-10-18 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922214#action_12922214
 ] 

Kannan Muthukkaruppan commented on HBASE-3118:
--

+1.

On a related note, if you use alter table to change the settings for a CF, you 
must provide all the non-default settings once again even if you don't want to 
change them.

For example, if you had VERSION => 5, COMPRESSION => 'NONE' on a column family 
and you did an alter providing just COMPRESSION => 'LZO', then the VERSION 
setting reverts to the default (3).

It would be nice if the "alter" read the old state, and applied only the delta 
change.
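The read-modify-write behavior suggested here can be sketched as a plain map overlay: start from the family's current settings and apply only the keys the user supplied. This is illustrative only, using generic string maps rather than HBase's descriptor types:

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Delta semantics for "alter": read the old settings, overlay only the
    // supplied keys, leave everything else untouched. Generic maps stand in
    // for HColumnDescriptor here.
    static Map<String, String> applyDelta(Map<String, String> current,
                                          Map<String, String> delta) {
        Map<String, String> merged = new HashMap<String, String>(current);
        merged.putAll(delta);  // only the supplied keys change
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> cf = new HashMap<String, String>();
        cf.put("VERSION", "5");
        cf.put("COMPRESSION", "NONE");
        Map<String, String> delta = new HashMap<String, String>();
        delta.put("COMPRESSION", "LZO");
        Map<String, String> merged = applyDelta(cf, delta);
        System.out.println(merged.get("VERSION") + " " + merged.get("COMPRESSION"));
    }
}
```

Under these semantics the VERSION => 5 setting from the example survives an alter that only mentions COMPRESSION.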



> HBaseAdmin.createTable and modifyTable should have a family check
> -
>
> Key: HBASE-3118
> URL: https://issues.apache.org/jira/browse/HBASE-3118
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.20.6
> Environment: 0.20.6, Linux, Java 1.6+
>Reporter: Vaibhav Puranik
>
> Currently HBaseAdmin.modifyTable does not check whether any family is 
> supplied. I can execute this method without supplying any families, and 
> that puts the table in an inconsistent state.
> In my opinion, modifyTable should let you modify a table without 
> overwriting it, but it looks like it doesn't do that. This can be very 
> dangerous, as you can end up overwriting a table in production and lose 
> data. If we are going to keep this behavior, then at least some checks 
> should be added to make it safer.




[jira] Commented: (HBASE-3096) TestCompaction times out in latest release

2010-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922208#action_12922208
 ] 

stack commented on HBASE-3096:
--

+1

> TestCompaction times out in latest release
> --
>
> Key: HBASE-3096
> URL: https://issues.apache.org/jira/browse/HBASE-3096
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.89.20100924
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hbase-3096.txt
>
>
> TestCompaction is timing out in 0.89.20100924. It's using HRegion directly 
> and writing too much data, so the writes start blocking forever.




[jira] Resolved: (HBASE-3001) Ship dependency jars to the cluster for all jobs

2010-10-18 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3001.
--

   Resolution: Fixed
Fix Version/s: 0.90.0
 Hadoop Flags: [Reviewed]

Resolving.  Patch was committed while back.

> Ship dependency jars to the cluster for all jobs
> 
>
> Key: HBASE-3001
> URL: https://issues.apache.org/jira/browse/HBASE-3001
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.90.0
>
> Attachments: 3001-v2.txt, hbase-3001.txt
>
>
> It would be handy if we automatically shipped dependency jars to the cluster 
> with jobs by default. This makes it easier to run HBase without changing 
> hadoop-env.sh on the cluster. We already have some utilities here from a 
> previous JIRA, but it didn't get fully integrated.




[jira] Resolved: (HBASE-3101) bin assembly doesn't include -tests or -source jars

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-3101.


   Resolution: Fixed
Fix Version/s: 0.90.0
 Hadoop Flags: [Reviewed]

> bin assembly doesn't include -tests or -source jars
> ---
>
> Key: HBASE-3101
> URL: https://issues.apache.org/jira/browse/HBASE-3101
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.90.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.90.0
>
> Attachments: pom-fix.txt
>
>
> Currently the bin assembly tries to include the hbase-VERSION-tests.jar but 
> there's a typo "test" instead of "tests" in the assembly descriptor, so it 
> doesn't do so. Also, I think we should probably ship the -sources jar even in 
> the -bin assembly - it's useful for people to point their IDE at it to get 
> API javadocs, etc.




[jira] Commented: (HBASE-3076) Allow to disable automatic shipping of dependency jars for mapreduce jobs

2010-10-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922186#action_12922186
 ] 

Todd Lipcon commented on HBASE-3076:


Hey Bruno. Looks like this fell out of date against trunk, sorry about that. 
Would you mind uploading a new patch?

> Allow to disable automatic shipping of dependency jars for mapreduce jobs
> -
>
> Key: HBASE-3076
> URL: https://issues.apache.org/jira/browse/HBASE-3076
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Bruno Dumon
>Assignee: Bruno Dumon
>Priority: Minor
> Attachments: tablemapreduceutil-optional-adddependencies-patch.txt
>
>
> Since HBASE-3001, TableMapReduceUtil.initTableMap/ReduceJob will 
> automatically upload the HBase jars needed to execute a map reduce job.
> In my case I am building a job jar using Maven's assembly plugin, so all 
> the necessary dependencies are already in the job jar. In such a case the 
> default behavior of HBase causes some needless upload work. It also uploads 
> hadoop-core itself, which is not necessary.
> Therefore I propose to add a variant of the initTableMap/ReduceJob methods 
> with an extra boolean argument to disable the automatic adding of dependency 
> jars.
> I will attach a patch with the proposed change.
> Note that everything works as is, this is just an optimization.




[jira] Assigned: (HBASE-3076) Allow to disable automatic shipping of dependency jars for mapreduce jobs

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-3076:
--

Assignee: Bruno Dumon

> Allow to disable automatic shipping of dependency jars for mapreduce jobs
> -
>
> Key: HBASE-3076
> URL: https://issues.apache.org/jira/browse/HBASE-3076
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Bruno Dumon
>Assignee: Bruno Dumon
>Priority: Minor
> Attachments: tablemapreduceutil-optional-adddependencies-patch.txt
>
>
> Since HBASE-3001, TableMapReduceUtil.initTableMap/ReduceJob will 
> automatically upload the HBase jars needed to execute a map reduce job.
> In my case I am building a job jar using Maven's assembly plugin, so all 
> the necessary dependencies are already in the job jar. In such a case the 
> default behavior of HBase causes some needless upload work. It also uploads 
> hadoop-core itself, which is not necessary.
> Therefore I propose to add a variant of the initTableMap/ReduceJob methods 
> with an extra boolean argument to disable the automatic adding of dependency 
> jars.
> I will attach a patch with the proposed change.
> Note that everything works as is, this is just an optimization.




[jira] Updated: (HBASE-3118) HBaseAdmin.createTable and modifyTable should have a family check

2010-10-18 Thread Vaibhav Puranik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Puranik updated HBASE-3118:
---

Affects Version/s: 0.20.6

> HBaseAdmin.createTable and modifyTable should have a family check
> -
>
> Key: HBASE-3118
> URL: https://issues.apache.org/jira/browse/HBASE-3118
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.20.6
> Environment: 0.20.6, Linux, Java 1.6+
>Reporter: Vaibhav Puranik
>
> Currently HBaseAdmin.modifyTable does not check whether any family is 
> supplied. I can execute this method without supplying any families, and 
> that puts the table in an inconsistent state.
> In my opinion, modifyTable should let you modify a table without 
> overwriting it, but it looks like it doesn't do that. This can be very 
> dangerous, as you can end up overwriting a table in production and lose 
> data. If we are going to keep this behavior, then at least some checks 
> should be added to make it safer.




[jira] Created: (HBASE-3118) HBaseAdmin.createTable and modifyTable should have a family check

2010-10-18 Thread Vaibhav Puranik (JIRA)
HBaseAdmin.createTable and modifyTable should have a family check
-

 Key: HBASE-3118
 URL: https://issues.apache.org/jira/browse/HBASE-3118
 Project: HBase
  Issue Type: Improvement
 Environment: 0.20.6, Linux, Java 1.6+
Reporter: Vaibhav Puranik


Currently HBaseAdmin.modifyTable does not check whether any family is supplied. 
I can execute this method without supplying any families, and that puts the 
table in an inconsistent state.

In my opinion, modifyTable should let you modify a table without overwriting 
it, but it looks like it doesn't do that. This can be very dangerous, as you 
can end up overwriting a table in production and lose data. If we are going to 
keep this behavior, then at least some checks should be added to make it safer.
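The requested check is a simple precondition on the supplied families. A minimal, generic sketch (using a plain list of family names rather than the actual HTableDescriptor API; `checkFamilies` is an illustrative name, not HBaseAdmin code):

```java
import java.util.Collections;
import java.util.List;

public class Main {
    // Guard of the kind the issue asks for: reject an empty family list
    // before createTable/modifyTable can put the table in a bad state.
    static void checkFamilies(List<String> families) {
        if (families == null || families.isEmpty()) {
            throw new IllegalArgumentException(
                "Table descriptor must contain at least one column family");
        }
    }

    public static void main(String[] args) {
        try {
            checkFamilies(Collections.<String>emptyList());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```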




[jira] Updated: (HBASE-2881) TestAdmin intermittent failures: Race condition during createTable can result in region double assignment

2010-10-18 Thread Kannan Muthukkaruppan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-2881:
-

Attachment: HBASE-2881_0.89.txt

Note: This doesn't apply to the new master, as it doesn't rely on the base scanner. 

Confirmed in a reproducible manner that the race condition genuinely existed by 
putting a sleep of 2 minutes prior to setting the region unassigned in the 
master's in-memory state (step #5). By that time, the base scanner has assigned 
out the new region; setting the region unassigned in the master's state (step 
#5) after the sleep then causes it to be "unassigned" once again, and hence 
assigned out a second time to a new region server.

The fix no longer sets the newly created region unassigned in the master's 
in-memory state. Instead, after updating META (with the region marked offline) 
for all the newly created regions, it simply schedules the meta scanner to run 
immediately. So the regions get assigned without delay, as before, but without 
the race condition.
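The underlying bug is a classic check-then-act race: two actors both observe "unassigned" and both assign. A generic, self-contained illustration of that race class and of a single-winner guard (this is NOT the master's code -- the actual fix, as described above, simply stops re-marking the region unassigned and lets the meta scanner do the assignment):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    public static void main(String[] args) {
        // Two "assigners" (here, the meta scanner and step #5) both try to
        // claim the same region. A plain check-then-set lets both win; an
        // atomic compareAndSet guarantees exactly one assignment.
        final AtomicBoolean assigned = new AtomicBoolean(false);
        int winners = 0;
        for (int assigner = 0; assigner < 2; assigner++) {
            if (assigned.compareAndSet(false, true)) {
                winners++;  // only the first caller assigns the region
            }
        }
        System.out.println(winners);
    }
}
```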

> TestAdmin intermittent failures: Race condition during createTable can result 
> in region double assignment
> -
>
> Key: HBASE-2881
> URL: https://issues.apache.org/jira/browse/HBASE-2881
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Jonathan Gray
> Attachments: HBASE-2881_0.89.txt
>
>
> The TestAdmin test fails on trunk intermittently because it is unable to 
> "enable" a "disabled" table. However, the root cause seems to be that much 
> earlier, at "createTable" time the table's region got assigned to 2 region 
> servers. And this later confuses the "disable"/"enable" code.
> createTable goes down to RegionManager.java:createRegion:
> {code}
> public void createRegion(HRegionInfo newRegion, HRegionInterface server,
>   byte [] metaRegionName)
>   throws IOException {
> // 2. Create the HRegion
> HRegion region = HRegion.createHRegion(newRegion, 
> this.master.getRootDir(),
>   master.getConfiguration());
> // 3. Insert into meta
> HRegionInfo info = region.getRegionInfo();
> byte [] regionName = region.getRegionName();
> Put put = new Put(regionName);
> put.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
> Writables.getBytes(info));
> server.put(metaRegionName, put);
> // 4. Close the new region to flush it to disk.  Close its log file too.
> region.close();
> region.getLog().closeAndDelete();
> // 5. Get it assigned to a server
> setUnassigned(info, true);
>   }
> {code}
> In between, after #3 but before #5, if the MetaScanner runs, it'll find this 
> region in an unassigned state and assign it out.
> And then #5 comes along and again "force"-sets this region to be unassigned, 
> causing it to get assigned again to a different region server (as part of the 
> RegionManager's job of assigning out regions waiting to be assigned along 
> with region server heartbeats).
> ---
> The test in question that fails is TestAdmin:testHundredsOfTable(). I tried 
> repro'ing this more reliably by modifying the test to have the metascanner 
> run more frequently:
> {code}
> TEST_UTIL.getConfiguration().setInt("hbase.master.meta.thread.rescanfrequency",
>     1000); // 1 second
> {code}
> (instead of the default 60 seconds), but it didn't help improve the 
> reproducibility.
> ---

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1744) Thrift server to match the new java api.

2010-10-18 Thread Bryan Duxbury (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922173#action_12922173
 ] 

Bryan Duxbury commented on HBASE-1744:
--

Stack asked me to take a look at your Thrift IDL. Here are the comments I came 
up with:
- It's a little confusing to typedef binary to Bytes, but that's all cosmetic.
- All your files (types, admin, client) use the same java namespace. This means 
they'll all generate a Constants.java with the same path, which means 2/3 of 
them will be clobbered. You only have constants in one of the files, which is 
good, but I think there's a good chance that class will be destroyed.

Otherwise, looks good to me.

> Thrift server to match the new java api.
> 
>
> Key: HBASE-1744
> URL: https://issues.apache.org/jira/browse/HBASE-1744
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Tim Sell
>Assignee: Lars Francke
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-1744.preview.1.patch, thriftexperiment.patch
>
>
> This mutateRows, etc. API is a little confusing compared to the new, cleaner 
> Java client.
> Thinking of ways to make a Thrift client that is just as elegant. Something 
> like:
> void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
> with:
> struct TColumn {
>   1:Bytes family,
>   2:Bytes qualifier,
>   3:i64 timestamp
> }
> struct TPut {
>   1:Bytes row,
>   2:map<TColumn, Bytes> values
> }
> This creates a more verbose RPC than if the columns in TPut were just 
> map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
> still be intuitive from, say, Python.
> Presumably the goal of a thrift gateway is to be easy first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-3115:
---

Attachment: hbase-3115.txt

Maybe something like this?
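The attached patch itself is not reproduced in this digest. As a rough, non-authoritative sketch of the general idea — coalescing the 4-byte length prefix and the RPC payload into one buffer so the socket sees a single write instead of two — it might look something like this (class and method names are illustrative):

```java
import java.nio.ByteBuffer;

// Sketch only: build one contiguous frame [4-byte big-endian length][payload]
// so the transport issues a single write() system call per RPC, rather than
// one write for the length prefix and another for the data.
public class SingleWriteFraming {

    static byte[] frame(byte[] data, int dataLength) {
        ByteBuffer buf = ByteBuffer.allocate(4 + dataLength);
        buf.putInt(dataLength);        // length prefix (big-endian by default)
        buf.put(data, 0, dataLength);  // payload
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        byte[] framed = frame(payload, payload.length);
        System.out.println(framed.length); // 4-byte prefix + 3-byte payload = 7
    }
}
```

With the prefix and payload in one buffer, a single socket write carries both, so the kernel is free to put them in one TCP segment.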

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
> Attachments: hbase-3115.txt
>
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}}, but this approach is ineffective on sockets in Java 
> (as far as I have empirically observed over the past few months).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-3115) HBaseClient wastes 1 TCP packet per RPC

2010-10-18 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HBASE-3115:
--

Assignee: Todd Lipcon

> HBaseClient wastes 1 TCP packet per RPC
> ---
>
> Key: HBASE-3115
> URL: https://issues.apache.org/jira/browse/HBASE-3115
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.20.5, 0.20.6, 
> 0.89.20100621
>Reporter: Benoit Sigoure
>Assignee: Todd Lipcon
>Priority: Minor
>
> In {{ipc/HBaseClient.java}}, the method {{sendParam}} does:
> {code}
> out.writeInt(dataLength);  //first put the data length
> out.write(data, 0, dataLength);//write the data
> {code}
> While analyzing some tcpdump traces tonight, I saw that this consistently 
> translates to 1 TCP packet with a 4 byte payload followed by another TCP 
> packet with the RPC itself.  This makes inefficient use of network resources 
> and adversely affects TCP throughput.  I believe each of those lines 
> translates to a {{write}} system call on the socket's file descriptor 
> (unnecessary system calls are also bad for performance).  The code attempts 
> to call {{out.flush();}}, but this approach is ineffective on sockets in Java 
> (as far as I have empirically observed over the past few months).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3103) investigate/improve compaction performance

2010-10-18 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922109#action_12922109
 ] 

Kannan Muthukkaruppan commented on HBASE-3103:
--

Todd: The test case is not very stand-alone at the moment. I haven't had a 
chance to work on this last week (and may not be able to this week either), but 
when I get a chance, I'll work on making the test standalone.

> investigate/improve compaction performance
> --
>
> Key: HBASE-3103
> URL: https://issues.apache.org/jira/browse/HBASE-3103
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
> Attachments: profiler_data.jpg
>
>
> I was running some tests and am seeing that major compacting about 100M of 
> data seems to take around 40-50 seconds. 
> My simplified test case is something like:
> * Created about a 100M store file (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 
> bytes) 
> * Compression and ROWCOL bloom was turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction scanTime (ns) 4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction append time (ns) 23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() figures themselves seem 
> pretty low (under 1 second). These are the times for the parts that interact 
> with the DFS (readBlock() and finishBlock(), mostly).
> Are these numbers roughly in line with what others are seeing normally? 
> Will double-check my instrumentation and try to get more data. Might try to 
> run it under a profiler. But wanted to put it out there for additional 
> input/ideas on improvement.
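For scale, converting the logged nanosecond figures above into seconds is a quick sanity check on the numbers:

```java
// Quick arithmetic on the nanosecond figures from the compaction log above:
// scan, bloom, and append account for roughly 4.3s + 14.4s + 23.2s = ~42s,
// which lines up with the reported 40-50 second major compaction time and
// shows bloom computation and StoreFile appends dominating over scanning.
public class CompactionTimes {
    static double seconds(long nanos) {
        return nanos / 1e9;
    }

    public static void main(String[] args) {
        long scanNs   = 4_338_103_000L;   // "Compaction scanTime (ns)"
        long bloomNs  = 14_433_821_000L;  // "Compaction bloom only time (ns)"
        long appendNs = 23_191_478_000L;  // "Compaction append time (ns)"
        double total = seconds(scanNs) + seconds(bloomNs) + seconds(appendNs);
        System.out.printf("scan=%.1fs bloom=%.1fs append=%.1fs total=%.1fs%n",
                seconds(scanNs), seconds(bloomNs), seconds(appendNs), total);
    }
}
```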

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.