date:20110330

completebulkload does not use HBase configuration
-

 Key: HBASE-3714
 URL: https://issues.apache.org/jira/browse/HBASE-3714
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.1, 0.90.0, 0.90.2, 0.90.3
Reporter: Nichole Treadway
 Attachments: HBASE-3714.txt

The completebulkload tool should be using the HBaseConfiguration.create() 
method to get the HBase configuration in 0.90.*. In its present state, you 
receive a connection error when running this tool.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3714) completebulkload does not use HBase configuration


 [ 
https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nichole Treadway updated HBASE-3714:


Attachment: HBASE-3714.txt


[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration


[ 
https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013000#comment-13013000
 ] 

Ted Yu commented on HBASE-3714:
---

I was looking at LoadIncrementalHFiles yesterday.
Can you make similar changes to LoadIncrementalHFiles?

I think these two classes should accept an optional parameter for zookeeper 
quorum for more flexibility.


[jira] [Updated] (HBASE-3714) completebulkload does not use HBase configuration


 [ 
https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nichole Treadway updated HBASE-3714:


Priority: Minor  (was: Major)


[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration


[ 
https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013007#comment-13013007
 ] 

Nichole Treadway commented on HBASE-3714:
-

Ted, LoadIncrementalHFiles and which other class did you mean?


[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration


[ 
https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013523#comment-13013523
 ] 

Ted Yu commented on HBASE-3714:
---

Pardon me for the incomplete message from the breakfast table :-)

I was looking at references to this method in HTable:
{code}
  public HTable(final String tableName)
{code}
There was only one, by LoadIncrementalHFiles.

With the patch in this JIRA, we're able to make the above method package 
private.
I will send email to dev@ for further explanation.


[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce


 [ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3715:
-

Status: Patch Available  (was: Open)

 Book.xml - adding architecture section on client, adding section on spec-ex 
 under mapreduce
 ---

 Key: HBASE-3715
 URL: https://issues.apache.org/jira/browse/HBASE-3715
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book.xml.patch


 Small changes to book.xml
 * added small section under MapReduce saying that it's generally advisable to 
 turn off speculative execution when using HBase as a source
 * Adding 'client' section under architecture that is a simplified port of the 
 client section in the HBaseArchitecture wiki page. 


[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce


 [ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3715:
-

Attachment: book.xml.patch


[jira] [Created] (HBASE-3716) Intermittent TestRegionRebalancing failure

Intermittent TestRegionRebalancing failure
--

 Key: HBASE-3716
 URL: https://issues.apache.org/jira/browse/HBASE-3716
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Ted Yu
Assignee: Ted Yu


See HBase-TRUNK build #1820

This could be due to HBASE-3681


[jira] [Work started] (HBASE-3716) Intermittent TestRegionRebalancing failure


 [ 
https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-3716 started by Ted Yu.


[jira] [Commented] (HBASE-3716) Intermittent TestRegionRebalancing failure

2011-03-30 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013569#comment-13013569
 ] 

Jean-Daniel Cryans commented on HBASE-3716:
---

Yeah the test should check against the same slop that's configured.

 Intermittent TestRegionRebalancing failure
 --

 Key: HBASE-3716
 URL: https://issues.apache.org/jira/browse/HBASE-3716
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Ted Yu
Assignee: Ted Yu

 See HBase-TRUNK build #1820
 This could be due to HBASE-3681
 In trunk, the default value of hbase.regions.slop is 20%, so the load 
 balancer can consider a region distribution balanced when it falls within 
 20% of the optimal distribution.
 However, assertRegionsAreBalanced() uses a 10% slop, so the test can fail 
 on a distribution that the balancer accepts.
 One solution is to align the slop in assertRegionsAreBalanced() with the 
 hbase.regions.slop value.
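
The mismatch between the two slop values can be illustrated with a small,
self-contained sketch. This is not the HBase balancer or the test code; the
class name SlopCheck, the method isBalanced and the region counts are all
hypothetical, and the floor/ceiling rule is an assumed simplification of a
slop-based check.

```java
import java.util.Arrays;

// Hypothetical stand-in for a slop-based balance check; not HBase code.
final class SlopCheck {
    /** True when every server's region count lies within average * (1 +/- slop). */
    static boolean isBalanced(int[] regionsPerServer, double slop) {
        double average = Arrays.stream(regionsPerServer).average().orElse(0.0);
        int floor = (int) Math.floor(average * (1 - slop));
        int ceiling = (int) Math.ceil(average * (1 + slop));
        for (int count : regionsPerServer) {
            if (count < floor || count > ceiling) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed distribution: four servers, 20 regions each on average.
        int[] counts = {23, 17, 20, 20};
        System.out.println("balanced at 20% slop: " + isBalanced(counts, 0.2));
        System.out.println("balanced at 10% slop: " + isBalanced(counts, 0.1));
    }
}
```

With these assumed counts the distribution passes the 20% check but fails the
10% one, which is the kind of disagreement described above.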


[jira] [Commented] (HBASE-3716) Intermittent TestRegionRebalancing failure


[ 
https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013582#comment-13013582
 ] 

Ted Yu commented on HBASE-3716:
---

TestLoadBalancer passes too.

 Attachments: 3716.txt



[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce


 [ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3715:
-

Status: Patch Available  (was: Open)

 Attachments: book.xml.patch, book.xml.patch



[jira] [Commented] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce

2011-03-30 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013621#comment-13013621
 ] 

Jean-Daniel Cryans commented on HBASE-3715:
---

Click on the right side of Attachments, there's a drop down where you can get 
to Manage Attachments.


[jira] [Commented] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce


[ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013623#comment-13013623
 ] 

Doug Meil commented on HBASE-3715:
--

Thanks!


[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce


 [ 
https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3715:
-

Attachment: (was: book.xml.patch)


[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver

[
https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack updated HBASE-3071:
-

Attachment: 3701-v2.txt

Addressed J-D issues.

Changed the names of the scripts and how they run. Now there is a graceful_stop.sh
script that manages running the region_mover.rb script and the subsequent remote
shutdown. graceful_stop.sh takes a flag to restart the node afterwards and
another, reload, flag which will put the old region set back on the
just-started node.

I tried adding the load/unload region script to hbase-daemon.sh so we
could do stuff like ./bin/hbase-daemons.sh unload regionserver, but that gets
messy in bash. I already had to add a flag to bin/hbase to optionally not run
java with an exec.

Testing on cluster seems to basically work. Going to try with a cluster under
load next.

Graceful decommissioning of a regionserver
--

Key: HBASE-3071
URL: https://issues.apache.org/jira/browse/HBASE-3071
Project: HBase
Issue Type: Improvement
Reporter: stack
Attachments: 3071.txt, 3701-v2.txt

Currently if you stop a regionserver nicely, it'll put up its stopping flag
and then close all hosted regions. While the stopping flag is in place all
region requests are rejected. If this server was under load, closing could
take a while. Only after all is closed is the master informed and it'll
restart assigning (in old master, master would get a report with list of all
regions closed, in new master the zk expired is triggered and we'll run
shutdown handler).
At least in new master, we have means of disabling balancer, and then moving
the regions off the server one by one via HBaseAdmin methods -- we should
write a script to do this at least for rolling restarts -- but we need
something better.


[jira] [Commented] (HBASE-3071) Graceful decommissioning of a regionserver

[
https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013689#comment-13013689
]

stack commented on HBASE-3071:
--

Other notes:

Currently I run it like this:

for i in `cat regionserver`; do ./bin/graceful.sh SERVERNAME; done

You should turn off the balancer before you do the above. The region_mover.rb
doesn't care: if regions show up after it started, it'll just start in
on the new ones until it's down to zero regions (though there could be a race
here if the balancer is running). The script doesn't turn the balancer on/off
because it would need a trap to turn it back on again AND the current api for
balancer is dumb; there is no way to query current state... so this is a
manual step for now.

We can move off ~2 regions per second on an unloaded cluster. Moving the
regions back on takes longer for some reason -- about 1 a second. This means a
rolling restart could take a while on a big loaded cluster. Could parallelize
this script but would need more work to make sure concurrent graceful_restarts
all read a common set of restarting servers.
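
To get a feel for what those rates mean, here is a rough back-of-the-envelope
sketch. The rates (about 2 regions per second off, about 1 per second back on)
come from the note above; the cluster size, the regions-per-server figure and
the names RollingRestartEstimate and secondsPerServer are assumptions for
illustration only, and the time to actually restart each regionserver is
ignored.

```java
// Rough estimate only; cluster dimensions below are hypothetical.
final class RollingRestartEstimate {
    /** Seconds spent moving regions off and back onto one server, ignoring restart time. */
    static double secondsPerServer(int regions, double offPerSecond, double onPerSecond) {
        return regions / offPerSecond + regions / onPerSecond;
    }

    public static void main(String[] args) {
        int servers = 10;            // assumed cluster size
        int regionsPerServer = 200;  // assumed regions hosted per server
        double perServer = secondsPerServer(regionsPerServer, 2.0, 1.0);
        double totalMinutes = servers * perServer / 60.0;
        System.out.printf("per server: %.0f s, sequential rolling restart: %.0f min%n",
                perServer, totalMinutes);
    }
}
```

Under these assumptions each server costs 300 seconds of region moves, so ten
servers handled one after another take about 50 minutes before counting
restart time, and a loaded cluster would likely be slower.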


[jira] [Created] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods

2011-03-30 Thread David Buttler (JIRA)

deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods


 Key: HBASE-3717
 URL: https://issues.apache.org/jira/browse/HBASE-3717
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.90.1
Reporter: David Buttler
Priority: Trivial


The static isTableEnabled() methods on HTable can lead to unintended 
consequences if used naively without understanding potential side-effects.  
Suggest deprecating these methods and pointing at the HBaseAdmin methods to 
accomplish the same task instead.


[jira] [Assigned] (HBASE-3107) Breakup HLogSplitTest unit tests.

2011-03-30 Thread Alex Newman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman reassigned HBASE-3107:
--

Assignee: (was: Alex Newman)

 Breakup HLogSplitTest unit tests.
 -

 Key: HBASE-3107
 URL: https://issues.apache.org/jira/browse/HBASE-3107
 Project: HBase
  Issue Type: Sub-task
Reporter: Alex Newman




[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-03-30 Thread Alex Newman (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alex Newman reassigned HBASE-1364:
--

Assignee: (was: Alex Newman)

[performance] Distributed splitting of regionserver commit logs
---

Key: HBASE-1364
URL: https://issues.apache.org/jira/browse/HBASE-1364
Project: HBase
Issue Type: Improvement
Components: coprocessors
Reporter: stack
Priority: Critical
Fix For: 0.92.0

Attachments: HBASE-1364.patch

Time Spent: 8h
Remaining Estimate: 0h

HBASE-1008 has some improvements to our log splitting on regionserver crash;
but it needs to run even faster.
(Below is from HBASE-1008)
In bigtable paper, the split is distributed. If we're going to have 1000
logs, we need to distribute or at least multithread the splitting.
1. As is, regions starting up expect to find one reconstruction log only.
Need to make it so pick up a bunch of edit logs and it should be fine that
logs are elsewhere in hdfs in an output directory written by all split
participants whether multithreaded or a mapreduce-like distributed process
(Let's write our distributed sort first as a MR so we learn what's involved;
distributed sort, as much as possible should use MR framework pieces). On
startup, regions go to this directory and pick up the files written by split
participants deleting and clearing the dir when all have been read in. Making
it so can take multiple logs for input, can also make the split process more
robust rather than current tenuous process which loses all edits if it
doesn't make it to the end without error.
2. Each column family rereads the reconstruction log to find its edits. Need
to fix that. Split can sort the edits by column family so store only reads
its edits.


[jira] [Assigned] (HBASE-3108) Add a method for creating persistent Sequential zk nodes.

2011-03-30 Thread Alex Newman (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman reassigned HBASE-3108:
--

Assignee: (was: Alex Newman)

 Add a method for creating persistent Sequential zk nodes.
 -

 Key: HBASE-3108
 URL: https://issues.apache.org/jira/browse/HBASE-3108
 Project: HBase
  Issue Type: Sub-task
Reporter: Alex Newman




[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions


[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013745#comment-13013745
 ] 

Ted Yu commented on HBASE-1512:
---

This feature is very useful.
Is it possible to pass some class to AggregateProtocolImpl which can interpret 
the type of value based on colFamily:colQualifier ?

I tried adding type parameter (for type of value) to AggregateCpProtocol but 
encountered various compilation errors.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
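
The per-region counting idea can be sketched without any HBase classes. In the
sketch below, Region is a hypothetical in-memory stand-in for a region and
countTable plays the client; none of these names come from HBase or from the
attached patches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of server-side counting; not the coprocessor API.
final class RegionCountSketch {
    /** Stand-in for one region: it holds row keys and can count them locally. */
    static final class Region {
        private final List<String> rowKeys;

        Region(List<String> rowKeys) {
            this.rowKeys = rowKeys;
        }

        /** Runs where the data lives, so no rows need to be shipped to the client. */
        long countRows() {
            return rowKeys.size();
        }
    }

    /** Client side: receives one long per region and adds them up. */
    static long countTable(List<Region> regions) {
        long total = 0;
        for (Region region : regions) {
            total += region.countRows();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> table = Arrays.asList(
                new Region(Arrays.asList("a", "b", "c")),
                new Region(Arrays.asList("m", "n")),
                new Region(Collections.<String>emptyList()));
        System.out.println("rows: " + countTable(table));
    }
}
```

The point of the design is the amount of data returned: one number per region
instead of every row key.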


[jira] [Updated] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods

2011-03-30 Thread David Buttler (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Buttler updated HBASE-3717:
-

Attachment: deprecate_HTable_isTableEnabled.patch


   Original Estimate: 1h
  Remaining Estimate: 1h


[jira] [Updated] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods

2011-03-30 Thread David Buttler (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Buttler updated HBASE-3717:
-

Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions


[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013820#comment-13013820
 ] 

Ted Yu commented on HBASE-1512:
---

I think AggregationClient should have a ctor which accepts Configuration and 
saves it.
Then Configuration can be used to point to a table in remote cluster:
{code}
HTable table = new HTable(conf, tableName);
{code}



[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

2011-03-30 Thread Liyin Tang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013825#comment-13013825
]

Liyin Tang commented on HBASE-3065:
---

Most of the retries are simple, except two: create and setData.
I got a basic idea of how to retry 'create' from
http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

But how do we handle setData?
The problem is that the first setData may succeed and then hit a
ConnectionLoss exception before the client sees the result.
The retry then gets a BadVersion exception.
How can we tell that this BadVersion was caused by our own earlier,
successful setData?
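
One possible way to tell the two cases apart, sketched here against an
in-memory stand-in rather than the real ZooKeeper client, is to tag the
payload with a writer-specific prefix and, on a version mismatch, read the
node back and compare. The Node class, the retrySetData helper and the
"writer-1:" tag are all hypothetical, and this is not necessarily the approach
this issue ends up taking.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical simulation; Node is not the ZooKeeper API.
final class SetDataRetrySketch {
    /** Minimal in-memory stand-in for a znode with optimistic versioning. */
    static final class Node {
        byte[] data = new byte[0];
        int version = 0;

        /** Returns false (think BadVersion) when expectedVersion is stale. */
        synchronized boolean setData(byte[] newData, int expectedVersion) {
            if (expectedVersion != version) {
                return false;
            }
            data = newData.clone();
            version++;
            return true;
        }
    }

    /**
     * Retries a write whose first attempt ended in a connection loss. On a
     * version mismatch, the node is read back: if it holds exactly our tagged
     * payload at expectedVersion + 1, the first attempt is treated as applied.
     */
    static boolean retrySetData(Node node, byte[] payload, int expectedVersion) {
        if (node.setData(payload, expectedVersion)) {
            return true;
        }
        synchronized (node) {
            return node.version == expectedVersion + 1
                    && Arrays.equals(node.data, payload);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "writer-1:OPENING".getBytes(StandardCharsets.UTF_8);
        Node node = new Node();
        node.setData(payload, 0); // first attempt lands, but the reply is "lost"
        System.out.println("retry treated as success: " + retrySetData(node, payload, 0));
    }
}
```

The sketch has an obvious limit: if our write succeeded and another writer
then overwrote it, the read-back cannot tell that apart from a write that
never landed.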

Retry all 'retryable' zk operations; e.g. connection loss
-

Key: HBASE-3065
URL: https://issues.apache.org/jira/browse/HBASE-3065
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.92.0

The 'new' master refactored our zk code tidying up all zk accesses and
corralling them behind nice zk utility classes. One improvement was letting
out all KeeperExceptions letting the client deal. That's good generally
because in old days, we'd suppress important state zk changes in state. But
there is at least one case the new zk utility could handle for the
application and that's the class of retryable KeeperExceptions. The one that
comes to mind is connection loss. On connection loss we should retry the
just-failed operation. Usually the retry will just work. At worst, on
reconnect, we'll pick up the expired session event.
Adding in this change shouldn't be too bad given the refactor of zk corralled
all zk access into one or two classes only.
One thing to consider though is how much we should retry. We could retry on
a timer or we could retry for ever as long as the Stoppable interface is
passed so if another thread has stopped or aborted the hosting service, we'll
notice and give up trying. Doing the latter is probably better than some
kinda timeout.
HBASE-3062 adds a timed retry on the first zk operation. This issue is about
generalizing what is over there across all zk access.


[jira] [Assigned] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

2011-03-30 Thread Liyin Tang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Liyin Tang reassigned HBASE-3065:
-

Assignee: Liyin Tang

Retry all 'retryable' zk operations; e.g. connection loss
-

Key: HBASE-3065
URL: https://issues.apache.org/jira/browse/HBASE-3065
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Liyin Tang
Fix For: 0.92.0

[jira] [Created] (HBASE-3718) Improve 'get' performance when row resides in memstore

2011-03-30 Thread dhruba borthakur (JIRA)

Improve 'get' performance when row resides in memstore
--

 Key: HBASE-3718
 URL: https://issues.apache.org/jira/browse/HBASE-3718
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur


The regionserver uses a ConcurrentSkipList to store the KVs in the memstore. 
Although the expected cost of a lookup is O(log n), the latency to look up 
a specific key in the memstore is still very large, especially when the 
memstore is large and the KV.compare() method is costly.

One optimization is to investigate using a ConcurrentHashMap (instead of 
ConcurrentSkipList). This would minimize lookup and insertion cost. We can do 
it only for column families that are marked as not supporting range scans. 
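
The trade-off can be illustrated with the two JDK maps directly. This is a sketch with plain String keys, not the memstore code: a skip-list map keeps keys sorted and so can serve a range scan, while a hash map only answers point lookups. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class MemstoreLookupSketch {
    /** Range scan over a sorted map: returns the keys in [from, to], in order. */
    static List<String> scan(ConcurrentSkipListMap<String, String> map,
                             String from, String to) {
        return new ArrayList<>(map.subMap(from, true, to, true).keySet());
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> sorted = new ConcurrentSkipListMap<>();
        ConcurrentHashMap<String, String> hashed = new ConcurrentHashMap<>();
        for (String row : new String[] {"row-d", "row-b", "row-a", "row-c"}) {
            sorted.put(row, "v-" + row);
            hashed.put(row, "v-" + row);
        }

        // Point lookups work on both; the hash map avoids key comparisons
        // along a search path and needs only hashCode() and equals().
        System.out.println(sorted.get("row-c")); // v-row-c
        System.out.println(hashed.get("row-c")); // v-row-c

        // Only the sorted map can answer a range scan without visiting every key.
        System.out.println(scan(sorted, "row-b", "row-c")); // [row-b, row-c]
    }
}
```

This is why the hash-map variant would only fit column families that never need range scans.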

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions


[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013830#comment-13013830
 ] 

Ted Yu commented on HBASE-1512:
---

A 4-byte value can represent a float. An 8-byte value can represent a double.

As for the return type, Long, I tried to make AggregateCpProtocol generic but 
wasn't successful.
E.g. AggregateCpProtocol<Long>.class wouldn't compile. Since 
AggregateCpProtocol is an interface, I cannot instantiate it and obtain the 
class afterward.
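
For reference, Java does not allow a class literal on a parameterized type, because type arguments are erased at runtime. One common workaround is an unchecked cast of the raw class literal, sketched below. AggregateProtocol here is a hypothetical generic interface written for the example, not the interface from the attached patch.

```java
class ClassLiteralSketch {
    /** Hypothetical generic protocol, for illustration only. */
    interface AggregateProtocol<T> {
        T sum();
    }

    // "AggregateProtocol<Long>.class" is rejected by the compiler. The raw
    // literal AggregateProtocol.class is legal, and a double cast gives it
    // the parameterized static type. The cast is unchecked: at runtime there
    // is only one Class object, whatever T is.
    @SuppressWarnings("unchecked")
    static <T> Class<AggregateProtocol<T>> protocolClass() {
        return (Class<AggregateProtocol<T>>) (Class<?>) AggregateProtocol.class;
    }

    public static void main(String[] args) {
        Class<AggregateProtocol<Long>> c = ClassLiteralSketch.<Long>protocolClass();
        System.out.println(c.getSimpleName()); // AggregateProtocol
    }
}
```

Whether this fits the coprocessor protocol lookup depends on how the class token is used on the server side, which this sketch does not model.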

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facilities, that is, cases where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard to make a 
 filter type that could run a function server-side and return ONLY the result 
 of the aggregation or whatever.
 For example, say you just want to count rows. Currently you scan, the server 
 returns all data to the client, and the client counts up the row keys.  
 A bunch of time and resources are wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
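
 The per-region idea can be sketched as two steps: each region reduces its rows to a single number, and the client only sums those numbers. The code below uses in-memory lists as stand-ins for regions, and countRows and aggregate are hypothetical names, not part of any HBase API.

```java
import java.util.Arrays;
import java.util.List;

class RegionCountSketch {
    /** Would run server-side: only the count leaves the region, not the rows. */
    static long countRows(List<String> regionRowKeys) {
        return regionRowKeys.size();
    }

    /** Would run client-side: sums the per-region partial counts. */
    static long aggregate(List<Long> perRegionCounts) {
        long total = 0;
        for (long count : perRegionCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> regionA = Arrays.asList("row-1", "row-2", "row-3");
        List<String> regionB = Arrays.asList("row-4", "row-5");
        long total = aggregate(Arrays.asList(countRows(regionA), countRows(regionB)));
        System.out.println(total); // 5
    }
}
```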

[jira] [Created] (HBASE-3719) Workload has to drain before hlog can be rolled

2011-03-30 Thread dhruba borthakur (JIRA)

Workload has to drain before hlog can be rolled
---

 Key: HBASE-3719
 URL: https://issues.apache.org/jira/browse/HBASE-3719
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur


In the current implementation, the regionserver blocks new transactions from 
occurring while the HLog is rolled. Closing the existing HLog sometimes takes 
more than a few seconds and during this time all new puts/increments are 
blocked. It would be nice if we could continue to write new transactions to the 
new HLog (but maybe not commit those transactions) while the old HLog is being 
closed.
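
One way to picture the proposal is to swap in the new log first and close the old one off the write path. The sketch below is an in-memory model under that assumption. LogWriter, roll and append are hypothetical names, and it deliberately ignores the commit and sync ordering that the issue says still needs thought.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

class LogRollSketch {
    /** Hypothetical in-memory stand-in for an HLog writer. */
    static class LogWriter {
        final String name;
        private final List<String> edits = new ArrayList<>();
        private boolean closed;

        LogWriter(String name) {
            this.name = name;
        }

        /** Returns false instead of appending once the writer is closed. */
        synchronized boolean tryAppend(String edit) {
            if (closed) {
                return false;
            }
            edits.add(edit);
            return true;
        }

        /** Stands in for the slow close of the real log file. */
        synchronized void close() {
            closed = true;
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(edits);
        }
    }

    private final AtomicReference<LogWriter> current =
        new AtomicReference<>(new LogWriter("hlog-0"));
    private int rolls = 0;

    /** Appends to the current writer; retries if it raced with a roll. */
    void append(String edit) {
        while (!current.get().tryAppend(edit)) {
            // The writer was closed after we read it. Since roll() swaps
            // before closing, the next read returns a newer writer.
        }
    }

    /** Swaps in a fresh writer at once and closes the old one in the background. */
    synchronized Thread roll() {
        LogWriter old = current.getAndSet(new LogWriter("hlog-" + (++rolls)));
        Thread closer = new Thread(old::close);
        closer.start();
        return closer;
    }

    LogWriter currentWriter() {
        return current.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LogRollSketch log = new LogRollSketch();
        log.append("put-1");
        Thread closer = log.roll();
        log.append("put-2"); // goes to hlog-1 without waiting for hlog-0 to close
        closer.join();
        System.out.println(log.currentWriter().name + " " + log.currentWriter().snapshot());
    }
}
```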

[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions


 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1512:
--

Attachment: AggregationClient.java

AggregationClient with ctor accepting Configuration.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregationClient.java, patch-1512-2.txt, 
 patch-1512.txt



[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver


 [ 
https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3071:
-

Assignee: stack
  Status: Patch Available  (was: Open)

Looking for a bit of a review.  Was figuring this could go into branch and 
trunk.  It doesn't change any server code, not yet anyways, just scripts.

 Graceful decommissioning of a regionserver
 --

 Key: HBASE-3071
 URL: https://issues.apache.org/jira/browse/HBASE-3071
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Attachments: 3071.txt, 3701-v2.txt, 3701-v3.txt


 Currently if you stop a regionserver nicely, it'll put up its stopping flag 
 and then close all hosted regions.  While the stopping flag is in place all 
 region requests are rejected.  If this server was under load, closing could 
 take a while.  Only after all is closed is the master informed and it'll 
 restart assigning (in old master, master would get a report with list of all 
 regions closed, in new master the zk expired is triggered and we'll run 
 shutdown handler).
 At least in new master, we have means of disabling balancer, and then moving 
 the regions off the server one by one via HBaseAdmin methods -- we should 
 write a script to do this at least for rolling restarts -- but we need 
 something better.
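
 The script described above amounts to "disable the balancer, then move regions off one at a time". A rough sketch of that loop follows. ClusterAdmin is a hypothetical facade invented for the example and does not reproduce real HBaseAdmin signatures. FakeAdmin is an in-memory stub used only to exercise the loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DecommissionSketch {
    /** Hypothetical admin facade; these are not real HBaseAdmin signatures. */
    interface ClusterAdmin {
        void setBalancerEnabled(boolean enabled);
        List<String> regionsOn(String server);
        void moveRegionOff(String region, String server);
    }

    /** Disables the balancer, then moves every region off the server, one by one. */
    static int drain(ClusterAdmin admin, String server) {
        admin.setBalancerEnabled(false); // keep the balancer from moving regions back
        int moved = 0;
        // Iterate over a copy, since moving a region changes the hosted list.
        for (String region : new ArrayList<>(admin.regionsOn(server))) {
            admin.moveRegionOff(region, server);
            moved++;
        }
        return moved; // re-enabling the balancer is left to the caller
    }

    /** In-memory fake used only to exercise the loop. */
    static class FakeAdmin implements ClusterAdmin {
        boolean balancerEnabled = true;
        final Map<String, List<String>> hosted = new HashMap<>();

        public void setBalancerEnabled(boolean enabled) {
            balancerEnabled = enabled;
        }

        public List<String> regionsOn(String server) {
            return hosted.computeIfAbsent(server, s -> new ArrayList<>());
        }

        public void moveRegionOff(String region, String server) {
            regionsOn(server).remove(region);
            regionsOn("other-server").add(region);
        }
    }

    public static void main(String[] args) {
        FakeAdmin admin = new FakeAdmin();
        admin.regionsOn("rs1").add("region-a");
        admin.regionsOn("rs1").add("region-b");
        int moved = drain(admin, "rs1");
        System.out.println(moved + " moved, remaining on rs1: " + admin.regionsOn("rs1"));
    }
}
```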

[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver