[jira] [Created] (HBASE-3714) completebulkload does not use HBase configuration
completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.1, 0.90.0, 0.90.2, 0.90.3 Reporter: Nichole Treadway Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3714) completebulkload does not use HBase configuration
[ https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nichole Treadway updated HBASE-3714: Attachment: HBASE-3714.txt completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3 Reporter: Nichole Treadway Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool.
[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration
[ https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013000#comment-13013000 ] Ted Yu commented on HBASE-3714: --- I was looking at LoadIncrementalHFiles yesterday. Can you make similar changes to LoadIncrementalHFiles? I think these two classes should accept an optional parameter for the zookeeper quorum for more flexibility. completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3 Reporter: Nichole Treadway Priority: Minor Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool.
[jira] [Updated] (HBASE-3714) completebulkload does not use HBase configuration
[ https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nichole Treadway updated HBASE-3714: Priority: Minor (was: Major) completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3 Reporter: Nichole Treadway Priority: Minor Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool.
[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration
[ https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013007#comment-13013007 ] Nichole Treadway commented on HBASE-3714: - Ted, LoadIncrementalHFiles and which other class did you mean? completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3 Reporter: Nichole Treadway Priority: Minor Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool.
[jira] [Commented] (HBASE-3714) completebulkload does not use HBase configuration
[ https://issues.apache.org/jira/browse/HBASE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013523#comment-13013523 ] Ted Yu commented on HBASE-3714: --- Pardon me for the incomplete message from the breakfast table :-) I was looking at references to this constructor in HTable: {code} public HTable(final String tableName) {code} There was only one, by LoadIncrementalHFiles. With the patch in this JIRA, we're able to make the above constructor package private. I will send email to dev@ for further explanation. completebulkload does not use HBase configuration - Key: HBASE-3714 URL: https://issues.apache.org/jira/browse/HBASE-3714 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.0, 0.90.1, 0.90.2, 0.90.3 Reporter: Nichole Treadway Priority: Minor Attachments: HBASE-3714.txt The completebulkload tool should be using the HBaseConfiguration.create() method to get the HBase configuration in 0.90.*. In its present state, you receive a connection error when running this tool.
[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3715: - Status: Patch Available (was: Open) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3715: - Attachment: book.xml.patch Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Created] (HBASE-3716) Intermittent TestRegionRebalancing failure
Intermittent TestRegionRebalancing failure -- Key: HBASE-3716 URL: https://issues.apache.org/jira/browse/HBASE-3716 Project: HBase Issue Type: Bug Components: master Reporter: Ted Yu Assignee: Ted Yu See HBase-TRUNK build #1820. This could be due to HBASE-3681.
[jira] [Work started] (HBASE-3716) Intermittent TestRegionRebalancing failure
[ https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-3716 started by Ted Yu. Intermittent TestRegionRebalancing failure -- Key: HBASE-3716 URL: https://issues.apache.org/jira/browse/HBASE-3716 Project: HBase Issue Type: Bug Components: master Reporter: Ted Yu Assignee: Ted Yu See HBase-TRUNK build #1820. This could be due to HBASE-3681.
[jira] [Commented] (HBASE-3716) Intermittent TestRegionRebalancing failure
[ https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013569#comment-13013569 ] Jean-Daniel Cryans commented on HBASE-3716: --- Yeah, the test should check against the same slop that's configured. Intermittent TestRegionRebalancing failure -- Key: HBASE-3716 URL: https://issues.apache.org/jira/browse/HBASE-3716 Project: HBase Issue Type: Bug Components: master Reporter: Ted Yu Assignee: Ted Yu See HBase-TRUNK build #1820. This could be due to HBASE-3681. In trunk, the default value of hbase.regions.slop is 20%, so the load balancer can leave a region distribution that falls within 20% of the optimal distribution. However, assertRegionsAreBalanced() uses a 10% slop. One solution is to align the slop in assertRegionsAreBalanced() with the hbase.regions.slop value.
[jira] [Commented] (HBASE-3716) Intermittent TestRegionRebalancing failure
[ https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013582#comment-13013582 ] Ted Yu commented on HBASE-3716: --- TestLoadBalancer passes too. Intermittent TestRegionRebalancing failure -- Key: HBASE-3716 URL: https://issues.apache.org/jira/browse/HBASE-3716 Project: HBase Issue Type: Bug Components: master Reporter: Ted Yu Assignee: Ted Yu Attachments: 3716.txt See HBase-TRUNK build #1820. This could be due to HBASE-3681. In trunk, the default value of hbase.regions.slop is 20%, so the load balancer can leave a region distribution that falls within 20% of the optimal distribution. However, assertRegionsAreBalanced() uses a 10% slop. One solution is to align the slop in assertRegionsAreBalanced() with the hbase.regions.slop value.
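The slop mismatch described in HBASE-3716 can be illustrated with a minimal sketch. This is not the code of assertRegionsAreBalanced() or of the load balancer; the class name, the isBalanced() helper, and the way the bounds are rounded are assumptions made for illustration only. It shows how one and the same region distribution can pass a 20% slop check and fail a 10% one.

```java
public class SlopCheckSketch {
    /**
     * Hypothetical balance check (not HBase code): true if every server's
     * region count lies within average * (1 +/- slopPercent/100).
     */
    static boolean isBalanced(int[] regionsPerServer, int slopPercent) {
        long total = 0;
        for (int count : regionsPerServer) {
            total += count;
        }
        double average = (double) total / regionsPerServer.length;
        // Assumed rounding: floor for the lower bound, ceil for the upper bound.
        int lowerBound = (int) Math.floor(average * (100 - slopPercent) / 100.0);
        int upperBound = (int) Math.ceil(average * (100 + slopPercent) / 100.0);
        for (int count : regionsPerServer) {
            if (count < lowerBound || count > upperBound) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 40 regions on 4 servers: the average is 10 regions per server.
        int[] load = {12, 10, 10, 8};
        System.out.println("20% slop: " + isBalanced(load, 20)); // bounds 8..12
        System.out.println("10% slop: " + isBalanced(load, 10)); // bounds 9..11
    }
}
```

Reading the slop from one shared setting in both the balancer and the test, as suggested in the issue, would remove this kind of disagreement.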
[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3715: - Status: Patch Available (was: Open) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch, book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Commented] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013621#comment-13013621 ] Jean-Daniel Cryans commented on HBASE-3715: --- Click on the right side of Attachments; there's a drop-down where you can get to Manage Attachments. Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch, book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Commented] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013623#comment-13013623 ] Doug Meil commented on HBASE-3715: -- Thanks! Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Updated] (HBASE-3715) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce
[ https://issues.apache.org/jira/browse/HBASE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3715: - Attachment: (was: book.xml.patch) Book.xml - adding architecture section on client, adding section on spec-ex under mapreduce --- Key: HBASE-3715 URL: https://issues.apache.org/jira/browse/HBASE-3715 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book.xml.patch Small changes to book.xml * Added a small section under MapReduce saying that it's generally advisable to turn off speculative execution when using HBase as a source * Added a 'client' section under architecture that is a simplified port of the client section in the HBaseArchitecture wiki page.
[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver
[ https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3071: - Attachment: 3701-v2.txt Addressed J-D's issues. Changed the names of the scripts and how they run. Now there is a graceful_stop.sh script that manages running the region_mover.rb script and the subsequent remote shutdown. graceful_stop.sh takes a flag to restart the node afterwards and another reload flag which puts the old region set back on the just-started node. I played with adding the load/unload region script to hbase-daemon.sh so we could do stuff like ./bin/hbase-daemons.sh unload regionserver, but that gets messy in bash. I already had to add a flag to bin/hbase to optionally not run java with an exec. Testing on a cluster seems to basically work. Going to try with a cluster under load next. Graceful decommissioning of a regionserver -- Key: HBASE-3071 URL: https://issues.apache.org/jira/browse/HBASE-3071 Project: HBase Issue Type: Improvement Reporter: stack Attachments: 3071.txt, 3701-v2.txt Currently if you stop a regionserver nicely, it'll put up its stopping flag and then close all hosted regions. While the stopping flag is in place all region requests are rejected. If this server was under load, closing could take a while. Only after all is closed is the master informed and it'll restart assigning (in old master, master would get a report with a list of all regions closed; in new master the zk expiry is triggered and we'll run the shutdown handler). At least in new master, we have means of disabling the balancer, and then moving the regions off the server one by one via HBaseAdmin methods -- we should write a script to do this at least for rolling restarts -- but we need something better.
[jira] [Commented] (HBASE-3071) Graceful decommissioning of a regionserver
[ https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013689#comment-13013689 ] stack commented on HBASE-3071: -- Other notes: Currently I run it like this: for i in `cat regionserver`; do ./bin/graceful_stop.sh $i; done You should turn off the balancer before you do the above. region_mover.rb doesn't care if regions show up after it started; it'll just start in on the new ones until it's down to zero regions (though there could be a race in here if the balancer is running). The script doesn't turn the balancer on/off because it would need a trap to turn it back on again AND the current api for the balancer is dumb; there is no way to query current state... so this is a manual step for now. We can move off ~2 regions per second on an unloaded cluster. Moving the regions back on takes longer for some reason -- about 1 a second. This means a rolling restart could take a while on a big loaded cluster. Could parallelize this script but it would need more work to make sure concurrent graceful_restarts all read a common set of restarting servers. Graceful decommissioning of a regionserver -- Key: HBASE-3071 URL: https://issues.apache.org/jira/browse/HBASE-3071 Project: HBase Issue Type: Improvement Reporter: stack Attachments: 3071.txt, 3701-v2.txt Currently if you stop a regionserver nicely, it'll put up its stopping flag and then close all hosted regions. While the stopping flag is in place all region requests are rejected. If this server was under load, closing could take a while. Only after all is closed is the master informed and it'll restart assigning (in old master, master would get a report with a list of all regions closed; in new master the zk expiry is triggered and we'll run the shutdown handler). At least in new master, we have means of disabling the balancer, and then moving the regions off the server one by one via HBaseAdmin methods -- we should write a script to do this at least for rolling restarts -- but we need something better.
[jira] [Created] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods
deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods Key: HBASE-3717 URL: https://issues.apache.org/jira/browse/HBASE-3717 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.1 Reporter: David Buttler Priority: Trivial The static isTableEnabled() methods on HTable can lead to unintended consequences if used naively without understanding the potential side-effects. Suggest deprecating these methods and pointing at the HBaseAdmin methods to accomplish the same task instead.
[jira] [Assigned] (HBASE-3107) Breakup HLogSplitTest unit tests.
[ https://issues.apache.org/jira/browse/HBASE-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman reassigned HBASE-3107: -- Assignee: (was: Alex Newman) Breakup HLogSplitTest unit tests. - Key: HBASE-3107 URL: https://issues.apache.org/jira/browse/HBASE-3107 Project: HBase Issue Type: Sub-task Reporter: Alex Newman
[jira] [Assigned] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs
[ https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman reassigned HBASE-1364: -- Assignee: (was: Alex Newman) [performance] Distributed splitting of regionserver commit logs --- Key: HBASE-1364 URL: https://issues.apache.org/jira/browse/HBASE-1364 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: stack Priority: Critical Fix For: 0.92.0 Attachments: HBASE-1364.patch Time Spent: 8h Remaining Estimate: 0h HBASE-1008 has some improvements to our log splitting on regionserver crash, but it needs to run even faster. (Below is from HBASE-1008) In the bigtable paper, the split is distributed. If we're going to have 1000 logs, we need to distribute or at least multithread the splitting. 1. As is, regions starting up expect to find one reconstruction log only. Need to make it so they pick up a bunch of edit logs, and it should be fine that the logs are elsewhere in hdfs in an output directory written by all split participants, whether multithreaded or a mapreduce-like distributed process (Let's write our distributed sort first as a MR so we learn what's involved; the distributed sort should use MR framework pieces as much as possible). On startup, regions go to this directory and pick up the files written by split participants, deleting and clearing the dir when all have been read in. Making it so the split can take multiple logs for input can also make the split process more robust than the current tenuous process, which loses all edits if it doesn't make it to the end without error. 2. Each column family rereads the reconstruction log to find its edits. Need to fix that. The split can sort the edits by column family so each store only reads its own edits.
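The two points in the HBASE-1364 description (one output per region, edits grouped by column family so a store reads only its own edits) can be sketched as a plain in-memory grouping step. This is an illustrative sketch only: the Edit class, its field names, and the split() helper are hypothetical and are not HBase's HLog types, and a real split would write per-region files to HDFS rather than build maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LogSplitSketch {
    /** Hypothetical log entry; the field names are illustrative only. */
    static final class Edit {
        final String region;
        final String family;
        final long sequenceId;

        Edit(String region, String family, long sequenceId) {
            this.region = region;
            this.family = family;
            this.sequenceId = sequenceId;
        }
    }

    /** Groups edits by region, then by column family, preserving log order. */
    static Map<String, Map<String, List<Edit>>> split(List<Edit> log) {
        Map<String, Map<String, List<Edit>>> byRegion = new TreeMap<>();
        for (Edit edit : log) {
            byRegion.computeIfAbsent(edit.region, r -> new TreeMap<>())
                    .computeIfAbsent(edit.family, f -> new ArrayList<>())
                    .add(edit);
        }
        return byRegion;
    }

    public static void main(String[] args) {
        List<Edit> log = Arrays.asList(
                new Edit("region-a", "cf1", 1),
                new Edit("region-b", "cf1", 2),
                new Edit("region-a", "cf2", 3),
                new Edit("region-a", "cf1", 4));
        Map<String, Map<String, List<Edit>>> out = split(log);
        // Each region sees only its own edits; each family only its own slice.
        System.out.println(out.keySet());
        System.out.println(out.get("region-a").get("cf1").size());
    }
}
```

Several split participants could each run this grouping over a different log and write their output into the shared directory that regions read on startup.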
[jira] [Assigned] (HBASE-3108) Add a method for creating persistent Sequential zk nodes.
[ https://issues.apache.org/jira/browse/HBASE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman reassigned HBASE-3108: -- Assignee: (was: Alex Newman) Add a method for creating persistent Sequential zk nodes. - Key: HBASE-3108 URL: https://issues.apache.org/jira/browse/HBASE-3108 Project: HBase Issue Type: Sub-task Reporter: Alex Newman
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013745#comment-13013745 ] Ted Yu commented on HBASE-1512: --- This feature is very useful. Is it possible to pass some class to AggregateProtocolImpl which can interpret the type of the value based on colFamily:colQualifier? I tried adding a type parameter (for the type of the value) to AggregateCpProtocol but encountered various compilation errors. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facilities, generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return ONLY the result of the aggregation or whatever. For example, say you just want to count rows: currently you scan, the server returns all data to the client, and the count is done by the client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is the count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in the scanner too so that it aggregated the per-region counts.
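The row-count example in the HBASE-1512 description can be sketched without any HBase classes. In this minimal sketch each inner list stands in for the row keys hosted by one region; countRegion() plays the part that would run server-side and countTable() the part that would combine per-region results. Both names are hypothetical and say nothing about how the attached patches are structured.

```java
import java.util.Arrays;
import java.util.List;

public class RowCountSketch {
    /** Stand-in for the server side: a region reports only its row count. */
    static long countRegion(List<String> rowKeysInRegion) {
        return rowKeysInRegion.size();
    }

    /** Stand-in for the client (or scanner): sums the per-region counts. */
    static long countTable(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("m", "n"),
                Arrays.asList("x"));
        // Only three numbers cross the "wire" instead of six rows.
        System.out.println(countTable(regions));
    }
}
```

The point of the design is the amount of data returned: one number per region rather than every row, which is what the description means by returning the result of the aggregation only.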
[jira] [Updated] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods
[ https://issues.apache.org/jira/browse/HBASE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Buttler updated HBASE-3717: - Attachment: deprecate_HTable_isTableEnabled.patch deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods Key: HBASE-3717 URL: https://issues.apache.org/jira/browse/HBASE-3717 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.1 Reporter: David Buttler Priority: Trivial Attachments: deprecate_HTable_isTableEnabled.patch Original Estimate: 1h Remaining Estimate: 1h The static isTableEnabled() methods on HTable can lead to unintended consequences if used naively without understanding the potential side-effects. Suggest deprecating these methods and pointing at the HBaseAdmin methods to accomplish the same task instead.
[jira] [Updated] (HBASE-3717) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods
[ https://issues.apache.org/jira/browse/HBASE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Buttler updated HBASE-3717: - Status: Patch Available (was: Open) deprecate HTable isTableEnabled() methods in favor of HBaseAdmin methods Key: HBASE-3717 URL: https://issues.apache.org/jira/browse/HBASE-3717 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.1 Reporter: David Buttler Priority: Trivial Attachments: deprecate_HTable_isTableEnabled.patch Original Estimate: 1h Remaining Estimate: 1h The static isTableEnabled() methods on HTable can lead to unintended consequences if used naively without understanding the potential side-effects. Suggest deprecating these methods and pointing at the HBaseAdmin methods to accomplish the same task instead.
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013820#comment-13013820 ] Ted Yu commented on HBASE-1512: --- I think AggregationClient should have a ctor which accepts a Configuration and saves it. Then the Configuration can be used to point to a table in a remote cluster: {code} HTable table = new HTable(conf, tableName); {code} Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facilities, generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return ONLY the result of the aggregation or whatever. For example, say you just want to count rows: currently you scan, the server returns all data to the client, and the count is done by the client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is the count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in the scanner too so that it aggregated the per-region counts.
[jira] [Commented] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013825#comment-13013825 ] Liyin Tang commented on HBASE-3065: --- Most of the retries are simple, except two: create and setData. I got a basic idea of how to retry 'create' from http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling But how to do the setData? The problem is that the first setData may succeed and then get a ConnectionLoss exception. Then it retries and gets a BadVersion exception. How do we know that this BadVersion was caused by the previous, successful setData? Retry all 'retryable' zk operations; e.g. connection loss - Key: HBASE-3065 URL: https://issues.apache.org/jira/browse/HBASE-3065 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.92.0 The 'new' master refactored our zk code, tidying up all zk accesses and corralling them behind nice zk utility classes. One improvement was letting out all KeeperExceptions, letting the client deal. That's good generally because in the old days, we'd suppress important zk changes in state. But there is at least one case the new zk utility could handle for the application, and that's the class of retryable KeeperExceptions. The one that comes to mind is connection loss. On connection loss we should retry the just-failed operation. Usually the retry will just work. At worst, on reconnect, we'll pick up the expired session event. Adding in this change shouldn't be too bad given the refactor of zk corralled all zk access into one or two classes only. One thing to consider though is how much we should retry. We could retry on a timer or we could retry forever as long as the Stoppable interface is passed, so if another thread has stopped or aborted the hosting service, we'll notice and give up trying. Doing the latter is probably better than some kinda timeout. HBASE-3062 adds a timed retry on the first zk operation. This issue is about generalizing what is over there across all zk access.
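The setData question raised above can be made concrete with a small simulation. The sketch below does not use the ZooKeeper client API: FakeNode, its exceptions, and setDataWithRetry() are all hypothetical stand-ins. It shows one possible heuristic, stated here as an assumption rather than as the approach chosen for this issue: after a connection loss, treat a later bad-version error as success if the node's version advanced by exactly one and the node holds exactly the bytes we tried to write.

```java
import java.util.Arrays;

public class SetDataRetrySketch {
    static class ConnectionLossException extends RuntimeException {}
    static class BadVersionException extends RuntimeException {}

    /** Hypothetical in-memory stand-in for a znode; not the ZooKeeper API. */
    static class FakeNode {
        byte[] data = new byte[0];
        int version = 0;
        boolean dropNextReply = false; // simulate: write applied, reply lost

        void setData(byte[] newData, int expectedVersion) {
            if (expectedVersion != version) {
                throw new BadVersionException();
            }
            data = newData.clone();
            version++;
            if (dropNextReply) {
                dropNextReply = false;
                throw new ConnectionLossException();
            }
        }
    }

    /** Returns true once newData is known (or inferred) to be written. */
    static boolean setDataWithRetry(FakeNode node, byte[] newData,
                                    int expectedVersion, int maxAttempts) {
        boolean sawConnectionLoss = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                node.setData(newData, expectedVersion);
                return true;
            } catch (ConnectionLossException e) {
                sawConnectionLoss = true; // outcome unknown, so retry
            } catch (BadVersionException e) {
                // Assumed heuristic: one version bump plus our exact bytes
                // most likely means the earlier attempt was applied.
                if (sawConnectionLoss
                        && node.version == expectedVersion + 1
                        && Arrays.equals(node.data, newData)) {
                    return true;
                }
                throw e; // a genuine conflict with another writer
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FakeNode node = new FakeNode();
        node.dropNextReply = true;
        byte[] payload = "state=OPENED".getBytes();
        System.out.println(setDataWithRetry(node, payload, 0, 3));
        System.out.println(node.version);
    }
}
```

The heuristic is not airtight: another writer could store identical bytes in the same window. Embedding a writer-specific identifier in the payload would make the read-back comparison less ambiguous.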
[jira] [Assigned] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss
[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-3065: - Assignee: Liyin Tang Retry all 'retryable' zk operations; e.g. connection loss - Key: HBASE-3065 URL: https://issues.apache.org/jira/browse/HBASE-3065 Project: HBase Issue Type: Bug Reporter: stack Assignee: Liyin Tang Fix For: 0.92.0 The 'new' master refactored our zk code, tidying up all zk accesses and corralling them behind nice zk utility classes. One improvement was letting out all KeeperExceptions, letting the client deal. That's good generally because in the old days, we'd suppress important zk changes in state. But there is at least one case the new zk utility could handle for the application, and that's the class of retryable KeeperExceptions. The one that comes to mind is connection loss. On connection loss we should retry the just-failed operation. Usually the retry will just work. At worst, on reconnect, we'll pick up the expired session event. Adding in this change shouldn't be too bad given the refactor of zk corralled all zk access into one or two classes only. One thing to consider though is how much we should retry. We could retry on a timer or we could retry forever as long as the Stoppable interface is passed, so if another thread has stopped or aborted the hosting service, we'll notice and give up trying. Doing the latter is probably better than some kinda timeout. HBASE-3062 adds a timed retry on the first zk operation. This issue is about generalizing what is over there across all zk access.
[jira] [Created] (HBASE-3718) Improve 'get' performance when row resides in memstore
Improve 'get' performance when row resides in memstore -- Key: HBASE-3718 URL: https://issues.apache.org/jira/browse/HBASE-3718 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur The regionserver uses a ConcurrentSkipList to store the KVs in the memstore. Although the order complexity of a lookup is O(log n), the latency to look up a specific key in the memstore is still very large, especially when the memstore is large and the KV.compare() method is costly. One optimization is to investigate using a ConcurrentHashMap (instead of ConcurrentSkipList), which would minimize the lookup and insertion cost. We can do it only for column families that are marked as not supporting range scans.
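To illustrate the trade-off behind this proposal, here is a small sketch using the JDK's ConcurrentSkipListMap and ConcurrentHashMap. It is not memstore code: plain String keys stand in for KeyValues (the real memstore orders entries with a KeyValue comparator), and the class and method names are invented for the example. Both maps answer point lookups, but only the sorted map can serve a range scan, which is why the hash map would be limited to column families that never need one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MemstoreLookupSketch {
    /** Range scan over [startRow, stopRow); possible only because the skip list keeps keys sorted. */
    public static List<String> rangeScan(ConcurrentSkipListMap<String, String> store,
                                         String startRow, String stopRow) {
        return new ArrayList<>(store.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> sorted = new ConcurrentSkipListMap<>();
        ConcurrentHashMap<String, String> hashed = new ConcurrentHashMap<>();
        for (String row : new String[] {"row-c", "row-a", "row-b"}) {
            sorted.put(row, "v-" + row);
            hashed.put(row, "v-" + row);
        }

        // Point lookup: the skip list needs O(log n) key comparisons,
        // the hash map needs a hash plus (usually) one equality check.
        System.out.println(sorted.get("row-b"));
        System.out.println(hashed.get("row-b"));

        // Range scan: the hash map has no ordering, so it has no equivalent of this call.
        System.out.println(rangeScan(sorted, "row-a", "row-c"));
    }
}
```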
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013830#comment-13013830 ] Ted Yu commented on HBASE-1512: --- A 4-byte value can represent a float; an 8-byte value can represent a double. As for the return type, Long: I tried to make AggregateCpProtocol generic but wasn't successful, e.g. AggregateCpProtocol<Long>.class wouldn't compile. Since AggregateCpProtocol is an interface, I cannot instantiate it and obtain the class afterward. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, patch-1512-2.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facilities, i.e. facilities where you want to calculate some meta info on your table, it seems like it wouldn't be too hard to make a filter type that could run a function server-side and return ONLY the result of the aggregation or whatever. For example, say you just want to count rows: currently you scan, the server returns all data to the client, and the count is done by the client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like MySQL when you ask it to count: it returns a 'table' with a count column whose value is the count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in the scanner too so that it aggregated the per-region counts.
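The compile problem mentioned in this comment is a general Java restriction: a class literal cannot name a parameterized type, because generics are erased at runtime and only one Class object exists per generic interface. The sketch below shows the restriction and one common workaround, an unchecked cast of the raw class literal. All names in it (ClassLiteralSketch, AggregateProtocol, LongSum, longProtocolClass) are hypothetical and are not the classes from the attached patches.

```java
import java.util.Arrays;
import java.util.List;

public class ClassLiteralSketch {
    /** Hypothetical generic protocol, standing in for a generic AggregateCpProtocol. */
    public interface AggregateProtocol<T> {
        T sum(List<T> values);
    }

    /** A concrete implementation fixes T, so its class literal is an ordinary one. */
    public static class LongSum implements AggregateProtocol<Long> {
        @Override
        public Long sum(List<Long> values) {
            long total = 0L;
            for (Long v : values) {
                total += v;
            }
            return total;
        }
    }

    // Rejected by the compiler: a class literal cannot carry type arguments.
    // Class<?> c = AggregateProtocol<Long>.class;

    /** Workaround: take the raw class literal and cast it (unchecked) to the wanted type. */
    @SuppressWarnings("unchecked")
    public static Class<AggregateProtocol<Long>> longProtocolClass() {
        return (Class<AggregateProtocol<Long>>) (Class<?>) AggregateProtocol.class;
    }

    public static void main(String[] args) {
        // Same runtime Class object as the raw interface, whatever T is.
        System.out.println(longProtocolClass().getSimpleName());
        System.out.println(new LongSum().sum(Arrays.asList(1L, 2L, 3L)));
    }
}
```

The cast is safe only in the sense that the compiler stops complaining; the returned Class object carries no information about Long at runtime.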
[jira] [Created] (HBASE-3719) Workload has to drain before hlog can be rolled
Workload has to drain before hlog can be rolled --- Key: HBASE-3719 URL: https://issues.apache.org/jira/browse/HBASE-3719 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur In the current implementation, the regionserver blocks new transactions from occurring while the HLog is being rolled. Closing the existing HLog sometimes takes more than a few seconds, and during this time all new puts/increments are blocked. It would be nice if we could continue to write new transactions to the new HLog (but maybe not commit those transactions) while the old HLog is being closed.
[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-1512: -- Attachment: AggregationClient.java AggregationClient with ctor accepting Configuration. Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Attachments: 1512.zip, AggregationClient.java, patch-1512-2.txt, patch-1512.txt
[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver
[ https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3071: - Assignee: stack Status: Patch Available (was: Open) Looking for a bit of a review. Was figuring this could go into branch and trunk. It doesn't change any server code, not yet anyways, just scripts. Graceful decommissioning of a regionserver -- Key: HBASE-3071 URL: https://issues.apache.org/jira/browse/HBASE-3071 Project: HBase Issue Type: Improvement Reporter: stack Assignee: stack Attachments: 3071.txt, 3701-v2.txt, 3701-v3.txt Currently if you stop a regionserver nicely, it'll put up its stopping flag and then close all hosted regions. While the stopping flag is in place all region requests are rejected. If this server was under load, closing could take a while. Only after all is closed is the master informed and it'll restart assigning (in old master, master would get a report with a list of all regions closed; in new master the zk expiry is triggered and we'll run the shutdown handler). At least in new master, we have means of disabling the balancer and then moving the regions off the server one by one via HBaseAdmin methods -- we should write a script to do this at least for rolling restarts -- but we need something better.
[jira] [Updated] (HBASE-3071) Graceful decommissioning of a regionserver
[ https://issues.apache.org/jira/browse/HBASE-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3071: - Attachment: 3701-v3.txt Added more retries of move (saw a case where the server wasn't fully up yet, so the move to that server failed and on retry we put it back in the old location, which looked like it'd never moved). Tested under load. All just runs slower. But loading keeps going. More notes. So, we unload a region at a time till zero. Then we do a clean shutdown. Over in the master this triggers the recovery of an expired server, only there are no logs to process because it had a clean shutdown, so the server comes up pretty fast and is then ready to take on regions again. If the balancer runs, it doesn't cause 'failure', but what it does is that while unloading, it can add some new regions to a server. We'll then move those off. Now, this RS will have the old burden plus the new associated with it. On replay the cluster will be out of kilter, off-balance... so it's kinda important having the cluster basically balanced before running this script. If there is an exception processing a server, we'll skip it. It can be hard to figure out a failure if you don't keep logs and look at them after. The region mover logs each region move, which can be a bunch, and if server 50 of 100 failed, you could have a cluster that was only 99% upgraded. In a wrapper script that runs the rolling restart -- enabling/disabling the balancer, upping and downing masters (see current bin/rolling_restart.sh) -- we could check zk to see if a server's servername changed (servername is hostname + port + startcode). If it hadn't, we could at the end flag it as a failed rolling restart. I'm figuring the wrapper script is outside the scope of this issue. For now I do a manual enable/disable of the balancer and then do a for loop calling bin/graceful_stop.sh... 