RE: [hbase] Suggestions on hbase APIs.
-----Original Message-----
From: Mafish Liu [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 21, 2008 12:23 AM
To: hadoop-dev@lucene.apache.org
Subject: [hbase] Suggestions on hbase APIs.

> Hi:
> I'm recently using HBase (included in the Hadoop 0.15.2 release) to manage
> spatial data, and I found two flaws that I think could be improved.
>
> First, if you fetch the column names in an HBase table using
>
>   Set<Text> columns = tableDes.families().keySet();
>
> you get a set of column names that end with a colon, which I think should
> be gotten rid of.

The name that ends with a colon is the name of the column family, and you can
create multiple family members in an ad hoc fashion. For example, say you have
a column family named 'meta:' in which you store data about web pages. You can
create multiple family members in the same row, such as 'meta:mime-type',
'meta:crawl-date', 'meta:encoding', etc. Example:

  HTable table = new HTable(conf, tableName);
  long id = table.startUpdate(row);
  // enter data in column family meta:
  table.put(id, new Text("meta:mime-type"), data);
  table.put(id, new Text("meta:crawl-date"), data);
  table.put(id, new Text("meta:encoding"), data);
  // enter data in column family contents:
  table.put(id, new Text("contents:"), data);
  table.commit(id);

> Second, if you read all the contents of an HBase table via the
> HScannerInterface.next method, you get a TreeMap<Text, byte[]> every time
> you call it. Returning the column names every time is a waste of memory and
> network bandwidth, and there should be a more efficient way to do such work.

Well, you can retrieve multiple columns with a scanner, so if the column name
were not passed back, how would you determine which column goes with which
data? Scanning the table in the example above:

  HScannerInterface scanner = table.obtainScanner(
      new Text[] {new Text("contents:"), new Text("meta:")},
      new Text()); // empty start row = start at beginning

Now when you call scanner.next, you need the map to find the value for
contents: and the (multiple) values for meta:.

> The above two APIs are used in my program and also in the HBase shell
> program. I don't know if there are alternative APIs that provide these
> improvements.
>
> Best regards.
> Mafish
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
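For completeness, here is a rough sketch of a loop that drains that scanner,
continuing from the snippet above. It assumes the 0.15-era signature
HScannerInterface.next(HStoreKey, TreeMap<Text, byte[]>), so treat the
details as approximate:

  HStoreKey key = new HStoreKey();
  TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
  try {
    // next() fills 'results' with one row's columns; the map keys are the
    // full column names, which is how callers tell the values apart.
    while (scanner.next(key, results)) {
      for (Map.Entry<Text, byte[]> e : results.entrySet()) {
        System.out.println(key.getRow() + " " + e.getKey() + ": "
            + e.getValue().length + " bytes");
      }
      results.clear(); // reuse the map for the next row
    }
  } finally {
    scanner.close();
  }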
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Status: Open  (was: Patch Available)

It appears that Hudson lost this patch when it went down. Resubmitting.

         Key: HADOOP-2668
         URL: https://issues.apache.org/jira/browse/HADOOP-2668
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: stack
    Assignee: Jim Kellerman
    Priority: Blocker
     Fix For: 0.16.0
 Attachments: migrate.patch, migration.patch, patch.txt

HBase now checks for a version file. If there is none, it reports a version
mismatch. There will be no version file if the file system was created by a
version older than r613469.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Status: Patch Available  (was: Open)
[jira] Assigned: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2668:
-------------------------------------
Assignee: Jim Kellerman  (was: stack)
[jira] Commented: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560893#action_12560893 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

OK, there is definitely some work to do here. I'll work on fixing Migrate.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Affects Version/s: 0.16.0
           Status: Patch Available  (was: Open)

Works locally; trying Hudson. Stack, please review the patch.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Attachment: patch.txt

Lots more checking, cleaned up several bugs, new read-only mode, usage, etc.
[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.
[ https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2643:
----------------------------------
   Resolution: Fixed
Fix Version/s: 0.16.0
       Status: Resolved  (was: Patch Available)

Committed. Ignoring one unrelated core test failure.

         Key: HADOOP-2643
         URL: https://issues.apache.org/jira/browse/HADOOP-2643
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Assignee: Jim Kellerman
     Fix For: 0.16.0
 Attachments: patch.txt

The migration tool that handles the changes to how HBase lays out files in
the file system needs to be smarter:
- don't try to migrate old region directories in which the region name is
  part of the directory name
- add a version number to the file system
[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560748#action_12560748 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

If you run the migrate tool as the exception suggests, it will write the
version file and then the system will start.
[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

> It didn't occur to me that migration was the way to fix the missing version
> file. From HMaster.java (894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

> I also figured we should just auto-migrate this one case of a missing
> version file (if, in the future, the version file goes missing, I'd think
> it the job of an hbase fsck to recreate it, rather than migration?).

Suppose you have a file system that has not been migrated (i.e., regions are
stored in /hbase/hregion_nnn)? The master would start up, write the version
file, and then proceed to recreate the root and meta regions because they
aren't under /hbase/-ROOT- and /hbase/.META. respectively. Additionally, the
first thing the migrate tool does is look for the version file. If it finds
it and the version number matches, it figures that the file system has
already been upgraded and does nothing.

> But I'm fine w/ forcing users to run the migration. It needs to be better
> documented and added to the bin/hbase script with verb 'migrate', I'd say.

Agreed. How about changing this patch to update bin/hbase and add
documentation (where?)?

> I tried to run the migration but it wants to connect to an HMaster. That
> ain't going to work (cluster won't start because no version file... can't
> migrate because cluster ain't up...).

It tries to connect to the master to ensure it isn't running (it uses
HBaseAdmin.isMasterRunning()). We wouldn't want to do an upgrade with the
cluster running.
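To make the startup check concrete, here is a rough sketch of the kind of
version-file test being discussed. The path, version constant, and helper
name are illustrative assumptions, not the actual HMaster code:

{code}
// Illustrative sketch only; constants and names are assumptions.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionCheckSketch {
  static final String FILE_SYSTEM_VERSION = "2";                 // assumed
  static final Path VERSION_FILE = new Path("/hbase", "hbase.version");

  static void checkVersion(FileSystem fs) throws IOException {
    if (!fs.exists(VERSION_FILE)) {
      // No version file: either a pre-r613469 layout or a damaged install.
      // Refuse to start instead of auto-migrating, per the discussion above.
      throw new IOException(
          "file system not correct version. Run hbase.util.Migrate");
    }
    // A real implementation would also read the file and compare its
    // contents against FILE_SYSTEM_VERSION before letting the master start.
  }
}
{code}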
[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

jimk edited comment on HADOOP-2668 at 1/19/08 5:05 PM. The text is unchanged
from the comment above; the edit only wrapped the /hbase path names in markup.
[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

jimk edited comment on HADOOP-2668 at 1/19/08 5:06 PM. The text is unchanged
from the comment above; this edit wrapped the /hbase path names in {code}
markup instead.
[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.
[ https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2643:
----------------------------------
Attachment: patch.txt
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Tests passed. Committed.

         Key: HADOOP-2525
         URL: https://issues.apache.org/jira/browse/HADOOP-2525
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
    Reporter: Chris Kline
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0
 Attachments: patch.txt

Background: we ran out of disk space on the HMaster before this issue
occurred. The sequence of events was:
1. Ran out of disk space
2. Freed up 10 GB of disk space
3. Shut down HBase

We had the following 2 lines repeated over 11 million times in the span of
10 minutes:

2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessServerShutdown of 10.100.11.64:60020
[jira] Resolved: (HADOOP-2616) hbase not spliting when the total size of region reaches max region size * 1.5
[ https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2616.
-----------------------------------
   Resolution: Fixed
Fix Version/s: 0.16.0
               (was: 0.17.0)

Clarified documentation. Committed with changes for HADOOP-2525.

         Key: HADOOP-2616
         URL: https://issues.apache.org/jira/browse/HADOOP-2616
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0

Right now a region may get larger than the max size set in the configuration.
HRegion.needsSplit checks the largest column to see if it is larger than max
region size * 1.5 and then decides whether to split. But if we have more than
one column, the region could be very large. For example, say we have 10
columns, all about the same size (say 40MB), and the max file size is 64MB:
we would not split, even though the region size is 400MB, well over the 96MB
needed to trip a split.
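For reference, a simplified sketch of the split check this issue describes;
the names are illustrative, not HBase's actual fields:

{code}
// Sketch of the reported split behavior, for illustration only.
public class SplitCheckSketch {
  static final long MAX_FILE_SIZE = 64L * 1024 * 1024;   // 64MB default

  // Reported behavior: split only when the single largest store (column
  // family) passes 1.5 x max, i.e. 96MB.
  static boolean needsSplitByLargestStore(long[] storeSizes) {
    long largest = 0;
    for (long size : storeSizes) {
      largest = Math.max(largest, size);
    }
    return largest > MAX_FILE_SIZE * 3 / 2;
  }

  // With ten 40MB stores, largest = 40MB, so no split is triggered even
  // though the region as a whole holds 400MB.
}
{code}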
[jira] Assigned: (HADOOP-2636) [hbase] Make flusher less dumb
[ https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2636:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2636
         URL: https://issues.apache.org/jira/browse/HADOOP-2636
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: stack
    Assignee: Jim Kellerman
    Priority: Minor

When the flusher runs (it is triggered when the sum of all Stores in a Region
exceeds a configurable max size), we flush all Stores, though a Store
memcache might have but a few bytes. I would think Stores should only dump
their memcache to disk if they have some substance. The problem becomes more
acute the more families you have in a Region. Possible behaviors would be to
dump the biggest Store only, or only those Stores over 50% of the max
memcache size. Behavior would vary depending on the prompt that provoked the
flush. Would also log why the flush is running: optional or max size. This
issue comes out of HADOOP-2621.
[jira] Commented: (HADOOP-2636) [hbase] Make flusher less dumb
[ https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559938#action_12559938 ]

Jim Kellerman commented on HADOOP-2636:
---------------------------------------

Better yet, move the triggering of cache flushes to the store level instead
of the region level. Same for compactions. A split still has to happen at the
region level, because it is the region that embodies the concept of a row
range; however, the split could be triggered by a single store reaching the
split threshold.
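A hypothetical sketch of what store-level flush triggering might look like;
the interface and names are assumptions for illustration, not HBase code:

{code}
import java.util.List;

class StoreLevelFlusher {
  static final long FLUSH_SIZE = 64L * 1024 * 1024; // assumed per-store limit

  interface Store {
    long memcacheSize();
    void flushMemcache();  // writes this store's memcache to a new mapfile
  }

  // Flush only the stores that have accumulated enough edits to be worth a
  // mapfile, instead of flushing every store in the region at once.
  static void maybeFlush(List<Store> stores) {
    for (Store store : stores) {
      if (store.memcacheSize() > FLUSH_SIZE) {
        store.flushMemcache();
      }
    }
  }
}
{code}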
[jira] Updated: (HADOOP-2496) Snapshot of table
[ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2496:
----------------------------------
Issue Type: New Feature  (was: Bug)

         Key: HADOOP-2496
         URL: https://issues.apache.org/jira/browse/HADOOP-2496
     Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
    Reporter: Billy Pearson
    Priority: Minor
     Fix For: 0.17.0

Having an option to take a snapshot of a table would be very useful in
production. What I would like this option to do is a merge of all the data
into one or more files stored in the same folder on the DFS. This way we
could save data in case of a software bug in Hadoop or user code. The other
advantage would be the ability to export a table to multiple locations. Say I
had a read-only table that must be online: I could take a snapshot of it when
needed and export it to a separate data center, have it loaded there, and
then I would have it online at multiple data centers for load balancing and
failover. I understand that Hadoop takes away the need for backups to protect
from failed servers, but this does not protect us from software bugs that
might delete or alter data in ways we did not plan. We should have a way to
roll back a dataset.
[jira] Assigned: (HADOOP-2619) Compaction errors after a region splits
[ https://issues.apache.org/jira/browse/HADOOP-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2619:
-------------------------------------
Assignee: stack

         Key: HADOOP-2619
         URL: https://issues.apache.org/jira/browse/HADOOP-2619
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop svn 612165
    Reporter: Billy Pearson
    Assignee: stack
     Fix For: 0.16.0
 Attachments: compactiondir-v4.patch, hbase-root-regionserver-PE1750-4.log

I am getting compaction errors from regions after they split. Not all of them
have this problem, but some do. I attached a log and picked out one region,
webdata,com.technorati/tag/potiron:http,1200430376177:
- it is loaded, then splits at 2008-01-15 14:52:56,116
- the split is finished at 2008-01-15 14:53:01,653
- the first compaction for the new top-half region starts at
  2008-01-15 14:54:07,612 and ends successfully at 2008-01-15 14:54:30,229
- the next compaction starts at 2008-01-15 14:56:16,315 and ends with an
  error at 2008-01-15 14:56:40,246

{code}
2008-01-15 14:57:53,002 ERROR org.apache.hadoop.hbase.HRegionServer: Compaction failed for region webdata,com.technorati/tag/potiron:http,1200430376177
org.apache.hadoop.dfs.LeaseExpiredException: org.apache.hadoop.dfs.LeaseExpiredException: No lease on /gfs_storage/hadoop-root/hbase/webdata/compaction.dir/1438658724/in_rank/mapfiles/8222904438849251562/data
  at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1123)
  at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1061)
  at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:303)
  at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:908)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:418)
{code}

All other compactions for this region fail after this one, with the same
error. I will have to keep testing to see if it ever finishes successfully;
maybe after a restart it will successfully finish a compaction.
[jira] Assigned: (HADOOP-2624) [hbase] memory management
[ https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2624:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2624
         URL: https://issues.apache.org/jira/browse/HADOOP-2624
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: stack
    Assignee: Jim Kellerman

Each Store has a Memcache of edits that is flushed on a fixed period (it used
to be flushed when it grew beyond a limit). A Region can be made up of N
Stores. A regionserver currently has no upper bound on the number of regions
that can be deployed to it. Add to this that, per mapfile, we have read the
index into memory. We're also talking about adding caching of blocks and
cells. We need a means of keeping an account of memory usage, adjusting cache
sizes and flush rates (or sizes) dynamically, using References where
possible, to accommodate deployment of added regions. If memory is strained,
we should reject regions proffered by the master with a resource-constrained,
or some such, message. The manual sizing we currently do ain't going to cut
it for clusters of any decent size.
[jira] Assigned: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2525:
-------------------------------------
Assignee: Jim Kellerman  (was: stack)
[jira] Assigned: (HADOOP-2615) Add max number of mapfiles to compact at one time giveing us a minor major compaction
[ https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2615:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2615
         URL: https://issues.apache.org/jira/browse/HADOOP-2615
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.17.0

Currently we do compaction on a region when hbase.hstore.compactionThreshold
is reached (default 3). I think we should configure a max number of mapfiles
to compact at one time, similar to doing a minor compaction in Bigtable. This
keeps compactions from getting tied up in one region too long, which lets
other regions get way too many memcache flushes, making compaction take
longer and longer for each region. If we did that, then when a region's
updates start to slack off, the max number will eventually include all
mapfiles, causing a major compaction on that region. Unlike Bigtable, this
would leave the master out of the process and let the region server handle
the major compaction when it has time. When doing a minor compaction on a few
files, I think we should compact the newest mapfiles first, leaving the
larger/older ones for when we have low updates to a region. A sketch of this
selection follows below.
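A hypothetical sketch of capping a minor compaction at the newest N mapfiles,
as the issue proposes; the types and field names are assumptions:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CompactionSelector {
  static class MapFileInfo {
    final String path;
    final long sequenceId;  // higher = newer edits
    MapFileInfo(String path, long sequenceId) {
      this.path = path;
      this.sequenceId = sequenceId;
    }
  }

  // Pick at most maxFiles of the newest mapfiles for a minor compaction.
  // When a quiet store has fewer than maxFiles files, this selects all of
  // them, which amounts to a major compaction of that store.
  static List<MapFileInfo> select(List<MapFileInfo> files, int maxFiles) {
    List<MapFileInfo> sorted = new ArrayList<MapFileInfo>(files);
    sorted.sort(Comparator.comparingLong((MapFileInfo f) -> f.sequenceId)
        .reversed());
    return sorted.subList(0, Math.min(maxFiles, sorted.size()));
  }
}
{code}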
[jira] Assigned: (HADOOP-2616) hbase not spliting when the total size of region reaches max region size * 1.5
[ https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2616:
-------------------------------------
Assignee: Jim Kellerman
[jira] Assigned: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size
[ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2621:
-------------------------------------
Assignee: stack

         Key: HADOOP-2621
         URL: https://issues.apache.org/jira/browse/HADOOP-2621
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: stack
     Fix For: 0.16.0
 Attachments: optionalcacheflushinterval.patch

It looks like HBase is flushing all memcaches to disk every 60 secs, causing
a lot of work for the compactor to keep up, because each column gets its own
mapfile and every region is flushed at one time. This could be a very large
number of mapfiles to write if a region server is hosting 100 regions, all
with multiple columns.

Idea: the memcache flush should keep all data in memory until the memcache
gets larger than the size configured with hbase.hregion.memcache.flush.size.
When we reach this size, we should flush the largest regions first, stopping
once we drop back below the memcache max size, maybe 20% below the max. This
will flush only as needed, as each flush takes time to compact when
compaction runs on a region. While we are flushing a region, we should also
be blocking new updates to that region, so the region server does not get
overrun when a high update load hits it. By only blocking on the region we
are flushing at that time, other regions will still be able to take updates.

If we still want to use hbase.regionserver.optionalcacheflushinterval, we
should set it to run once an hour or something like that, so we can recover
memory from the memcaches of regions that do not have a lot of updates in
memory. But running at the current default of 60 secs is not so good for the
compactor if it has many regions to handle, and it is also not good for a
scanner to have to scan many small files vs. a few larger ones.

Example: a compactor may take 15 mins to compact a region. In that time we
will flush 15 times, causing all other regions to get new mapfiles to compact
when their turn to be compacted comes. If many regions were getting
compacted, the last one on a list of, say, 10 regions would have 10 regions *
15 mins each = 150 mapfiles for each column written before the compactor
could get to it.
[jira] Created: (HADOOP-2643) [hbase] Make migration tool smarter.
[hbase] Make migration tool smarter.
------------------------------------
         Key: HADOOP-2643
         URL: https://issues.apache.org/jira/browse/HADOOP-2643
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Assignee: Jim Kellerman

The migration tool that handles the changes to how HBase lays out files in
the file system needs to be smarter:
- don't try to migrate old region directories in which the region name is
  part of the directory name
- add a version number to the file system
[jira] Commented: (HADOOP-1398) Add in-memory caching of data
[ https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560099#action_12560099 ]

Jim Kellerman commented on HADOOP-1398:
---------------------------------------

Tom, yes, we need to start versioning everything that goes out to disk. And
if we make an incompatible change, we either need to correct for it on the
fly or augment the migration tool (hbase.util.Migrate.java).

         Key: HADOOP-1398
         URL: https://issues.apache.org/jira/browse/HADOOP-1398
     Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
    Reporter: Jim Kellerman
    Priority: Trivial
 Attachments: hadoop-blockcache-v2.patch, hadoop-blockcache.patch

Bigtable provides two in-memory caches: one for row/column data and one for
disk blocks. The size of each cache should be configurable, data should be
loaded lazily, and the cache managed by an LRU mechanism. One complication of
the block cache is that all data is read through a SequenceFile.Reader which
ultimately reads data off of disk via an RPC proxy for ClientProtocol. This
would imply that the block caching would have to be pushed down to either the
DFSClient or SequenceFile.Reader.
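As a concrete illustration of the LRU mechanism mentioned in the issue, a
minimal block cache built on LinkedHashMap; this is a sketch, not the
eventual HBase implementation:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

class LruBlockCache extends LinkedHashMap<Long, byte[]> {
  private final int maxBlocks;

  LruBlockCache(int maxBlocks) {
    // accessOrder=true makes iteration order follow recency of use.
    super(16, 0.75f, true);
    this.maxBlocks = maxBlocks;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
    // Evict the least recently used block once the cache is full.
    return size() > maxBlocks;
  }
}
{code}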
[jira] Assigned: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
[ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2334:
-------------------------------------
Assignee: (was: Jim Kellerman)

         Key: HADOOP-2334
         URL: https://issues.apache.org/jira/browse/HADOOP-2334
     Project: Hadoop
  Issue Type: Wish
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0

I have heard from several people that row keys in HBase should be less
restricted than hadoop.io.Text. What do you think? At the very least, a row
key has to be a WritableComparable. This would lead to the most general case
being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The
primary difference between these two classes is that hadoop.io.BytesWritable
by default allocates 100 bytes, and if you do not pay attention to the length
(BytesWritable.getSize()), converting a String to a BytesWritable and vice
versa can become problematic. hbase.io.ImmutableBytesWritable, in contrast,
only allocates as many bytes as you pass in and then does not allow the size
to be changed. If we were to change from Text to a non-text key, my
preference would be for ImmutableBytesWritable, because it has a fixed size
once set, and operations like get, etc. do not have to do something like
System.arraycopy where you specify the number of bytes to copy. Your comments
and questions are welcome on this issue. If we receive enough feedback that
Text is too restrictive, we are willing to change it, but we need to hear
what would be the most useful thing to change it to as well.
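To illustrate the BytesWritable pitfall described above, a short sketch; it
uses the getSize() accessor named in the issue, and the rest is an assumption
for illustration:

{code}
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

class RowKeyExample {
  static String keyToString(BytesWritable key) {
    // new String(key.getBytes()) would be wrong: the backing buffer can be
    // larger than the valid data, so trim it to getSize() first.
    byte[] trimmed = Arrays.copyOf(key.getBytes(), key.getSize());
    return new String(trimmed);
  }
}
{code}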
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Fix Version/s: 0.16.0
       Status: Patch Available  (was: Open)
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Attachment: patch.txt
[jira] Assigned: (HADOOP-2624) [hbase] memory management
[ https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2624:
-------------------------------------
Assignee: (was: Jim Kellerman)
[jira] Assigned: (HADOOP-2039) [hbase] When a get or scan request spans multiple columns, execute the reads in parallel
[ https://issues.apache.org/jira/browse/HADOOP-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2039:
-------------------------------------
Assignee: (was: Jim Kellerman)

         Key: HADOOP-2039
         URL: https://issues.apache.org/jira/browse/HADOOP-2039
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Priority: Trivial
     Fix For: 0.16.0

When a get or scan request spans multiple columns, execute the reads in
parallel and use a CountDownLatch to wait for them to complete before
returning the results.
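A sketch of the parallel-read idea, one thread per column with a
CountDownLatch gathering the results; the reader interface is a stand-in, not
the actual HBase internals:

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ParallelColumnReader {
  interface ColumnReader {
    byte[] read(String column);  // stand-in for a per-column get
  }

  static Map<String, byte[]> get(final ColumnReader reader,
      List<String> columns) throws InterruptedException {
    final Map<String, byte[]> results =
        new ConcurrentHashMap<String, byte[]>();
    final CountDownLatch latch = new CountDownLatch(columns.size());
    for (final String column : columns) {
      new Thread(new Runnable() {
        public void run() {
          try {
            results.put(column, reader.read(column));
          } finally {
            latch.countDown();  // always count down so the caller can't hang
          }
        }
      }).start();
    }
    latch.await();  // wait for every column read to finish
    return results;
  }
}
{code}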
[jira] Assigned: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2364:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2364
         URL: https://issues.apache.org/jira/browse/HADOOP-2364
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Michael Bieniosek
    Assignee: Jim Kellerman
    Priority: Minor

I restarted a regionserver, and got this error in its logs:

{code}
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
  at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
  at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
  at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

  at org.apache.hadoop.ipc.Client.call(Client.java:482)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
  at $Proxy0.regionServerStartup(Unknown Source)
  at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
  at java.lang.Thread.run(Unknown Source)
{code}
[jira] Issue Comment Edited: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560143#action_12560143 ]

jimk edited comment on HADOOP-2525 at 1/17/08 3:14 PM:

> Otherwise patch looks good. How you think it fixes the issue?

The crux of the patch is the following change:

{code}
-    for (RegionServerOperation op = null; !closed.get(); ) {
+    while (!closed.get()) {
+      RegionServerOperation op = null;
{code}

The old code only declared and nulled out 'op' for the first iteration. If op
was set non-null and control went back to the top of the loop, it would fall
through and just re-execute op again, rather than polling the queues and
waiting.
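To see why that one-time initialization can re-execute the same operation, a
toy reconstruction of the bug; the queue and operation types are stand-ins,
not the real HMaster internals:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class LoopBugDemo {
  static final AtomicBoolean closed = new AtomicBoolean(false);
  static final BlockingQueue<Runnable> queue =
      new LinkedBlockingQueue<Runnable>();

  // BUG: 'op' is initialized once, not per iteration. After the first op is
  // polled and run, 'op' is still non-null at the top of the loop, so the
  // same op is re-executed forever instead of polling the queue again.
  static void buggyLoop() throws InterruptedException {
    for (Runnable op = null; !closed.get(); ) {
      if (op == null) {
        op = queue.poll(1, TimeUnit.SECONDS);
        if (op == null) continue;
      }
      op.run();
    }
  }

  // FIX: declare 'op' inside the loop so every iteration polls afresh.
  static void fixedLoop() throws InterruptedException {
    while (!closed.get()) {
      Runnable op = queue.poll(1, TimeUnit.SECONDS);
      if (op == null) continue;
      op.run();
    }
  }
}
{code}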
[jira] Commented: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560143#action_12560143 ]

Jim Kellerman commented on HADOOP-2525. This is the original version of the
edited comment above; the text is the same apart from the {code} markup.
[jira] Resolved: (HADOOP-2651) [Hbase] Caching for read performance
[ https://issues.apache.org/jira/browse/HADOOP-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2651.
-----------------------------------
Resolution: Duplicate

         Key: HADOOP-2651
         URL: https://issues.apache.org/jira/browse/HADOOP-2651
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: Edward Yoon
    Assignee: Edward Yoon

Use two levels of caching to improve read performance:
* Scan cache
** Higher-level cache
*** Caches the <key, value> pairs returned by the SSTable (HStore?) interface
    to the region server code
** Most useful for applications that tend to read the same data repeatedly
* Block cache
** Lower-level cache
*** Caches SSTable blocks that were read from HDFS
** Useful for applications that read data close to the data they recently
   read, e.g. a sequential read, or a random read of a different column in
   the same locality group within a hot row
Subclassing SequenceFile and MapFile
HBase has several subclasses of MapFile already:

  org.apache.hadoop.hbase.HStoreFile$HbaseMapFile
  org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile
  org.apache.hadoop.hbase.HStoreFile$HalfMapFileReader

If MapFile were more subclassable (had protected members instead of private,
or accessor methods), we would probably add client-side caching and bloom
filters (to determine whether a key exists in a map file; different from
BloomFilterMapFile above, which is a mix-in of MapFile and BloomFilter).

Tom White said (in https://issues.apache.org/jira/browse/HADOOP-2604):

> If MapFile.Reader were an interface (or an abstract class with a no-args
> constructor) then BloomFilterMapFile.Reader, HalfMapFileReader and caching
> Readers could be implemented as wrappers instead of in a static hierarchy.
> This would make it easier to mix and match readers (e.g. with or without
> caching) without passing all possible parameters in the constructor.

So we'd like to make MapFile (and probably SequenceFile) subclassable by
providing accessors and/or making members protected instead of private. If
these classes should not be subclassed, they should be declared as final
classes.

Thoughts? Opinions? Comments?

---
Jim Kellerman, Senior Engineer; Powerset
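To illustrate the wrapper style Tom describes, here is a toy decorator over a
simplified reader interface; it is a sketch of the pattern only, not Hadoop's
actual MapFile API:

  import java.io.IOException;

  interface Reader {
    byte[] get(byte[] key) throws IOException;
  }

  // Wraps any Reader and short-circuits lookups for keys the bloom filter
  // says are definitely absent. Wrappers like this compose: a caching
  // wrapper could wrap a bloom-filter wrapper around a base mapfile reader.
  class BloomFilterReader implements Reader {
    interface BloomFilter { boolean mightContain(byte[] key); }

    private final Reader delegate;
    private final BloomFilter filter;

    BloomFilterReader(Reader delegate, BloomFilter filter) {
      this.delegate = delegate;
      this.filter = filter;
    }

    public byte[] get(byte[] key) throws IOException {
      if (!filter.mightContain(key)) {
        return null;  // definitely not present; skip the disk read
      }
      return delegate.get(key);
    }
  }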
Multiplexing sockets in DFSClient/datanodes?
HBase has a problem with running out of file handles on machines that act as region servers. From https://issues.apache.org/jira/browse/HADOOP-2577: Today rapleaf gave me an lsof listing from a regionserver. It had thousands of open sockets to datanodes, all in ESTABLISHED and CLOSE_WAIT state. On average they seem to have about ten file descriptors/sockets open per region (they have 3 column families IIRC; per family there can be between 1-5 or so mapfiles open, 3 is the max, but compacting we open a new one, etc.). They have thousands of regions. 400 regions, roughly 100G, which is not that much, takes about 4k open file handles. If they want a regionserver to serve a decent disk's worth, 300-400G, then that is maybe 1600 regions and 16k file handles. With more than just 3 column families, we are in danger of blowing out limits if they are 32k. One possible solution we've thought of is multiplexing sockets between the DFSClient and the datanode. In this case, there would be one socket per client-datanode pair, running in async mode using select. This would consume far fewer sockets than the current 1 socket / client / datanode / open file. We used a socket multiplexer at Yahoo for the data store I worked on there, the user data base (or UDB), which stored all the preference data for all Yahoo pages that could be customized. All the UDB clients each had one socket open for each machine in the UDB server cluster. Similarly, each UDB server had one socket open to talk to all of its clients. When you consider that each UDB server had to talk to several thousand clients, and that each server machine ran many server processes to handle load, this was a huge savings in OS overhead. While 1 socket / client / datanode / open file is a simple model, if we are talking about scaling Hadoop or HBase to thousands of nodes, socket multiplexing seems like a big win in terms of server overhead, especially considering that many of these connections are more idle than in use. Yes, multiplexing a socket is more complicated than having one socket per file, but saving system resources seems like the way to scale. Questions? Comments? Opinions? Flames? --- Jim Kellerman, Senior Engineer; Powerset
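As a rough illustration of the idea, here is a sketch of framed multiplexing over a single connection. The framing protocol, class, and method names are all hypothetical, not Hadoop's actual wire format: each frame carries a stream id and a length, so one client-datanode socket can serve many open files instead of one socket per file.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class MuxConnection {
  private final DataOutputStream out;
  private final DataInputStream in;

  MuxConnection(DataOutputStream out, DataInputStream in) {
    this.out = out;
    this.in = in;
  }

  // Interleave writes for different logical streams on the one socket.
  synchronized void send(int streamId, byte[] payload) throws IOException {
    out.writeInt(streamId);       // which open file this frame belongs to
    out.writeInt(payload.length); // frame length
    out.write(payload);
    out.flush();
  }

  // Read one frame; the caller dispatches it to the right stream by id.
  Frame receive() throws IOException {
    int streamId = in.readInt();
    int len = in.readInt();
    byte[] payload = new byte[len];
    in.readFully(payload);
    return new Frame(streamId, payload);
  }

  static class Frame {
    final int streamId;
    final byte[] payload;
    Frame(int streamId, byte[] payload) {
      this.streamId = streamId;
      this.payload = payload;
    }
  }
}
{code}

A production version would run the socket in non-blocking mode behind a selector, as the UDB example describes, but the framing is the essential ingredient: it is what lets many idle file streams share one established connection.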
[jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs without considering the max memcache size
[ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559660#action_12559660 ] Jim Kellerman commented on HADOOP-2621: --- You can configure the memcache flush size by setting the config parameter hbase.hregion.memcache.flush.size; the default is 64MB. When an HRegion reaches this threshold, it will call for a cache flush. If the cache is flushed, a request is queued to determine if a compaction is necessary. If a compaction is done, then a request is queued to determine if the region needs to be split.

Memcache flush flushing every 60 secs without considering the max memcache size
Key: HADOOP-2621 URL: https://issues.apache.org/jira/browse/HADOOP-2621 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Billy Pearson Fix For: 0.16.0

It looks like hbase is flushing all memcaches to disk every 60 secs, causing a lot of work for the compactor to keep up, because each column gets its own mapfile and every region is flushed at one time. This could be a very large number of mapfiles to write if a region server is hosting 100 regions, all with multiple columns. Idea: the memcache flush should keep all data in memory until the memcache gets larger than the size configured with hbase.hregion.memcache.flush.size. When we reach this size we should flush the largest regions first, stopping once we drop back below the memcache max size, maybe 20% below the max. This flushes only as needed, since each flush costs compaction time when compaction runs on a region. While we are flushing a region we should also block new updates on that region, so the region server does not get overrun when a high update load hits it. By blocking only the region being flushed at that time, other regions will still be able to take updates. If we still want to use hbase.regionserver.optionalcacheflushinterval, we should set it to run once an hour or something like that, so we can recover memory from the memcache on regions that do not have a lot of updates in memory. But running at the current default of 60 secs is not good for the compactor if it has many regions to handle, and also not good for a scanner that has to scan many small files instead of a few larger ones. Example: a compactor may take 15 mins to compact a region; in that time we will flush 15 times, causing all other regions to get a new mapfile to compact when their turn comes. If, say, 10 regions were queued for compaction, the last one would have 10 regions * 15 mins each = 150 mapfiles for each column written before the compactor could get to it.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
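For instance, to raise the flush threshold to 128MB you could set the parameter in hbase-site.xml or programmatically. A minimal sketch, assuming the HBaseConfiguration class of this era is on the classpath; the 128MB value is just an example:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushSizeExample {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Default is 64MB; a region calls for a flush once its memcache passes this.
    conf.set("hbase.hregion.memcache.flush.size",
             String.valueOf(128 * 1024 * 1024));
  }
}
{code}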
[jira] Updated: (HADOOP-2356) Set memcache flush size per column
[ https://issues.apache.org/jira/browse/HADOOP-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2356: -- Summary: Set memcache flush size per column (was: Set memcache flush size per table) Set memcache flush size per column -- Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed, patch verified by Billy Pearson (who reported the problem). Committed.

Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
---
Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Environment: hadoop subversion 611087 Reporter: Billy Pearson Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt

The excerpt below is cut from one of my region server logs (full log attached). There is one region on this region server and it is under heavy insert load, so compactions run back to back: as one finishes, a new one starts. The problem starts when it is time to split the region. A compaction starts just milliseconds before the split, blocking the split, but the split closes the region before the compaction is finished, leaving the region offline until the compaction is done. Once the compaction is done the split finishes and all returns to normal, but this is a big problem for production if the region is offline for 10-15 mins. The solution would be to not let the split thread issue the line below while a compaction on that region is happening.

2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)

The only time I have seen this bug is when there is only one region on a region server, because with more than one, the compaction moves on to the other region(s) after the first one is done, and the split can do what it needs on the first region without getting blocked.

{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
... lots of NotServingRegionException's ...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
...
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
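The general shape of the fix is to make split and compaction mutually exclusive per region, so a split cannot close the region while a compaction still holds it. A hedged sketch of that idea (hypothetical class and method names, not the committed patch):

{code}
import java.util.concurrent.locks.ReentrantLock;

class RegionMaintenance {
  private final ReentrantLock workLock = new ReentrantLock();

  void compact() {
    workLock.lock();  // block splits while compacting
    try {
      // ... rewrite store files ...
    } finally {
      workLock.unlock();
    }
  }

  void split() {
    workLock.lock();  // wait for any in-flight compaction to finish first
    try {
      // ... close region, write daughter regions, update META ...
    } finally {
      workLock.unlock();
    }
  }
}
{code}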
[jira] Resolved: (HADOOP-2348) [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless
[ https://issues.apache.org/jira/browse/HADOOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2348. --- Resolution: Won't Fix [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless Key: HADOOP-2348 URL: https://issues.apache.org/jira/browse/HADOOP-2348 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Bryan Duxbury Assignee: Jim Kellerman Priority: Minor In the past, the lock id returned by HTable.startUpdate was a real lock id from a remote server. However, that has been superseded by the BatchUpdate process, so now the lock id is just an arbitrary value. Moreover, it does not actually add any value, because while it implies that you could start two updates on the same HTable and commit them separately, this is in fact not the case: any attempt to do a second startUpdate throws an IllegalStateException. Since there is no added functionality afforded by the presence of this parameter, I suggest that we overload all methods that use it to ignore it and print a deprecation notice. startUpdate can just return a constant like 1 and eventually turn into a boolean or some other useful value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2138) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
[ https://issues.apache.org/jira/browse/HADOOP-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2138: -- Priority: Minor (was: Major) Summary: [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness (was: [hbase] Master should allocate regions to the regionserver hosting the region data where possible) Downgrading priority because we should leverage Hadoop's rack awareness where possible, and there is a lot of work left to do (in Hadoop) before we can. [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness --- Key: HADOOP-2138 URL: https://issues.apache.org/jira/browse/HADOOP-2138 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Currently, regions are assigned to regionservers based on a basic load attribute. A factor to include in the assignment calculation is the location of the region in hdfs, i.e. the servers hosting region replicas. If the cluster is such that regionservers run on the same nodes as hdfs datanodes, then ideally the regionserver for a particular region should run on a server that hosts a region replica. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
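A hypothetical scoring sketch of the idea (none of these names are HMaster's real API): prefer regionservers that host a replica of the region's data, then break ties by current load.

{code}
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LocalityAwareAssigner {
  // Pick the server for a region: replica hosts sort first, then lightest load.
  static String pickServer(List<String> servers,
                           Set<String> hostsWithReplica,
                           Map<String, Integer> regionCounts) {
    return servers.stream()
        .min(Comparator
            // locality first: servers holding a replica sort ahead
            .comparing((String s) -> hostsWithReplica.contains(s) ? 0 : 1)
            // then fewest regions currently assigned
            .thenComparing(regionCounts::get))
        .orElseThrow(IllegalStateException::new);
  }
}
{code}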
[jira] Commented: (HADOOP-2291) [hbase] Add row count estimator
[ https://issues.apache.org/jira/browse/HADOOP-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559202#action_12559202 ] Jim Kellerman commented on HADOOP-2291: --- What is the status of this issue? [hbase] Add row count estimator --- Key: HADOOP-2291 URL: https://issues.apache.org/jira/browse/HADOOP-2291 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Assignee: Edward Yoon Priority: Minor Attachments: 2291_v01.patch, Keying.java Internally we have a little tool that does a rough estimate of how many rows there are in a table. It runs scanners over larger and larger partitions until it turns up N occupied rows. Once it has a number N, it multiplies by the partition size to get an approximate row count. This issue is about generalizing this feature so it could sit in the general hbase install. It would look something like: {code} long getApproximateRowCount(final Text startRow, final Text endRow, final long minimumCountPerPartition, final long maximumPartitionSize) {code} Larger minimumCountPerPartition and maximumPartitionSize values would make the count more accurate but would mean the method ran longer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
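One plausible reading of the estimator, as a hedged sketch (the Partitioner interface and all names are hypothetical, not the attached patch): grow the sampled partition until it yields the minimum row count, then extrapolate that density across the key space.

{code}
class RowCountEstimator {
  interface Partitioner {
    // number of occupied rows found in a partition of 'partitionSize' keys
    long countRows(long offset, long partitionSize);
    // total size of the key space being estimated
    long keySpaceSize();
  }

  static long approximateRowCount(Partitioner p,
                                  long minimumCountPerPartition,
                                  long maximumPartitionSize) {
    long partitionSize = 1024;
    long found = 0;
    // Grow the sampled partition until it holds enough rows to trust.
    while (partitionSize < maximumPartitionSize) {
      found = p.countRows(0, partitionSize);
      if (found >= minimumCountPerPartition) {
        break;
      }
      partitionSize *= 2;
    }
    // Extrapolate the observed density across the whole key space.
    return found * (p.keySpaceSize() / partitionSize);
  }
}
{code}

Larger minimums buy accuracy at the price of scanning more of the table, which is exactly the trade-off the issue describes.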
[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2343: -- Priority: Trivial (was: Minor) [hbase] Stuck regionserver? --- Key: HADOOP-2343 URL: https://issues.apache.org/jira/browse/HADOOP-2343 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Priority: Trivial Looking in logs, a regionserver went down because it could not contact the master after 60 seconds. Watching logging, the HRS is repeatedly checking all 150 loaded regions over and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+ second gap with no logging as though the regionserver had hung up on something:

{code}
2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do
2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965
2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do
2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server
2007-12-03 13:16:04,455 INFO hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases
2007-12-03 13:16:04,455 INFO hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
{code}

Master seems to be running fine, scanning its ~700 regions. Then you see this in the log, before the HRS shuts itself down:

{code}
2007-12-03 13:14:31,416 INFO hbase.Leases - HMaster.leaseChecker lease expired 153260899/153260899
2007-12-03 13:14:31,417 INFO hbase.HMaster - XX.XX.XX.102:60020 lease expired
{code}

... and we go on to process shutdown. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559204#action_12559204 ] Jim Kellerman commented on HADOOP-2343: --- I believe this issue was (eventually) addressed by HADOOP-2338. Leaving open in case the issue re-occurs, but will downgrade priority. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
[ https://issues.apache.org/jira/browse/HADOOP-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2400: -- Priority: Trivial (was: Minor) Issue Type: Improvement (was: Bug) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Trivial mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2136) [hbase] TestTableIndex: variable substitution depth too large: 20
[ https://issues.apache.org/jira/browse/HADOOP-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2136: -- Priority: Trivial (was: Minor) Downgrading priority since it has been some time since this problem was last observed. [hbase] TestTableIndex: variable substitution depth too large: 20 - Key: HADOOP-2136 URL: https://issues.apache.org/jira/browse/HADOOP-2136 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Trivial See the 'stack - 30/Oct/07 09:51 PM' comment over in HADOOP-2083 for a description of the error, or see here: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/testReport/org.apache.hadoop.hbase.mapred/TestTableIndex/testTableIndex/ Seems like it's a rare occurrence. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2527) Improve master load balancing
[ https://issues.apache.org/jira/browse/HADOOP-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2527: -- Priority: Major (was: Minor) Affects Version/s: (was: 0.15.0) 0.16.0 Summary: Improve master load balancing (was: Poor distribution of regions) Improve master load balancing - Key: HADOOP-2527 URL: https://issues.apache.org/jira/browse/HADOOP-2527 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Environment: CentOS 5 Reporter: Chris Kline We get poor distribution of regions when we start up HBase. We have a total of 13 nodes and 898 regions, which should yield an average of 69 regions per node. Instead, one node has 173 regions and one node has 16 regions.

Address             Start Code     Load
10.100.11.62:60020  1199406218912  requests: 0 regions: 63
10.100.11.59:60020  1199406219179  requests: 0 regions: 55
10.100.11.60:60020  1199406219062  requests: 0 regions: 90
10.100.11.61:60020  1199406219132  requests: 1 regions: 54
10.100.11.64:60020  1199406218817  requests: 0 regions: 173
10.100.11.31:60020  1199406219039  requests: 1 regions: 16
10.100.11.58:60020  1199406218895  requests: 0 regions: 89
10.100.11.56:60020  1199406219037  requests: 0 regions: 76
10.100.11.65:60020  1199406219135  requests: 0 regions: 56
10.100.11.57:60020  1199406219183  requests: 1 regions: 56
10.100.11.33:60020  1199406219174  requests: 1 regions: 56
10.100.11.32:60020  1199406218944  requests: 0 regions: 66
10.100.11.63:60020  1199406219182  requests: 0 regions: 48
Total: servers: 13 requests: 4 regions: 898

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559207#action_12559207 ] Jim Kellerman commented on HADOOP-2364: --- Is this still a problem? When did it last occur? when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2526) HRegionServer hangs upon exit due to DFSClient Exception
[ https://issues.apache.org/jira/browse/HADOOP-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559217#action_12559217 ] Jim Kellerman commented on HADOOP-2526: --- Is this still an issue? Has it occurred since reported? HRegionServer hangs upon exit due to DFSClient Exception Key: HADOOP-2526 URL: https://issues.apache.org/jira/browse/HADOOP-2526 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Environment: CentOS 5 Reporter: Chris Kline Priority: Minor Several HRegionServers hang around indefinitely, well after the HMaster has exited. This was triggered by executing $HBASE_HOME/bin/stop-hbase.sh. The HMaster exits fine, but here is what happens on one of the HRegionServers:

2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got regionserver stop message
2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closing leases
2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor: regionserver/0.0.0.0:60020.leaseChecker exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closed leases
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on 60020
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020: exiting
2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60020
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020: exiting
2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping infoServer
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping [EMAIL PROTECTED]
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED java.net.SocketException: Socket closed
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
 at java.net.ServerSocket.implAccept(ServerSocket.java:453)
 at java.net.ServerSocket.accept(ServerSocket.java:421)
 at org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
 at org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect to close listener /127.0.0.1:60030
2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1:
2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:519)
 at java.net.Socket.connect(Socket.java:469)
 at java.net.Socket.init(Socket.java:366)
 at java.net.Socket.init(Socket.java:209)
 at org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
 at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
 at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
 at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
 at org.mortbay.util.Container.stop(Container.java:154)
 at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237
RE: [jira] Created: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5
We do not need to split unless any one column is over the threshold. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Billy Pearson (JIRA) [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 15, 2008 2:46 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Created: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5 hbase not splitting when the total size of region reaches max region size * 1.5 -- Key: HADOOP-2616 URL: https://issues.apache.org/jira/browse/HADOOP-2616 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Billy Pearson Priority: Minor Fix For: 0.17.0 Right now a region may get larger than the max size set in the conf. HRegion.needsSplit checks the largest column to see if it is larger than max region size * 1.5 and then decides whether to split or not. But if we have more than one column, the region could be very large. Example: say we have 10 columns, all about the same size, say 40MB each, and the max file size is 64MB. We would not split even though the region size is 400MB, well over the 96MB needed to trigger a split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
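A hedged sketch contrasting the two positions (hypothetical names and constants, not the actual HRegion code): the current check consults only the largest column family, while Billy's proposal would trigger on the aggregate size.

{code}
class SplitPolicy {
  static final long MAX_REGION_SIZE = 64L * 1024 * 1024;  // 64MB example

  // Behavior as described in the report: only the largest family is
  // consulted, so a region with ten 40MB families (400MB total) never splits.
  static boolean needsSplitLargestOnly(long[] familySizes) {
    long largest = 0;
    for (long s : familySizes) {
      largest = Math.max(largest, s);
    }
    return largest > MAX_REGION_SIZE * 3 / 2;  // the 1.5x trip point
  }

  // The proposed alternative: split once the region's total size crosses
  // the same 1.5x threshold, regardless of how it is spread across families.
  static boolean needsSplitByTotal(long[] familySizes) {
    long total = 0;
    for (long s : familySizes) {
      total += s;
    }
    return total > MAX_REGION_SIZE * 3 / 2;
  }
}
{code}

Jim's reply above defends the first check: since splits happen along row boundaries within each family, a region only needs to split when some single column family is over the threshold.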
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed. Committed. org.onelab.filter.BloomFilter class uses 8X the memory it should be using - Key: HADOOP-2588 URL: https://issues.apache.org/jira/browse/HADOOP-2588 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Environment: n/a Reporter: Ian Clarke Priority: Trivial Fix For: 0.16.0 Attachments: patch.txt The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, in most Java implementations this will use a byte per bit stored, meaning that 8X the necessary memory is required. This is unfortunate, as the whole point of a BloomFilter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, such as the way hashes are generated from a SHA1 digest in the Filter class, and the way the hash() method just assumes the digestBytes array will be long enough. I discovered this while looking for a good Bloom Filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
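A minimal sketch of the fix's idea (not the committed patch): back the filter with a packed bit vector such as java.util.BitSet instead of boolean[], cutting memory roughly 8x, since boolean[] typically costs a byte per flag in most JVMs.

{code}
import java.util.BitSet;

class PackedBloomFilter {
  private final BitSet bits;
  private final int size;

  PackedBloomFilter(int size) {
    this.size = size;
    this.bits = new BitSet(size);  // ~size/8 bytes instead of ~size bytes
  }

  // Set one bit per hash function for the key being added.
  void add(int[] hashes) {
    for (int h : hashes) {
      bits.set(Math.floorMod(h, size));
    }
  }

  // If any bit is clear, the key was definitely never added.
  boolean mightContain(int[] hashes) {
    for (int h : hashes) {
      if (!bits.get(Math.floorMod(h, size))) {
        return false;
      }
    }
    return true;
  }
}
{code}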
[jira] Resolved: (HADOOP-2597) [hbase] Performance - add a block cache
[ https://issues.apache.org/jira/browse/HADOOP-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2597. --- Resolution: Duplicate [hbase] Performance - add a block cache --- Key: HADOOP-2597 URL: https://issues.apache.org/jira/browse/HADOOP-2597 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Tom White A block cache would cache fixed-size blocks (default 64k) of data read from HDFS by the MapFile. It would help read performance for data close to recently read data (see the Bigtable paper, section 6). It would be configurable on a per-column-family basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
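A hedged sketch of a block cache along the lines described (hypothetical structure, not the HBase implementation): fixed-size blocks keyed by (file, block index), evicted in least-recently-used order via LinkedHashMap's access-order mode.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

class BlockCache {
  static final int BLOCK_SIZE = 64 * 1024;  // 64k blocks, as in the issue
  private final int maxBlocks;
  private final Map<String, byte[]> blocks;

  BlockCache(int maxBlocks) {
    this.maxBlocks = maxBlocks;
    // access-order=true makes iteration order LRU; the eldest entry is
    // dropped whenever the cache grows past its block budget.
    this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
        return size() > BlockCache.this.maxBlocks;
      }
    };
  }

  byte[] get(String file, long offset) {
    return blocks.get(file + "@" + (offset / BLOCK_SIZE));
  }

  void put(String file, long offset, byte[] block) {
    blocks.put(file + "@" + (offset / BLOCK_SIZE), block);
  }
}
{code}

Reads near recently read data land in already-cached blocks, which is exactly the sequential-read and hot-row access pattern the issue (and the Bigtable paper) targets.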
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) TestTableIndex is now failing rather consistently. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list
[ https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reopened HADOOP-2443: --- Now that this works and has been committed, can we reduce the 'chattiness' of the debug-level logging? Thanks. [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list Key: HADOOP-2443 URL: https://issues.apache.org/jira/browse/HADOOP-2443 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury Fix For: 0.16.0 Attachments: 2443-v10.patch, 2443-v3.patch, 2443-v4.patch, 2443-v5.patch, 2443-v6.patch, 2443-v7.patch, 2443-v8.patch, 2443-v9.patch Currently, when the client gets a NotServingRegionException (usually because the region is in the middle of being split, or there has been a regionserver crash and the region is being moved elsewhere), the client does a complete refresh of its cache of region locations for the table. Chatting with Jim about a Paul Saab upload issue from Saturday night: when tables are big, comprised of regions that are splitting fast (because of bulk upload), it is unlikely a client will ever be able to obtain a stable list of all region locations. Given that any update or scan requires the list of all regions to be in place before it proceeds, this can get in the way of the client succeeding when the cluster is under load. Chatting, we figure it is better for the client to hold a lazy region cache: on NSRE, figure out where that region alone has gone and update only that entry in the client-side cache, rather than throwing out everything we know of a table every time. Hopefully this will fix the issue PS was experiencing where, during intense upload, he was unable to get/scan/hql the same table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
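A hedged sketch of the lazy-cache behavior (hypothetical client-side structure, not HTable's actual code): on a NotServingRegionException, evict only the stale entry and relocate that one region, instead of discarding every cached location for the table. A real cache would key entries by the region's start key; for brevity this sketch stores the row that triggered the lookup.

{code}
import java.util.SortedMap;
import java.util.TreeMap;

class RegionLocationCache {
  // key -> server address for regions of one table
  private final SortedMap<String, String> locations =
      new TreeMap<String, String>();

  String locate(String row) {
    SortedMap<String, String> head = locations.headMap(row + "\0");
    if (!head.isEmpty()) {
      return head.get(head.lastKey());  // cached entry covering 'row'
    }
    String server = lookupInMeta(row);  // cache miss: one META lookup
    locations.put(row, server);
    return server;
  }

  // Called when a server throws NotServingRegionException for 'row':
  // drop just the stale entry; the next locate() refetches only it,
  // leaving every other cached location for the table intact.
  void invalidate(String row) {
    SortedMap<String, String> head = locations.headMap(row + "\0");
    if (!head.isEmpty()) {
      locations.remove(head.lastKey());
    }
  }

  private String lookupInMeta(String row) {
    return "regionserver:60020";  // stand-in for a .META. scan
  }
}
{code}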
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) Thought of a better way to force cache flushes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Attachment: patch.txt HTable$ClientScanner.nextScanner was sleeping, but in the wrong place Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins --- Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Environment: hadoop subversion 611087 Reporter: Billy Pearson Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt The below is cut out of one of my region servers logs full log attached What is happening is there is one region on a this region server and its is under heave insert load so compaction are back to back one one finishes a new one starts the problem starts when its time to split the region. A compaction starts just millsecs before the split starts blocking the split but the split closes the region before the compaction is finished. Causing the region to be offline until the compaction is done. Once the compaction is done the split finishes and all is returned to normal but this is a big problem for production if the region is offline for 10-15 mins. The solution would be not to let the split thread to issue the below line while a compaction on that region is happening. 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) The only time I have seen this bug is when there is only one region on a region server because if more then one then the compaction happens to the other region(s) after the first one is done compaction and the split can do what it needs on the first region with out getting blocked. {code} 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed. 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) ... lots of NotServingRegionException's ... 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec ... 
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec 2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find... 2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239 {code
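For readers unfamiliar with the pattern behind the one-line comment above, here is a minimal sketch of sleep placement in a retry loop. Everything in it is an assumption for illustration: the method shapes and names do not match the real HTable$ClientScanner.nextScanner.
{code}
// Illustrative only: where a retry loop should sleep. Hypothetical names.
public class RetryPlacement {
  interface Attempt {
    boolean tryOnce() throws Exception;
  }

  // Wrong place: sleeping unconditionally, even before the first attempt
  // and after a success, wastes time on every call.
  static boolean retrySleepFirst(Attempt a, int maxRetries, long pauseMs) throws Exception {
    for (int tries = 0; tries < maxRetries; tries++) {
      Thread.sleep(pauseMs);
      if (a.tryOnce()) {
        return true;
      }
    }
    return false;
  }

  // Right place: sleep only between failed attempts.
  static boolean retrySleepBetween(Attempt a, int maxRetries, long pauseMs) throws Exception {
    for (int tries = 0; tries < maxRetries; tries++) {
      if (a.tryOnce()) {
        return true;
      }
      Thread.sleep(pauseMs);
    }
    return false;
  }
}
{code}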
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck
[ https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2440: -- Status: Open (was: Patch Available) Patch was for one of the sub-issues and has been committed. The main issue has not yet been addressed. [hbase] Provide a HBase checker and repair tool similar to fsck --- Key: HADOOP-2440 URL: https://issues.apache.org/jira/browse/HADOOP-2440 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: patch.txt We need a tool to verify (and repair) HBase much like fsck -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2416) [hbase] IOException: File does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2416: -- Priority: Minor (was: Major) Downgrading to minor since this problem has not been reported since the original report.
[hbase] IOException: File does not exist Key: HADOOP-2416 URL: https://issues.apache.org/jira/browse/HADOOP-2416 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor
Two fellas today on two unrelated clusters had versions of the below:
{code}
2007-12-12 08:28:22,235 ERROR org.apache.hadoop.hbase.HRegionServer: Compaction failed for region spider_pages,10_149317711,1197468834206
java.io.IOException: java.io.IOException: File does not exist
at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:385)
{code}
Oddly, the name of the missing file is not cited. The other instance showed in the webui. Seemed to be a problem with an HStoreFile in the .META. region. I was unable to select content from the .META. table -- it was returning null rows. In both cases a restart fixed things again. Since all state is out in hdfs and the in-memory maps are made from the hdfs state, something must not be getting updated on compaction/split or flush.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558442#action_12558442 ] Jim Kellerman commented on HADOOP-2587: --- The reason for this is that a region split needs to close the parent region for a bit. However, HRegion.close needs to acquire a number of locks before it can proceed. Because splitRegion was calling RegionListener.closing before calling close, the region would be taken offline before close had acquired any locks. If there were compactions, scanners or updates in progress, these would all have to finish before the region could actually close, resulting in long periods where the region was unavailable. The solution is to have HRegion.close call listener.closing only after all the locks have been acquired and the close is really about to proceed. For consistency, HRegion.close should also call listener.closed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
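As a rough illustration of the reordering described in that comment, here is a minimal sketch. It is not the actual HBase source: the RegionListener shape and the single read/write lock are simplified stand-ins for the 0.16-era internals.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model of the fix: announce "closing" only once the locks are
// held, i.e. only when close is really about to proceed.
class RegionCloseSketch {
  interface RegionListener {
    void closing();
    void closed();
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final RegionListener listener;

  RegionCloseSketch(RegionListener listener) {
    this.listener = listener;
  }

  void close() {
    // Before the fix, closing() was signalled up here, taking the region
    // offline while compactions/scanners/updates still held locks.
    lock.writeLock().lock(); // waits for compactions, flushes, scanners, updates
    try {
      listener.closing();    // after the fix: region goes offline only now
      // ... flush caches and close stores ...
      listener.closed();     // and, for consistency, announce completion
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}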
[jira] Assigned: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reassigned HADOOP-2500: - Assignee: Jim Kellerman
[HBase] Unreadable region kills region servers -- Key: HADOOP-2500 URL: https://issues.apache.org/jira/browse/HADOOP-2500 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: CentOS 5 Reporter: Chris Kline Assignee: Jim Kellerman Priority: Critical
Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS, and fsck reported corruption in an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
When HMaster starts up, it finds a region that is no good (Key: 17_125736271):
2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of spider_pages,17_125736271,1198286140018 is no good
HMaster then assigns this region to RegionServer X.60:
2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
The RegionServer has trouble reading that region (from the RegionServer log on X.60); note that the worker thread exits:
2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta (2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,17_125736271,1198286140018 java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception java.lang.NullPointerException
at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting
The HMaster then tries to assign the same region to X.60 again and fails. The HMaster tries to assign the region to X.31 with the same result (X.31 worker thread exits). The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in HDFS. After deleting that file and restarting HBase, HBase appears to be back to normal. One thing I can't figure out is that the HMaster log shows several entries after the worker thread on X.60 has exited, suggesting that the RegionServer is still talking with HMaster:
2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
There is no corresponding entry in the RegionServer's log.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
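The failure mode above suggests a simple defensive check before replaying a reconstruction log. The following is a hedged sketch, not the actual HStore.doReconstructionLog code; it assumes the FileStatus-era FileSystem API and an illustrative method name.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class ReconstructionLogCheck {
  // Skip a reconstruction log that exists but is empty, instead of letting
  // SequenceFile.Reader throw EOFException on the zero-length file.
  static void replayIfNonEmpty(FileSystem fs, Path oldLogFile, Configuration conf)
      throws IOException {
    if (oldLogFile == null || !fs.exists(oldLogFile)) {
      return; // nothing to replay
    }
    if (fs.getFileStatus(oldLogFile).getLen() <= 0) {
      return; // zero-length log: nothing recoverable in it
    }
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, oldLogFile, conf);
    try {
      // ... replay the logged edits ...
    } finally {
      reader.close();
    }
  }
}
{code}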
[jira] Updated: (HADOOP-2468) [hbase] TestRegionServerExit failed in Hadoop-Nightly #338
[ https://issues.apache.org/jira/browse/HADOOP-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2468: -- Resolution: Fixed Status: Resolved (was: Patch Available) Resolving issue. Issue not seen in recent builds.
[hbase] TestRegionServerExit failed in Hadoop-Nightly #338 -- Key: HADOOP-2468 URL: https://issues.apache.org/jira/browse/HADOOP-2468 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Priority: Minor Fix For: 0.16.0 Attachments: patch.txt
TestRegionServerExit failed in Hadoop-Nightly #338. From the logs it appears that the client gave up before the mini hbase cluster could recover from a region server failing. Adjusting the timeout and retry configuration parameters should make this more reliable.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
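For context, tuning of that sort would look roughly like the sketch below. The property names and values are assumptions for illustration; the issue does not say which parameters the fix actually changed.
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientRetryTuning {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Hypothetical values: give the client more retries and a longer pause
    // so it outlasts a mini-cluster's recovery from a region server failure.
    conf.setInt("hbase.client.retries.number", 10);
    conf.setLong("hbase.client.pause", 10 * 1000L); // milliseconds between retries
    System.out.println("retries = " + conf.getInt("hbase.client.retries.number", -1));
  }
}
{code}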
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Attachment: patch.txt This patch addresses HADOOP-2587 (this issue), HADOOP-2500, and a newly found issue with TestTimestamp.testTimestamps(), which was creating two mini dfs clusters.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
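On the TestTimestamp point, the usual shape of that mistake is a test that spins up a second MiniDFSCluster instead of reusing the one from setUp(). The sketch below is written under that assumption (JUnit 3 style, 0.16-era org.apache.hadoop.dfs package); it is not the actual test source.
{code}
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.MiniDFSCluster;

public class TimestampTestSketch extends TestCase {
  private MiniDFSCluster cluster;

  protected void setUp() throws Exception {
    super.setUp();
    // One mini DFS cluster for the whole test.
    cluster = new MiniDFSCluster(new Configuration(), 2, true, null);
  }

  public void testTimestamps() throws Exception {
    // The bug pattern: constructing a second MiniDFSCluster here rather
    // than using the field initialized in setUp().
  }

  protected void tearDown() throws Exception {
    if (cluster != null) {
      cluster.shutdown();
    }
    super.tearDown();
  }
}
{code}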
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck
[ https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2440: -- Fix Version/s: (was: 0.16.0) 0.17.0 Pushing fix out to 0.17 since adding the referential integrity needed to make this tool really work will require another migration tool. [hbase] Provide a HBase checker and repair tool similar to fsck --- Key: HADOOP-2440 URL: https://issues.apache.org/jira/browse/HADOOP-2440 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.17.0 Attachments: patch.txt We need a tool to verify (and repair) HBase much like fsck -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2500. --- Resolution: Fixed Fix Version/s: 0.16.0 Patch submitted for HADOOP-2587 incorporated the fix for this issue. Tests passed. Committed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed. Committed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Issue Type: Improvement (was: Bug)
org.onelab.filter.BloomFilter class uses 8X the memory it should be using - Key: HADOOP-2588 URL: https://issues.apache.org/jira/browse/HADOOP-2588 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Environment: n/a Reporter: Ian Clarke Priority: Trivial
The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, in most Java implementations a boolean[] takes a whole byte per element, so the filter uses 8X the memory it actually needs. This is unfortunate, as the whole point of a Bloom filter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, such as the way hashes are generated from a SHA1 digest in the Filter class, and the way the hash() method just assumes the digestBytes array will be long enough. I discovered this while looking for a good Bloom filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
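The space fix is the one the later patch comment names: back the filter with java.util.BitSet, which packs one bit per slot, instead of boolean[], which typically takes a byte per slot. Below is a minimal self-contained sketch; the hashing is a simple stand-in, not the SHA1-based scheme in org.onelab.filter.
{code}
import java.util.BitSet;

public class CompactBloomFilter {
  private final BitSet bits;   // ~size/8 bytes, versus ~size bytes for boolean[]
  private final int size;
  private final int hashCount;

  public CompactBloomFilter(int size, int hashCount) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  public void add(byte[] key) {
    for (int i = 0; i < hashCount; i++) {
      bits.set(indexFor(key, i));
    }
  }

  // No false negatives; false positives at a rate set by size and hashCount.
  public boolean mightContain(byte[] key) {
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(indexFor(key, i))) {
        return false;
      }
    }
    return true;
  }

  // Derive the i-th hash from a per-round seed; a stand-in for a real scheme.
  private int indexFor(byte[] key, int i) {
    int h = (i + 1) * 0x9e3779b1;
    for (byte b : key) {
      h = h * 31 + b;
    }
    return ((h % size) + size) % size; // non-negative index
  }
}
{code}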
[jira] Reopened: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reopened HADOOP-2587: --- Times reported for splits are inaccurate. Investigate why other operations are blocked during compaction.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Attachment: patch.txt Replace vector of boolean with BitSet.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Fix Version/s: 0.16.0 Affects Version/s: 0.16.0 Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558486#action_12558486 ] Jim Kellerman commented on HADOOP-2587: ---
Updates prevent:
- cache flushes
- closing a region (and consequently, splits)
- the final stage of compaction
Scanners prevent:
- the final stage of compaction
- closing a region (and consequently, splits)
During the final stage of compaction:
- no new scanners may be created
- updates are prohibited
Cache flushes prevent:
- closing a region (and consequently, splits)
- updates
- rolling the HLog
Rolling the HLog prevents:
- cache flushes
- updates
A region split must close the old region. Consequently, before it can start it must:
- wait for any compactions or cache flushes to complete
- lock the region to prevent new updates
- wait for active scanners to terminate
- wait for updates in progress to finish
Once a split is in progress, the actual process is quick. However, even after the region server reports that the split has completed, clients must wait until the master assigns the new regions and the region server(s) report to the master that the new regions are being served.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
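Read as lock acquisitions, the interactions in that comment compress into something like the following model. It is a deliberately simplified sketch with a single read/write lock; the real HRegion of that era used several finer-grained locks.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockModel {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Updates (and, analogously, scanners and cache flushes) share the region.
  public void update() {
    lock.readLock().lock();
    try {
      // ... apply a batch update ...
    } finally {
      lock.readLock().unlock();
    }
  }

  // The final stage of compaction needs a brief exclusive window, which is
  // why no new scanners may start and updates are prohibited during it.
  public void finishCompaction() {
    lock.writeLock().lock();
    try {
      // ... swap in the compacted store file ...
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Closing for a split is exclusive too: it drains compactions, flushes,
  // scanners and in-flight updates before proceeding, then runs quickly.
  public void closeForSplit() {
    lock.writeLock().lock();
    try {
      // ... take the region offline and split it ...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}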
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Attachment: patch.txt

It turns out that scanners and updates were being locked out for the duration of a compaction because of the order in which locks were taken out. This has been modified. Other methods that used these locks have also had their ordering changed to prevent deadlocks.

Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
---
Key: HADOOP-2587
URL: https://issues.apache.org/jira/browse/HADOOP-2587
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Affects Versions: 0.16.0
Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
Fix For: 0.16.0
Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt

The excerpt below is cut out of one of my region server logs; the full log is attached. What is happening is that there is one region on this region server and it is under heavy insert load, so compactions run back to back: as soon as one finishes, a new one starts. The problem starts when it is time to split the region. A compaction starts just milliseconds before the split, blocking the split, but the split closes the region before the compaction is finished, leaving the region offline until the compaction is done. Once the compaction is done the split finishes and everything returns to normal, but this is a big problem for production if the region is offline for 10-15 mins. The solution would be to not let the split thread issue the line below while a compaction on that region is in progress.

2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)

The only time I have seen this bug is when there is only one region on a region server, because if there is more than one, the compaction moves on to the other region(s) after the first one finishes compacting, and the split can do what it needs on the first region without getting blocked.

{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
... lots of NotServingRegionException's ...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
...
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
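The fix Jim describes is the classic remedy for lock-ordering deadlocks: every code path must acquire the shared locks in one global order, and long-running work should hold shared rather than exclusive locks wherever possible. The sketch below illustrates the principle only; it is not the actual HRegion code, and the class and lock names are invented.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy region showing one global lock-acquisition order. */
class ToyRegion {
    // Lock A guards open/close; lock B guards updates and scans.
    private final ReentrantReadWriteLock closeLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock updateLock = new ReentrantReadWriteLock();

    /** Every path takes closeLock before updateLock, never the reverse. */
    void update() {
        closeLock.readLock().lock();      // 1st: region must stay open
        try {
            updateLock.readLock().lock(); // 2nd: concurrent updates allowed
            try {
                // ... apply the edit ...
            } finally {
                updateLock.readLock().unlock();
            }
        } finally {
            closeLock.readLock().unlock();
        }
    }

    /** Compaction takes the same locks in the same order, and holds the
     *  exclusive lock only for the short file swap at the end, so scanners
     *  and updates are no longer shut out for the whole compaction. */
    void compact() {
        closeLock.readLock().lock();
        try {
            // ... long-running merge of store files, no exclusive lock held ...
            updateLock.writeLock().lock();
            try {
                // ... swap compacted files into place ...
            } finally {
                updateLock.writeLock().unlock();
            }
        } finally {
            closeLock.readLock().unlock();
        }
    }
}
{code}

With a single acquisition order no cycle can form in the waits-for graph, which is what "ordering changed to prevent deadlocks" comes down to.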
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Affects Version/s: 0.16.0
Status: Patch Available (was: Reopened)
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Status: Open (was: Patch Available)

Won't apply anymore.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Attachment: patch.txt

New version applies and resolves conflicts.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Status: Patch Available (was: Open)
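Billy's proposed fix in the issue description above, making the split thread wait for any in-flight compaction on the region to finish before it closes the region, could be sketched as follows. This is a hypothetical illustration, not the committed patch; the class, field, and method names are all invented.

{code}
/** Toy per-region state: a split waits for any in-flight compaction
 *  to finish before it closes the region. */
class ToyRegionState {
    private boolean compacting = false;

    synchronized void startCompaction() throws InterruptedException {
        while (compacting) {
            wait();              // one compaction at a time per region
        }
        compacting = true;
    }

    synchronized void finishCompaction() {
        compacting = false;
        notifyAll();             // wake a waiting split or compaction
    }

    /** Called by the split thread before it issues
     *  "closing (Adding to retiringRegions)". */
    synchronized void waitForCompactionToComplete() throws InterruptedException {
        while (compacting) {
            wait();              // block the split, not the region
        }
        // Safe to close and split now; the region kept serving reads and
        // writes the whole time the compaction was running.
    }
}
{code}

The point of the design is that the waiting happens in the split thread, so the region stays online instead of sitting closed for the 10-15 minutes the compaction takes.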
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Open (was: Patch Available)

[hbase] restructure how HBase lays out files in the file system
---
Key: HADOOP-2478
URL: https://issues.apache.org/jira/browse/HADOOP-2478
Project: Hadoop
Issue Type: Improvement
Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Fix For: 0.16.0
Attachments: patch.txt, patch.txt

Currently HBase has a pretty flat directory structure. For example:

{code}
/hbase/hregion_70236052/info
/hbase/hregion_70236052/info/info/4328260619704027575
/hbase/hregion_70236052/info/mapfiles/4328260619704027575
/hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
/hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
{code}

All the region directories are under the root directory, and with encoded region names, it is impossible to determine what table a region belongs to. This should be restructured to:

{code}
/root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
{code}

It will be necessary to provide a migration script from current trunk to the new structure.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
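For illustration, a helper that builds paths in the proposed layout might look like the sketch below. The class and method names are invented for this example and are not the API added by the patch.

{code}
import org.apache.hadoop.fs.Path;

/** Hypothetical helper for the proposed on-disk layout. */
class RegionPaths {
    /** root-directory/table-name/encoded-region-name/column-family */
    static Path familyDir(Path rootDir, String tableName,
                          String encodedRegionName, String family) {
        return new Path(new Path(new Path(rootDir, tableName),
                                 encodedRegionName), family);
    }

    static Path infoDir(Path familyDir)     { return new Path(familyDir, "info"); }
    static Path mapFilesDir(Path familyDir) { return new Path(familyDir, "mapfiles"); }
}
{code}

The win is that the table name becomes a path component, so given any region directory you can read the table it belongs to straight off the path.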
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt

New patch starts a mini DFS cluster for the two tests that failed.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588:
--
Component/s: (was: util) contrib/hbase
Priority: Trivial (was: Minor)

org.onelab.filter.BloomFilter class uses 8X the memory it should be using
---
Key: HADOOP-2588
URL: https://issues.apache.org/jira/browse/HADOOP-2588
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Environment: n/a
Reporter: Ian Clarke
Priority: Trivial

The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, most Java implementations store a full byte per boolean, so the filter uses 8X the memory it actually needs. This is unfortunate, as the whole point of a Bloom filter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, for example the way hashes are generated from a SHA1 digest in the Filter class, which simply assumes the digestBytes array will be long enough in the hash() method. I discovered this while looking for a good Bloom filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558272#action_12558272 ] Jim Kellerman commented on HADOOP-2588:
---
You must be looking at an older version than what is in trunk. The current implementation uses a Jenkins hash rather than SHA-1. You are correct that there is no guarantee how JVMs implement an array of boolean. Perhaps using a java.util.BitSet would be better.
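The suggestion amounts to replacing the boolean[] bit vector with java.util.BitSet, which packs 64 bits into each long. A minimal sketch of the idea follows; hash generation is elided and this is not the org.onelab.filter code.

{code}
import java.util.BitSet;

/** Sketch of a BitSet-backed Bloom filter bit vector. */
class BitSetBloomFilter {
    private final BitSet bits;
    private final int vectorSize;

    BitSetBloomFilter(int vectorSize) {
        this.vectorSize = vectorSize;
        this.bits = new BitSet(vectorSize);   // ~vectorSize/8 bytes, not vectorSize
    }

    /** Caller supplies the k hash values for the key (hash function elided). */
    void add(int[] hashes) {
        for (int h : hashes) {
            bits.set((h & 0x7fffffff) % vectorSize);  // mask sign bit into range
        }
    }

    boolean mightContain(int[] hashes) {
        for (int h : hashes) {
            if (!bits.get((h & 0x7fffffff) % vectorSize)) {
                return false;   // definitely never added
            }
        }
        return true;            // possibly added (false positives possible)
    }
}
{code}

For a vector of n slots this needs roughly n/8 bytes instead of n, recovering the 8X overhead the report complains about.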
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)

Tests passed. Committed. Resolving issue.
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2311:
--
Priority: Trivial (was: Critical)

Dropping priority since this bug has not recurred.

[hbase] Could not complete hdfs write out to flush file forcing regionserver restart
---
Key: HADOOP-2311
URL: https://issues.apache.org/jira/browse/HADOOP-2311
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: stack
Priority: Trivial
Attachments: delete-logging.patch

I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300 million rows into hbase. The regionservers went down to force a rerun of the hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead-up to the failed flush:

...
2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log)
2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708
2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log)
2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708
2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available
2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log)
2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708
2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log)
2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708
2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704
2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703
2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738
2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563
2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
2007-11
[jira] Commented: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558025#action_12558025 ] Jim Kellerman commented on HADOOP-2311:
---
Have we seen any more occurrences of this problem? If not, should we close this issue as not reproducible and open a new one if it should happen again?
[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions
[ https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558037#action_12558037 ] Jim Kellerman commented on HADOOP-2394:
---
stack wrote: I ain't too invested in our supporting reverse migrations, but it's worth noting that any migration system worth its salt (systems I've worked on in the past, and Ruby on Rails) goes both ways, if only to facilitate testing of the forward migration; inevitably there's a bug when you try to migrate real data.

That's what backups are for :) More importantly, though, HADOOP-2478 incorporates a migration tool. The specifics of what the tool does will have to be rewritten for each upgrade, but I think the framework is good.

Add support for migrating between hbase versions
---
Key: HADOOP-2394
URL: https://issues.apache.org/jira/browse/HADOOP-2394
Project: Hadoop
Issue Type: Improvement
Components: contrib/hbase
Reporter: Johan Oskarsson

If HBase is to be used to serve data to live systems, we would need a way to upgrade both the underlying Hadoop installation and HBase to newer versions with minimal downtime.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
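A forward-only, version-checked migration framework of the kind Jim describes might be organized roughly as below. Everything here, the interface, the class, and the version numbers, is invented for illustration; it is not the HADOOP-2478 tool.

{code}
import java.util.List;

/** One forward migration step. */
interface Migration {
    int fromVersion();               // file-system version this step upgrades from
    void migrate() throws Exception; // rewrite the on-disk layout in place
}

class Migrator {
    static final int CURRENT_VERSION = 2;  // invented value

    /** Reads the stored version and applies each applicable step in order. */
    static void run(int storedVersion, List<Migration> steps) throws Exception {
        if (storedVersion == CURRENT_VERSION) {
            return;  // already up to date
        }
        for (Migration step : steps) {
            // apply every step from the stored version forward, in order
            if (step.fromVersion() >= storedVersion) {
                step.migrate();
            }
        }
        // ... stamp CURRENT_VERSION into the version file only after success,
        // so a failed run can be restarted (or restored from backup) ...
    }
}
{code}

Only the step bodies change between releases; the version check, ordering, and final version stamp stay fixed, which is the "framework" part of the comment.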
[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558032#action_12558032 ] Jim Kellerman commented on HADOOP-2500:
---
Bryan Duxbury wrote: At the very least, we should not assign a region to a region server if it is detected as no good.

That is an unfortunate wording of a log message in the Master. It is saying that the current assignment of the region is no good because the information it read from the meta region had a server or start code that did not match a known server. It does not mean that the master thinks the region itself is no good.

Also, if a RegionServer tries to access a region and it has difficulties, it should report to the master that it can't read the region, and the master should stop trying to serve it. From a more general standpoint, maybe when a bad region is detected, its files should be moved to a different location and generally excluded from the cluster. This would allow you to recover from problems better.

Yes, we absolutely need to do something, just not sure exactly what yet. One thing is for certain: zero-length files should be ignored/deleted.

[HBase] Unreadable region kills region servers
--
Key: HADOOP-2500
URL: https://issues.apache.org/jira/browse/HADOOP-2500
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Environment: CentOS 5
Reporter: Chris Kline
Priority: Critical

Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS, and fsck reported corruption with an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
When HMaster starts up, it finds a region that is no good (Key: 17_125736271):

2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of spider_pages,17_125736271,1198286140018 is no good

HMaster then assigns this region to RegionServer X.60:

2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020

The RegionServer has trouble reading that region (from the RegionServer log on X.60); note that the worker thread exits:

2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta (2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,17_125736271,1198286140018
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception
java.lang.NullPointerException
at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting

The HMaster then tries to assign the same region to X.60 again and fails. The HMaster tries to assign the region to X.31 with the same result (X.31 worker thread exits). The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero
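The zero-length-file guard Jim suggests would go where the reconstruction log is opened, before the path is handed to a SequenceFile.Reader, which throws EOFException on an empty file as in the trace above. A hedged sketch follows; the method name is invented and the FileSystem calls are approximate, since exact signatures varied across Hadoop versions.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReconstructionLogCheck {
    /** Returns true only if the log exists and has data worth replaying. */
    static boolean usableReconstructionLog(FileSystem fs, Path log)
            throws IOException {
        if (log == null || !fs.exists(log)) {
            return false;              // nothing to replay
        }
        if (fs.getFileStatus(log).getLen() <= 0) {
            fs.delete(log, false);     // empty log: discard it instead of
            return false;              // crashing the worker thread
        }
        return true;
    }
}
{code}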
[jira] Assigned: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reassigned HADOOP-2587:
-
Assignee: Jim Kellerman
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Fix Version/s: (was: 0.17.0) 0.16.0
Affects Version/s: (was: 0.17.0) 0.16.0
Status: Open (was: Patch Available)

Cancelling patch to make a new one that will apply to trunk.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt

Although this won't go in until 0.17, let's get Hudson used to running it. He doesn't like most patches.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Fix Version/s: (was: 0.16.0) 0.17.0
Affects Version/s: (was: 0.16.0) 0.17.0
Status: Patch Available (was: In Progress)

See what Hudson thinks.
RE: Hadoop Patch Builds
That rocks dude! Awesome fix! --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Nigel Daley [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 09, 2008 5:08 PM To: hadoop-dev@lucene.apache.org Subject: Hadoop Patch Builds Until now, the order in which our Hadoop-Patch build would test patches has been essentially random. Also, there has been no way to see the list of pending patches. Drum roll... These 2 pain points are now fixed. I have created a new Hudson job, Hadoop-Patch-Admin, that does 2 things: a) triggers the Hadoop-Patch build when there are waiting patches; the order in which patches are submitted for testing is now FIFO :-) b) exposes the current patch queue To see the queue, go to http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/ and click on the link QUEUE OF PENDING PATCHES (you may want to bookmark the linked page since it won't change). The Hadoop-Patch-Admin build attempts to run every minute and updates the queue information that you see. The build, however, will get stuck behind any other builds (Hadoop, Lucene, etc.) that are currently running, so the queue information may not always be completely up-to-date. Hope this helps! Nige PS updated wiki documentation to follow
RE: [jira] Commented: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
Stack tells me that code freeze for 0.16 is either late this week or early next. So no refactoring yet. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 09, 2008 5:38 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Commented: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system [ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557514#action_12557514 ] Bryan Duxbury commented on HADOOP-2478: --- If this won't be fixed until 0.17, which is months away, should we apply my HStore refactor patch in the meantime? [hbase] restructure how HBase lays out files in the file system --- Key: HADOOP-2478 URL: https://issues.apache.org/jira/browse/HADOOP-2478 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.17.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.17.0 Attachments: patch.txt Currently HBase has a pretty flat directory structure. For example: {code} /hbase/hregion_70236052/info /hbase/hregion_70236052/info/info/4328260619704027575 /hbase/hregion_70236052/info/mapfiles/4328260619704027575 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index {code} All the region directories are under the root directory, and with encoded region names, it is impossible to determine what table a region belongs to. This should be restructured to: {code} /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles} {code} It will be necessary to provide a migration script from current trunk to the new structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [hbase] table HRegionServer affinity
-1 on this idea as suggested. Even Google does not distribute DFS or Bigtable across data centers (see the Bigtable paper at http://labs.google.com/papers/bigtable.html ). What the paper does not mention is that they can replicate a table to multiple data centers for business continuity and backup. This is on the road map for HBase but is still quite a way down the road. In addition, we do want to add 'rack awareness' within a data center for fault tolerance and load balancing. This is also not going to happen in the immediate future. We are currently focusing on making what we have more fault tolerant and are starting to work on performance issues. Answers to your two questions inline below. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Andrew Purtell [mailto:[EMAIL PROTECTED] Sent: Monday, January 07, 2008 8:49 PM To: hadoop-dev Subject: [hbase] table HRegionServer affinity Hello, Consider the case of a global federation of Hadoop clusters, with a single global HBase master, divided into a number of geographic regions each with a local DFS, local workload, and region server backed by that DFS. This setup allows for a global HBase space, where any region may retrieve rows stored by any other region -- which is quite useful -- but, in addition to this, it would also be useful to be able to specify constraints on data mobility and also to be able to scope queries to a particular region. To be a bit more specific, I have three things in mind: 1) The ability to fix a given key range to a region. [I assume here you mean a geographic region and not an HBase table region. --Jim] This would both assign a range to a given region, and also disable splitting over that range. Aside from API changes, ideally there would be an HBase shell command to support this. 2) Syntactic support in HBase shell for table affinity to a given region server: CREATE TABLE ... REGION=10.10.10.10 (or similar) This would fix an entire table to a region. 3) Query support for scoping the result set based on region server: SELECT ... WHERE @REGION=10.10.10.10 AND ... (or similar) Given that IP addresses and hostnames are inflexible names for regions, perhaps a mechanism for assigning logical labels to a region server (or even a group of region servers, where the prohibition on splitting may be relaxed to allow splitting over the group) would also be useful. As I am still coming up to speed on Hadoop and HBase and the code base, I kindly ask for the answers to two questions. First: How invasive to the HBase master/region model is the concept of specifying constraints on data mobility? [It would be very disruptive. The current model is that you run one or more HBase clusters per HDFS cluster. An HBase cluster does not span HDFS clusters. As far as I know HDFS clusters do not span data centers. Latency and network partitioning would be big problems for a system that requires sub-second response times. --Jim] Second: How difficult would the modifications be to accomplish? [A change such as this would require major changes to the architecture and our vision of the model going forward (replication between data centers, and a single table residing in multiple data centers being served by separate HBase instances running on separate HDFS clusters). --Jim] I believe these questions to be related. :-) Thanks, Andrew Purtell Advanced Threats Research Trend Micro, Inc., Pasadena, CA, USA (personal mail)
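Purely to illustrate the kind of constraint being proposed, here is a hedged sketch of a key-range pin as a data structure. Nothing like this exists in HBase, and every name below is hypothetical:
{code}
// Hypothetical sketch of the proposed key-range pinning constraint.
// No such API exists in HBase; all names are invented for illustration.
public final class RangePin {
  final String startKey;    // inclusive lower bound of the pinned range
  final String endKey;      // exclusive upper bound of the pinned range
  final String regionLabel; // logical label (e.g. "us-west") instead of an IP

  RangePin(String startKey, String endKey, String regionLabel) {
    this.startKey = startKey;
    this.endKey = endKey;
    this.regionLabel = regionLabel;
  }

  /** True if the row key falls inside the pinned range. */
  boolean covers(String rowKey) {
    return rowKey.compareTo(startKey) >= 0 && rowKey.compareTo(endKey) < 0;
  }

  /** Pinning a range also implies vetoing any split point inside it. */
  boolean blocksSplitAt(String splitKey) {
    return covers(splitKey);
  }
}
{code}
Using a logical label rather than an IP, as Andrew suggests near the end of his message, would keep such a pin stable when a region server moves hosts.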
RE: [jira] Commented: (HADOOP-2405) [hbase] Merge region tool exposed in shell and/or in UI
Google does dynamic splitting and merging of regions to deal with hot spots. They had to be careful that they did not oscillate between splitting and merging when the load pattern changed. Right now, manual merges are ok because we only do splits when regions grow, and the only reason to merge is if many rows are deleted. When we get to doing more sophisticated load balancing, we will want the capability of both splitting and merging based on load. -Original Message- From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED] Sent: Monday, January 07, 2008 1:10 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Commented: (HADOOP-2405) [hbase] Merge region tool exposed in shell and/or in UI [ https://issues.apache.org/jira/browse/HADOOP-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556694#action_12556694 ] Bryan Duxbury commented on HADOOP-2405: --- So, you envision the merge operation to not only require manual triggering but to require manual targeting? Shouldn't the point of merging regions be to maintain an equilibrium of region sizes? Under what circumstances will you have to manually intervene to keep regions appropriately sized? It seems like this should really only happen after a substantial number of deletions have occurred, right? That would cause a compacted region to shrink below a healthy size, and if it could be merged with a neighbor, it would be nice. This logic should be built in and automatic; otherwise it would require constant monitoring of region sizes by an administrator. Other than this sort of automatic merging, when would you want to manually merge two regions? Doesn't that expose a somewhat dangerous amount of functionality to the end user? [hbase] Merge region tool exposed in shell and/or in UI --- Key: HADOOP-2405 URL: https://issues.apache.org/jira/browse/HADOOP-2405 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Priority: Minor hbase has support for merging regions. Expose a merge trigger in the shell or in the UI. (Can only merge adjacent regions, so perhaps this only makes sense in the regionserver UI.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
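To make the oscillation hazard Jim describes concrete, here is an illustrative-only sketch; the thresholds and names are invented, not HBase code. The merge criterion has to leave a dead band below the split threshold, or two regions merged at the boundary would immediately split again:
{code}
// Illustrative-only split/merge hysteresis; thresholds are invented.
public final class RegionSizer {
  static final long SPLIT_BYTES = 256L << 20; // split a region above 256 MB
  static final long MERGE_BYTES = 64L << 20;  // consider merging below 64 MB

  static boolean shouldSplit(long regionBytes) {
    return regionBytes > SPLIT_BYTES;
  }

  static boolean shouldMerge(long regionBytes, long neighborBytes) {
    // Both regions must be small AND the merged result must stay well
    // below the split point; without this dead band the pair would
    // oscillate between merging and splitting as load shifts.
    return regionBytes < MERGE_BYTES
        && neighborBytes < MERGE_BYTES
        && regionBytes + neighborBytes < SPLIT_BYTES / 2;
  }
}
{code}
A deletion-heavy table would trip shouldMerge only after compaction actually shrinks a region, which matches the one case where a manual merge is useful today.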