RE: [hbase] Suggestions on hbase APIs.

2008-01-21 Thread Jim Kellerman
 -Original Message-
 From: Mafish Liu [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 21, 2008 12:23 AM
 To: hadoop-dev@lucene.apache.org
 Subject: [hbase] Suggestions on hbase APIs.

 Hi:
 I've recently been using hbase (included in the hadoop 0.15.2
 release) to manage spatial data,
 and I found two flaws which I think can be improved.

 First, if you fetch the column names in a hbase table using 
  Set<Text> columns = tableDes.families().keySet(); 
 you get a set of column names that end with a colon,
 which I think should be gotten rid of.

The name that ends with a colon is the name of the column family,
and you can create multiple family members in an ad hoc fashion.

For example, say you have a column family named 'meta:' in which you
store data about web pages. You can create multiple family members
in the same row, such as 'meta:mime-type', 'meta:crawl-date',
'meta:encoding', etc.

Example:

HTable table = new HTable(conf, tableName);
long id = table.startUpdate(row);
// enter data in column family meta:
table.put(id, new Text("meta:mime-type"), data);
table.put(id, new Text("meta:crawl-date"), data);
table.put(id, new Text("meta:encoding"), data);
// enter data in column family contents:
table.put(id, new Text("contents:"), data);
table.commit(id);

 Second, if you read all the contents of a hbase table via the
 HScannerInterface.next method, you will get a TreeMap<Text,
 byte[]> every time you call it. Returning column names every
 time is a waste of memory and network bandwidth.
 There should be a more efficient way to do such work.

Well, you can retrieve multiple columns with a scanner,
so if the column name were not passed back, how would
you determine which column goes with which data? Scanning
the table in the example above:

HScannerInterface scanner = table.obtainScanner(
  new Text[] {new Text("contents:"), new Text("meta:")},
  new Text()); // empty start row = start at beginning

Now when you call scanner.next, you need the map to
find the value for contents: and the (multiple)
values for meta:.
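
The need for the per-cell column name can be seen with a small self-contained
model: a plain Java TreeMap<String, byte[]> standing in for hbase's
TreeMap<Text, byte[]> result map (the column names are the ones from the
example above; the cell values are made up for illustration):

```java
import java.util.TreeMap;

public class ScanRow {
    // Model one row returned by scanner.next(): column name -> cell value.
    static TreeMap<String, byte[]> row() {
        TreeMap<String, byte[]> r = new TreeMap<String, byte[]>();
        r.put("contents:", "<html>...</html>".getBytes());
        r.put("meta:crawl-date", "2008-01-21".getBytes());
        r.put("meta:mime-type", "text/html".getBytes());
        return r;
    }

    // Without the column name keyed to each value, the caller could not
    // tell the contents: cell apart from the (multiple) meta: members.
    static int countFamily(TreeMap<String, byte[]> r, String family) {
        int n = 0;
        for (String col : r.keySet()) {
            if (col.startsWith(family)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> r = row();
        System.out.println("meta: members = " + countFamily(r, "meta:"));
    }
}
```

Dropping the keys would leave three anonymous byte arrays with no way to route
them back to their columns.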

 The above two APIs are used in my program and also in the Hbase
 shell program.
 I don't know if there are alternative APIs that already
 provide these improvements.

 Best regards.
 Mafish
 --
 [EMAIL PROTECTED]
 Institute of Computing Technology, Chinese Academy of
 Sciences, Beijing.




[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-21 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2668:
--

Status: Open  (was: Patch Available)

It appears that hudson lost this patch when it went down. Resubmitting.

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch, patch.txt


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-21 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2668:
--

Status: Patch Available  (was: Open)

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch, patch.txt


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-20 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2668:
-

Assignee: Jim Kellerman  (was: stack)

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-20 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560893#action_12560893
 ] 

Jim Kellerman commented on HADOOP-2668:
---

Ok, there is definitely some work to do here. I'll work on fixing Migrate.

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-20 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2668:
--

Affects Version/s: 0.16.0
   Status: Patch Available  (was: Open)

Works locally, try hudson. - Stack, please review patch.

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch, patch.txt


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise

2008-01-20 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2668:
--

Attachment: patch.txt

Lots more checking, clean up several bugs, new read-only mode, usage, etc.

 [hbase] Documentation and improved logging so fact that hbase now requires 
 migration comes as less of a surprise
 

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: stack
Assignee: Jim Kellerman
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch, migration.patch, patch.txt


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.

2008-01-19 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2643:
--

   Resolution: Fixed
Fix Version/s: 0.16.0
   Status: Resolved  (was: Patch Available)

Committed. Ignoring one unrelated core test failure.

 [hbase] Make migration tool smarter.
 

 Key: HADOOP-2643
 URL: https://issues.apache.org/jira/browse/HADOOP-2643
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt


 The migration tool that handles the changes to how hbase lays out files in 
 the file system needs to be smarter.
 - don't try to migrate old region directories in which the region name is a 
 part of the directory name.
 - add a version number to the file system

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version

2008-01-19 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560748#action_12560748
 ] 

Jim Kellerman commented on HADOOP-2668:
---

If you run the migrate tool as the exception suggested, it will write the 
version file and then the system will start.

 [hbase] After 2643, cluster won't start if FS was created by an older hbase 
 version
 ---

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version

2008-01-19 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753
 ] 

Jim Kellerman commented on HADOOP-2668:
---

 It didn't occur to me that migration was the way to fix the missing version 
 file.

From HMaster.java(894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

 I also figured we should just auto-migrate this one case of a missing version 
 file (If in future, 
 version file goes missing, I'd think it the job of hbsfck recreating it, 
 rather than migration?).

Suppose you have a file system that has not been migrated (i.e. regions are
stored in /hbase/hregion_nnn)? The master would start up, write the version
file, and then proceed to recreate the root and meta regions because they
aren't under /hbase/-ROOT- and /hbase/.META. respectively.

Additionally, the first thing the migrate tool does is look for the version
file. If it finds it and the version number matches, it figures that the file
system has already been upgraded and does nothing.
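
That decision logic can be sketched as follows (a self-contained illustration
of the behavior described above, not the actual hbase.util.Migrate code; the
expected version value and method names are assumptions):

```java
public class MigrateCheck {
    // Hypothetical expected file-system version; the real value lives in hbase.
    static final String EXPECTED_VERSION = "1";

    // Decide whether migration should run: no version file means a
    // pre-r613469 layout; a matching version means already upgraded.
    static boolean needsMigration(String versionFileContents) {
        if (versionFileContents == null) {
            return true;  // no version file: old layout, must migrate
        }
        return !EXPECTED_VERSION.equals(versionFileContents.trim());
    }

    public static void main(String[] args) {
        System.out.println(needsMigration(null));  // old FS: migrate
    }
}
```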

 But I'm fine w/ forcing users to run the migration. It needs to be better 
 documented and added 
 to the bin/hbase script with verb 'migrate' I'd say.

Agreed. How about changing this patch to update bin/hbase and add
documentation (where?)?

 I tried to run the migration but it wants to connect to a HMaster. That ain't 
 going to work (Cluster
  won't start because no version file... can't migrate because cluster ain't 
 up...).

It tries to connect to the master to ensure it isn't running (uses
HBaseAdmin.isMasterRunning()). We wouldn't want to do an upgrade with the
cluster running.


 [hbase] After 2643, cluster won't start if FS was created by an older hbase 
 version
 ---

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version

2008-01-19 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753
 ] 

jimk edited comment on HADOOP-2668 at 1/19/08 5:05 PM:


 It didn't occur to me that migration was the way to fix the missing version 
 file.

From HMaster.java(894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

 I also figured we should just auto-migrate this one case of a missing version 
 file (If in future, 
 version file goes missing, I'd think it the job of hbsfck recreating it, 
 rather than migration?).

Suppose you have a file system that has not been migrated (i.e. regions are
stored in =/hbase/hregion_nnn=)? The master would start up, write the version
file, and then proceed to recreate the root and meta regions because they
aren't under =/hbase/-ROOT-= and =/hbase/.META.= respectively.

Additionally the first thing the migrate tool does is look for the version 
file. If it finds it and
the version number matches, it figures that the file system has been upgraded 
already
and does nothing.

 But I'm fine w/ forcing users to run the migration. It needs to be better 
 documented and added 
 to the bin/hbase script with verb 'migrate' I'd say.

Agreed. How about changing this patch to update bin/hbase and add
documentation (where?)?

 I tried to run the migration but it wants to connect to a HMaster. That ain't 
 going to work (Cluster
  won't start because no version file... can't migrate because cluster ain't 
 up...).

It tries to connect to the master to ensure it isn't running (uses
HBaseAdmin.isMasterRunning()). We wouldn't want to do an upgrade with the
cluster running.


  was (Author: jimk):
 It didn't occur to me that migration was the way to fix the missing 
version file.

From HMaster.java(894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

 I also figured we should just auto-migrate this one case of a missing version 
 file (If in future, 
 version file goes missing, I'd think it the job of hbsfck recreating it, 
 rather than migration?).

Suppose you have a file system that has not been migrated? (i.e. regions are 
stored in
/hbase/hregion_nnn) The master would start up write the version file and 
then 
proceed to recreate the root and meta regions because they aren't under
/hbase/-ROOT- and /hbase/.META. respectively.

Additionally the first thing the migrate tool does is look for the version 
file. If it finds it and
the version number matches, it figures that the file system has been upgraded 
already
and does nothing.

 But I'm fine w/ forcing users to run the migration. It needs to be better 
 documented and added 
 to the bin/hbase script with verb 'migrate' I'd say.

Agreed. How about this changing this patch to update bin/hbase and add 
documentation
(where ?)?

 I tried to run the migration but it wants to connect to a HMaster. That ain't 
 going to work (Cluster
  won't start because no version file... can't migrate because cluster ain't 
 up...).

It tries to connect to the master to ensure it isn't running (uses 
HBaseAdmin.isMasterRunning())
We wouldn't want to do a upgrade with the cluster running.

  
 [hbase] After 2643, cluster won't start if FS was created by an older hbase 
 version
 ---

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version

2008-01-19 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753
 ] 

jimk edited comment on HADOOP-2668 at 1/19/08 5:06 PM:


 It didn't occur to me that migration was the way to fix the missing version 
 file.

From HMaster.java(894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

 I also figured we should just auto-migrate this one case of a missing version 
 file (If in future, 
 version file goes missing, I'd think it the job of hbsfck recreating it, 
 rather than migration?).

Suppose you have a file system that has not been migrated (i.e. regions are
stored in {code}/hbase/hregion_nnn{code})? The master would start up, write
the version file, and then proceed to recreate the root and meta regions
because they aren't under {code}/hbase/-ROOT-{code} and
{code}/hbase/.META.{code} respectively.

Additionally the first thing the migrate tool does is look for the version 
file. If it finds it and
the version number matches, it figures that the file system has been upgraded 
already
and does nothing.

 But I'm fine w/ forcing users to run the migration. It needs to be better 
 documented and added 
 to the bin/hbase script with verb 'migrate' I'd say.

Agreed. How about changing this patch to update bin/hbase and add
documentation (where?)?

 I tried to run the migration but it wants to connect to a HMaster. That ain't 
 going to work (Cluster
  won't start because no version file... can't migrate because cluster ain't 
 up...).

It tries to connect to the master to ensure it isn't running (uses
HBaseAdmin.isMasterRunning()). We wouldn't want to do an upgrade with the
cluster running.


  was (Author: jimk):
 It didn't occur to me that migration was the way to fix the missing 
version file.

From HMaster.java(894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

 I also figured we should just auto-migrate this one case of a missing version 
 file (If in future, 
 version file goes missing, I'd think it the job of hbsfck recreating it, 
 rather than migration?).

Suppose you have a file system that has not been migrated? (i.e. regions are 
stored in
=/hbase/hregion_nnn=) The master would start up write the version file and 
then 
proceed to recreate the root and meta regions because they aren't under
=/hbase/-ROOT-= and =/hbase/.META.= respectively.

Additionally the first thing the migrate tool does is look for the version 
file. If it finds it and
the version number matches, it figures that the file system has been upgraded 
already
and does nothing.

 But I'm fine w/ forcing users to run the migration. It needs to be better 
 documented and added 
 to the bin/hbase script with verb 'migrate' I'd say.

Agreed. How about this changing this patch to update bin/hbase and add 
documentation
(where ?)?

 I tried to run the migration but it wants to connect to a HMaster. That ain't 
 going to work (Cluster
  won't start because no version file... can't migrate because cluster ain't 
 up...).

It tries to connect to the master to ensure it isn't running (uses 
HBaseAdmin.isMasterRunning())
We wouldn't want to do a upgrade with the cluster running.

  
 [hbase] After 2643, cluster won't start if FS was created by an older hbase 
 version
 ---

 Key: HADOOP-2668
 URL: https://issues.apache.org/jira/browse/HADOOP-2668
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.16.0

 Attachments: migrate.patch


 Hbase now checks for a version file.  If none, it reports a version mismatch. 
  There will be no version file if the hbase was made by a version older than 
 r613469

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.

2008-01-18 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2643:
--

Attachment: patch.txt

 [hbase] Make migration tool smarter.
 

 Key: HADOOP-2643
 URL: https://issues.apache.org/jira/browse/HADOOP-2643
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Attachments: patch.txt


 The migration tool that handles the changes to how hbase lays out files in 
 the file system needs to be smarter.
 - don't try to migrate old region directories in which the region name is a 
 part of the directory name.
 - add a version number to the file system

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-18 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2525:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tests passed. Committed

 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events was:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5

2008-01-18 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman resolved HADOOP-2616.
---

   Resolution: Fixed
Fix Version/s: (was: 0.17.0)
   0.16.0

Clarified documentation. Committed with changes for HADOOP-2525

 hbase not splitting when the total size of region reaches max region size * 1.5
 --

 Key: HADOOP-2616
 URL: https://issues.apache.org/jira/browse/HADOOP-2616
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: Billy Pearson
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0


 Right now a region may get larger than the max size set in the conf. 
 HRegion.needsSplit
 checks the largest column to see if it's larger than max region size * 1.5 and 
 then decides to split or not. 
 But if we have more than one column, the region could be very large.
 Example:
 Say we have 10 columns, all about the same size, let's say 40MB, and the max 
 file size is 64MB: we would not split even though the region size is 400MB, 
 well over the 96MB needed to trip a split.
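
The arithmetic in the report can be checked with a small sketch of the
described needsSplit rule (an illustration of the behavior as reported, not
the actual HRegion code; names are assumptions):

```java
public class SplitCheck {
    // Per the report: a split triggers only when the LARGEST single store
    // exceeds maxFileSize * 1.5, regardless of the region's total size.
    static boolean needsSplit(long[] storeSizes, long maxFileSize) {
        long largest = 0;
        for (long s : storeSizes) {
            largest = Math.max(largest, s);
        }
        return largest > maxFileSize * 3 / 2;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] stores = new long[10];            // 10 columns...
        java.util.Arrays.fill(stores, 40 * mb);  // ...of ~40MB each = 400MB total
        // The 96MB threshold (64MB * 1.5) is never tripped by any single
        // 40MB store, so the 400MB region is not split.
        System.out.println(needsSplit(stores, 64 * mb)); // false
    }
}
```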

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2636) [hbase] Make flusher less dumb

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2636:
-

Assignee: Jim Kellerman

 [hbase] Make flusher less dumb
 --

 Key: HADOOP-2636
 URL: https://issues.apache.org/jira/browse/HADOOP-2636
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Assignee: Jim Kellerman
Priority: Minor

 When flusher runs -- it's triggered when the sum of all Stores in a Region > a 
 configurable max size -- we flush all Stores though a Store memcache might 
 have but a few bytes.
 I would think Stores should only dump their memcache to disk if they have some 
 substance.
 The problem becomes more acute, the more families you have in a Region.
 Possible behaviors would be to dump the biggest Store only, or only those 
 Stores > 50% of max memcache size.  Behavior would vary dependent on the 
 prompt that provoked the flush.  Would also log why the flush is running: 
 optional or > max size.
 This issue comes out of HADOOP-2621.
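
One of the proposed behaviors -- flush only stores holding more than 50% of
the max memcache size -- could be sketched like this (a self-contained
illustration; names and thresholds are assumptions, not actual hbase code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlushPolicy {
    // When the region's total memcache exceeds the limit, pick only the
    // stores over half the per-region max, instead of dumping every store.
    static List<String> storesToFlush(Map<String, Long> memcacheSizes,
                                      long maxSize) {
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Long> e : memcacheSizes.entrySet()) {
            if (e.getValue() > maxSize / 2) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new TreeMap<String, Long>();
        sizes.put("contents:", 60L);
        sizes.put("meta:", 3L);  // only a few bytes: skip it
        System.out.println(storesToFlush(sizes, 64L)); // [contents:]
    }
}
```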

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2636) [hbase] Make flusher less dumb

2008-01-17 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559938#action_12559938
 ] 

Jim Kellerman commented on HADOOP-2636:
---

Better yet, move triggering of cache flush to the store level instead of the 
region level. Same for compactions.

Split still has to happen at the region level because it is the region that 
embodies the concept of row range. However the split could be triggered by a 
single store reaching the split threshold.

 [hbase] Make flusher less dumb
 --

 Key: HADOOP-2636
 URL: https://issues.apache.org/jira/browse/HADOOP-2636
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Priority: Minor

 When flusher runs -- it's triggered when the sum of all Stores in a Region > a 
 configurable max size -- we flush all Stores though a Store memcache might 
 have but a few bytes.
 I would think Stores should only dump their memcache to disk if they have some 
 substance.
 The problem becomes more acute, the more families you have in a Region.
 Possible behaviors would be to dump the biggest Store only, or only those 
 Stores > 50% of max memcache size.  Behavior would vary dependent on the 
 prompt that provoked the flush.  Would also log why the flush is running: 
 optional or > max size.
 This issue comes out of HADOOP-2621.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2496) Snapshot of table

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2496:
--

Issue Type: New Feature  (was: Bug)

 Snapshot of table
 -

 Key: HADOOP-2496
 URL: https://issues.apache.org/jira/browse/HADOOP-2496
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Billy Pearson
Priority: Minor
 Fix For: 0.17.0


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like this option to do is merge all the data into 
 one or more files stored in the same folder on the dfs. This way we could 
 save data in case of a software bug in hadoop or user code. 
 The other advantage would be the ability to export a table to multiple 
 locations. 
 Say I had a read_only table that must be online. I could take a snapshot of 
 it when needed and export it to a separate data center, have it loaded 
 there, and then I would have it online at multiple data centers for load 
 balancing and failover.
 I understand that hadoop takes the need out of having backups to protect 
 from failed servers, but this does not protect us from software bugs that 
 might delete or alter data in ways we did not plan. We should have a way to 
 roll back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2619) Compaction errors after a region splits

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2619:
-

Assignee: stack

 Compaction errors after a region splits
 ---

 Key: HADOOP-2619
 URL: https://issues.apache.org/jira/browse/HADOOP-2619
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop snv 612165
Reporter: Billy Pearson
Assignee: stack
 Fix For: 0.16.0

 Attachments: compactiondir-v4.patch, 
 hbase-root-regionserver-PE1750-4.log


 I am getting compaction errors from regions after they split. Not all of them 
 have this problem, but some do.
 I attached a log; I picked out one region: 
 webdata,com.technorati/tag/potiron:http,1200430376177
 It is loaded, then splits at 
 2008-01-15 14:52:56,116
 The split is finished at
 2008-01-15 14:53:01,653
 The first compaction for the new top half region starts at
 2008-01-15 14:54:07,612 - 
 webdata,com.technorati/tag/potiron:http,1200430376177
 and ends successfully at
 2008-01-15 14:54:30,229
 Then the next compaction starts at
 2008-01-15 14:56:16,315
 This one ends with an error at 
 2008-01-15 14:56:40,246
 {code}
 2008-01-15 14:57:53,002 ERROR org.apache.hadoop.hbase.HRegionServer: 
 Compaction failed for region 
 webdata,com.technorati/tag/potiron:http,1200430376177
 org.apache.hadoop.dfs.LeaseExpiredException: 
 org.apache.hadoop.dfs.LeaseExpiredException: No lease on 
 /gfs_storage/hadoop-root/hbase/webdata/compaction.dir/1438658724/in_rank/mapfiles/8222904438849251562/data
   at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1123)
   at 
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1061)
   at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:303)
   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:908)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
   at 
 org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
   at 
 org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
   at 
 org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:418)
 {code}
 All other compactions for this region fail after this one fails, with the 
 same error. I will have to keep testing to see if it ever finishes 
 successfully. 
 Maybe after a restart it will successfully finish a compaction.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2624) [hbase] memory management

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2624:
-

Assignee: Jim Kellerman

 [hbase] memory management
 -

 Key: HADOOP-2624
 URL: https://issues.apache.org/jira/browse/HADOOP-2624
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Assignee: Jim Kellerman

 Each Store has a Memcache of edits that is flushed on a fixed period (It used 
 to be flushed when it grew beyond a limit). A Region can be made up of N 
 Stores.  A regionserver has no upper bound on the number of regions that can 
 be deployed to it currently.  Add to this that per mapfile, we have read the 
 index into memory.  We're also talking about adding caching of blocks and 
 cells.
 We need a means of keeping an account of memory usage, adjusting cache sizes 
 and flush rates (or sizes) dynamically -- using References where possible -- 
 to accommodate deployment of added regions.  If memory is strained, we should 
 reject regions proffered by the master with a resource-constrained, or some 
 such, message.
 The manual sizing we currently do ain't going to cut it for clusters of any 
 decent size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2525:
-

Assignee: Jim Kellerman  (was: stack)

 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor

 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events were:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2615) Add max number of mapfiles to compact at one time giving us a minor & major compaction

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2615:
-

Assignee: Jim Kellerman

 Add max number of mapfiles to compact at one time giving us a minor & major 
 compaction
 ---

 Key: HADOOP-2615
 URL: https://issues.apache.org/jira/browse/HADOOP-2615
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Billy Pearson
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.17.0


 Currently we do compaction on a region when 
 hbase.hstore.compactionThreshold is reached - default 3.
 I think we should configure a max number of mapfiles to compact at one time, 
 similar to doing a minor compaction in Bigtable. This keeps compactions 
 from getting tied up in one region too long, which lets other regions get way too 
 many memcache flushes and makes compaction take longer and longer for each region.
 If we did that, when a region's updates start to slack off, the max number will 
 eventually include all mapfiles, causing a major compaction on that region. 
 Unlike Bigtable, this would leave the master out of the process, letting 
 the region server handle the major compaction when it has time.
 When doing a minor compaction on a few files, I think we should compact the 
 newest mapfiles first, leaving the larger/older ones for when we have low 
 updates to a region.
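The selection policy described above can be sketched as follows. This is an illustrative helper, not the actual HStore code: it assumes the mapfile list is ordered oldest-first and that a hypothetical max-files-per-compaction limit is configured.

```java
import java.util.Arrays;

/**
 * Sketch of choosing which mapfiles to compact. Hypothetical names,
 * not HStore's API: files are listed oldest-first, and a minor
 * compaction takes only the newest 'max' files.
 */
public class CompactionPicker {

    /** Return the newest 'max' file sizes (the tail of the oldest-first list). */
    static long[] pickNewest(long[] fileSizesOldestFirst, int max) {
        int n = fileSizesOldestFirst.length;
        if (n <= max) {
            // Few enough files that we take them all: effectively a major compaction.
            return fileSizesOldestFirst.clone();
        }
        return Arrays.copyOfRange(fileSizesOldestFirst, n - max, n);
    }

    /** When the limit covers every file, the "minor" compaction is a major one. */
    static boolean isMajor(long[] fileSizes, int max) {
        return fileSizes.length <= max;
    }
}
```

As updates slack off, the file count drops below the limit and `isMajor` becomes true, which is exactly the self-promoting behavior Billy describes.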

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2616:
-

Assignee: Jim Kellerman

 hbase not splitting when the total size of region reaches max region size * 1.5
 --

 Key: HADOOP-2616
 URL: https://issues.apache.org/jira/browse/HADOOP-2616
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: Billy Pearson
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.17.0


 Right now a region may get larger than the max size set in the conf. 
 HRegion.needsSplit
 checks the largest column to see if it is larger than max region size * 1.5 and 
 then decides whether to split or not. 
 But if we have more than one column, the region could be very large.
 Example:
 Say we have 10 columns, all about the same size, say 40MB, and the max file 
 size is 64MB. We would not split even though the region size is 400MB, well 
 over the 96MB needed to trip a split.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2621) Memcache flush flushing every 60 secs without considering the max memcache size

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2621:
-

Assignee: stack

 Memcache flush flushing every 60 secs without considering the max memcache 
 size
 

 Key: HADOOP-2621
 URL: https://issues.apache.org/jira/browse/HADOOP-2621
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: Billy Pearson
Assignee: stack
 Fix For: 0.16.0

 Attachments: optionalcacheflushinterval.patch


 Looks like hbase is flushing all memcaches to disk every 60 secs, causing a lot 
 of work for the compactor to keep up, because each column gets its own mapfile and 
 every region is flushed at one time. This could be a very large number of 
 mapfiles to write if a region server is hosting 100 regions, all with multiple 
 columns.
 Idea for the memcache flush:
 Keep all data in memory until the memcache gets larger than the size configured 
 with hbase.hregion.memcache.flush.size.
 When we reach this size, we should flush the largest regions first, 
 stopping once we drop back below the memcache max size, maybe 20% below 
 the max. This will flush only as needed, as each flush takes time to 
 compact when compaction runs on a region. While we are flushing a region, we 
 should also block new updates on that region, so the 
 region server does not get overrun when a high update load hits a region 
 server. By only blocking on the region we are flushing at that time, other 
 regions will still be able to take updates.
 If we still want to use hbase.regionserver.optionalcacheflushinterval, we 
 should set it to run once an hour or something like that, so we can recover 
 memory from the memcaches of regions that do not have a lot of updates in memory. 
 But running at the default of 60 secs is not so good for the 
 compactor if it has many regions to handle, and also not good for a scanner that 
 has to scan many small files vs a few larger ones.
 Example: a compactor may take 15 mins to compact a region. In that time we will 
 flush 15 times, causing all other regions to get a new mapfile to compact 
 when it becomes their turn to get compacted. If you had many regions getting 
 compacted, the last one on a list of, say, 10 regions would have 10 regions * 
 15 mins each = 150 mapfiles for each column in the last region written before 
 the compactor can get to it.
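The flush-largest-first policy could look something like this. A sketch only, assuming a single global memcache limit per region server; the names are invented, not HRegionServer's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of global memcache accounting: when total usage exceeds the
 * limit, flush the largest regions first until usage drops to ~80%
 * of the limit (i.e. 20% below the max, as suggested in the issue).
 */
public class FlushPolicy {

    /** Returns the names of regions to flush, largest first. */
    static List<String> regionsToFlush(Map<String, Long> memcacheSizes,
                                       long globalLimit) {
        long total = 0;
        for (long s : memcacheSizes.values()) total += s;

        long target = globalLimit * 8 / 10;  // stop ~20% below the max

        // Sort candidate regions by memcache size, descending.
        List<Map.Entry<String, Long>> bySize =
            new ArrayList<>(memcacheSizes.entrySet());
        bySize.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<String> toFlush = new ArrayList<>();
        for (Map.Entry<String, Long> e : bySize) {
            if (total <= target) break;      // dropped below target: stop
            toFlush.add(e.getKey());
            total -= e.getValue();
        }
        return toFlush;
    }
}
```

A real implementation would also block updates to each region while it flushes, as the issue proposes; that bookkeeping is omitted here.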

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HADOOP-2643) [hbase] Make migration tool smarter.

2008-01-17 Thread Jim Kellerman (JIRA)
[hbase] Make migration tool smarter.


 Key: HADOOP-2643
 URL: https://issues.apache.org/jira/browse/HADOOP-2643
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman


The migration tool that handles the changes to how hbase lays out files in the 
file system needs to be smarter.
- don't try to migrate old region directories in which the region name is a 
part of the directory name.
- add a version number to the file system

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-1398) Add in-memory caching of data

2008-01-17 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12560099#action_12560099
 ] 

Jim Kellerman commented on HADOOP-1398:
---

Tom,

Yes, we need to start versioning everything that goes out to disk. And if we 
make an incompatible change, we either need to correct for it on the fly or 
augment the migration tool (hbase.util.Migrate.java)


 Add in-memory caching of data
 -

 Key: HADOOP-1398
 URL: https://issues.apache.org/jira/browse/HADOOP-1398
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Jim Kellerman
Priority: Trivial
 Attachments: hadoop-blockcache-v2.patch, hadoop-blockcache.patch


 Bigtable provides two in-memory caches: one for row/column data and one for 
 disk blocks.
 The size of each cache should be configurable, data should be loaded lazily, 
 and the cache managed by an LRU mechanism.
 One complication of the block cache is that all data is read through a 
 SequenceFile.Reader, which ultimately reads data off of disk via an RPC proxy 
 for ClientProtocol. This would imply that the block caching would have to be 
 pushed down to either the DFSClient or SequenceFile.Reader.
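A minimal sketch of the LRU mechanism mentioned above, built on java.util.LinkedHashMap's access-order mode. Capacity here counts entries for simplicity; a real block cache would bound bytes instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU cache sketch for a block cache. LinkedHashMap in
 * access-order mode moves an entry to the back on every get(), so
 * the eldest entry is always the least recently used.
 */
public class LruBlockCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruBlockCache(int capacity) {
        super(16, 0.75f, true);  // true = access-order, i.e. LRU
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity.
        return size() > capacity;
    }
}
```

Lazy loading falls out naturally: on a cache miss, the caller reads the block from the SequenceFile.Reader (or DFSClient) and puts it in the cache.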

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2334:
-

Assignee: (was: Jim Kellerman)

 [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
 --

 Key: HADOOP-2334
 URL: https://issues.apache.org/jira/browse/HADOOP-2334
 Project: Hadoop
  Issue Type: Wish
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0


 I have heard from several people that row keys in HBase should be less 
 restricted than hadoop.io.Text.
 What do you think?
 At the very least, a row key has to be a WritableComparable. This would lead 
 to the most general case being either hadoop.io.BytesWritable or 
 hbase.io.ImmutableBytesWritable. The primary difference between these two 
 classes is that hadoop.io.BytesWritable by default allocates 100 bytes, and if 
 you do not pay attention to the length (BytesWritable.getSize()), converting 
 a String to a BytesWritable and vice versa can become problematic. 
 hbase.io.ImmutableBytesWritable, in contrast, only allocates as many bytes as 
 you pass in and then does not allow the size to be changed.
 If we were to change from Text to a non-text key, my preference would be for 
 ImmutableBytesWritable, because it has a fixed size once set, and operations 
 like get, etc. do not have to do something like System.arraycopy where you 
 specify the number of bytes to copy.
 Your comments and questions are welcome on this issue. If we receive enough 
 feedback that Text is too restrictive, we are willing to change it, but we 
 need to hear what would be the most useful thing to change it to as well.
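The BytesWritable pitfall can be illustrated with a self-contained mimic. This is not the Hadoop class itself, just a toy with the same padded-backing-array behavior:

```java
import java.util.Arrays;

/**
 * Mimic of the BytesWritable pitfall: the backing array can be
 * longer than the logical length, so ignoring getSize() corrupts
 * byte[] <-> String round-trips. Toy class, not hadoop.io.
 */
public class PaddedBytes {
    private final byte[] buf;
    private final int size;

    public PaddedBytes(byte[] data) {
        // Like BytesWritable's capacity behavior: keep a padded backing array.
        this.buf = Arrays.copyOf(data, Math.max(16, data.length));
        this.size = data.length;
    }

    public byte[] getBytes() { return buf; }   // the padded array!
    public int getSize() { return size; }

    /** Wrong: new String(getBytes()) picks up the zero padding. */
    public String toStringNaive() { return new String(buf); }

    /** Right: honor the logical length. */
    public String toStringCorrect() { return new String(buf, 0, size); }
}
```

ImmutableBytesWritable sidesteps this entirely by allocating exactly the bytes passed in and freezing the size.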

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2525:
--

Fix Version/s: 0.16.0
   Status: Patch Available  (was: Open)

 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events were:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2525:
--

Attachment: patch.txt

 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events were:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2624) [hbase] memory management

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2624:
-

Assignee: (was: Jim Kellerman)

 [hbase] memory management
 -

 Key: HADOOP-2624
 URL: https://issues.apache.org/jira/browse/HADOOP-2624
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack

 Each Store has a Memcache of edits that is flushed on a fixed period (It used 
 to be flushed when it grew beyond a limit). A Region can be made up of N 
 Stores.  A regionserver has no upper bound on the number of regions that can 
 be deployed to it currently.  Add to this that per mapfile, we have read the 
 index into memory.  We're also talking about adding caching of blocks and 
 cells.
 We need a means of keeping an account of memory usage adjusting cache sizes 
 and flush rates (or sizes) dynamically -- using References where possible -- 
 to accomodate deployment of added regions.  If memory is strained, we should 
 reject regions proffered by the master with a resouce-constrained, or some 
 such, message.
 The manual sizing we currently do ain't going to cut it for clusters of any 
 decent size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2039) [hbase] When a get or scan request spans multiple columns, execute the reads in parallel

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2039:
-

Assignee: (was: Jim Kellerman)

 [hbase] When a get or scan request spans multiple columns, execute the reads 
 in parallel
 

 Key: HADOOP-2039
 URL: https://issues.apache.org/jira/browse/HADOOP-2039
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Priority: Trivial
 Fix For: 0.16.0


 When a get or scan request spans multiple columns, execute the reads in 
 parallel and use a CountDownLatch to wait for them to complete before 
 returning the results.
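A sketch of that approach: one task per column, a CountDownLatch to wait for all of them before returning. The readColumn stub and all names are illustrative, not HStore's API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch of reading several columns in parallel for one get/scan request. */
public class ParallelGet {

    static Map<String, String> get(List<String> columns) {
        ExecutorService pool = Executors.newFixedThreadPool(columns.size());
        CountDownLatch done = new CountDownLatch(columns.size());
        Map<String, String> results = new ConcurrentHashMap<>();

        for (String col : columns) {
            pool.submit(() -> {
                try {
                    results.put(col, readColumn(col));  // one read per thread
                } finally {
                    done.countDown();                   // always count down
                }
            });
        }

        try {
            done.await();  // block until every column read completes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return results;
    }

    /** Stand-in for the per-column store read. */
    static String readColumn(String col) { return "value-of-" + col; }
}
```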

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2364:
-

Assignee: Jim Kellerman

 when hbase regionserver restarts, it says impossible state for createLease()
 --

 Key: HADOOP-2364
 URL: https://issues.apache.org/jira/browse/HADOOP-2364
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Michael Bieniosek
Assignee: Jim Kellerman
Priority: Minor

 I restarted a regionserver, and got this error in its logs:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.AssertionError: Impossible state for createLease(): Lease 
 -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at 
 org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278
 )
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at 
 org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.jav
 a:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-17 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12560143#action_12560143
 ] 

jimk edited comment on HADOOP-2525 at 1/17/08 3:14 PM:


 Otherwise patch looks good. How do you think it fixes the issue?

The crux of the patch is the following change:

{code}
-  for (RegionServerOperation op = null; !closed.get(); ) {
+  while (!closed.get()) {
+RegionServerOperation op = null;
{code}

The old code only declared and nulled out 'op' for the first iteration. If op 
was set non-null and we went back to the top of the loop, it would fall through 
and just re-execute op again, rather than polling the queues and waiting.


  was (Author: jimk):
 Otherwise patch looks good. How do you think it fixes the issue?

The crux of the patch is the following change:

-  for (RegionServerOperation op = null; !closed.get(); ) {
+  while (!closed.get()) {
+RegionServerOperation op = null;

The old code only declared and nulled out 'op' for the first iteration. If op 
was set non-null and we went back to the top of the loop, it would fall through 
and just re-execute op again, rather than polling the queues and waiting.

  
 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events were:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown

2008-01-17 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12560143#action_12560143
 ] 

Jim Kellerman commented on HADOOP-2525:
---

 Otherwise patch looks good. How do you think it fixes the issue?

The crux of the patch is the following change:

{code}
-  for (RegionServerOperation op = null; !closed.get(); ) {
+  while (!closed.get()) {
+RegionServerOperation op = null;
{code}

The old code only declared and nulled out 'op' for the first iteration. If op 
was set non-null and we went back to the top of the loop, it would fall through 
and just re-execute op again, rather than polling the queues and waiting.


 Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
 ---

 Key: HADOOP-2525
 URL: https://issues.apache.org/jira/browse/HADOOP-2525
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 Background: We ran out of disk space on HMaster before this issue occurred.  
 The sequence of events were:
 1.  Ran out of disk space
 2.  Freed up 10 GB of disk space
 3.  Shut down HBase
 We had the following 2 lines repeated over 11 million times in the span of 10 
 minutes:
 2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process 
 shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, 
 rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
 2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main 
 processing loop: ProcessServerShutdown of 10.100.11.64:60020

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2651) [Hbase] Caching for read performance

2008-01-17 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman resolved HADOOP-2651.
---

Resolution: Duplicate

 [Hbase] Caching for read performance
 

 Key: HADOOP-2651
 URL: https://issues.apache.org/jira/browse/HADOOP-2651
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Edward Yoon
Assignee: Edward Yoon

 * Use two level of caching to improve read performance
 * Scan cache
 ** Higher-level cache
 *** Caches the K,V pairs returned by the SSTable(HStore?) interface to the 
 region server code
 ** Most useful for applications that tend to read the same data repeatedly
 * Block cache
 ** Lower-level cache
 *** Caches SSTable blocks that were read from HDFS
 ** Useful for applications that read data close to the data that they 
 recently read
 *** E.g. Sequential read or random read of different column in same locality 
 group within a hot row

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Subclassing SequenceFile and MapFile

2008-01-16 Thread Jim Kellerman
HBase has several subclasses of MapFile already:
org.apache.hadoop.hbase.HStoreFile$
  HbaseMapFile
  BloomFilterMapFile
  HalfMapFileReader

If MapFile were more subclassable (had protected members instead of private, or 
accessor methods), we would probably add client-side caching and bloom filters (to 
determine if a key exists in a map file - different from BloomFilterMapFile 
above, which is a mix-in of MapFile and BloomFilter).

Tom White said (in https://issues.apache.org/jira/browse/HADOOP-2604)
 If MapFile.Reader were an interface (or an abstract class with a no
 args constructor) then BloomFilterMapFile.Reader, HalfMapFileReader and
 caching Readers could be implemented as wrappers instead of in a static
 hierarchy.

 This would make it easier to mix and match readers (e.g. with or
 without caching) without passing all possible parameters in the
 constructor.

So we'd like to make MapFile (and probably SequenceFile) subclassable by 
providing accessors and/or making members protected instead of private.

If these classes should not be subclassed, they should be declared as final 
classes.
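Tom's wrapper idea amounts to the decorator pattern. A sketch under that assumption, with MapFileReader standing in for a hypothetical MapFile.Reader interface (not Hadoop's actual class):

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a MapFile.Reader interface, per Tom White's suggestion. */
interface MapFileReader {
    byte[] get(String key);
}

/** Base reader: pretend this reads from disk. */
class BaseReader implements MapFileReader {
    public byte[] get(String key) {
        return ("data-" + key).getBytes();  // fake disk read
    }
}

/**
 * Caching as a wrapper: composes with any other reader (bloom-filter
 * readers, half-map-file readers, ...) instead of living in a static
 * class hierarchy.
 */
class CachingReader implements MapFileReader {
    private final MapFileReader delegate;
    private final Map<String, byte[]> cache = new HashMap<>();

    CachingReader(MapFileReader delegate) { this.delegate = delegate; }

    public byte[] get(String key) {
        // Only hit the delegate (disk) on a cache miss.
        return cache.computeIfAbsent(key, delegate::get);
    }
}
```

Mixing and matching then becomes `new CachingReader(new BloomFilterReader(base))` rather than threading every option through one constructor.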

Thoughts? Opinions? Comments?

---
Jim Kellerman, Senior Engineer; Powerset

No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.516 / Virus Database: 269.19.5/1228 - Release Date: 1/16/2008 9:01 
AM



Multiplexing sockets in DFSClient/datanodes?

2008-01-16 Thread Jim Kellerman
HBase has a problem with running out of file handles on machines that
act as region servers. From https://issues.apache.org/jira/browse/HADOOP-2577

 Today the Rapleaf folks gave me an lsof listing from a regionserver. Had thousands
 of open sockets to datanodes, all in ESTABLISHED and CLOSE_WAIT state. On
 average they seem to have about ten file descriptors/sockets open per region
 (They have 3 column families IIRC. Per family, can have between 1-5 or so
 mapfiles open per family – 3 is max... but compacting we open a new one,
 etc.).

 They have thousands of regions. 400 regions – ~100G, which is not that much –
 takes about 4k open file handles.

 If they want a regionserver to serve a decent disk's worth – 300-400G – then
 that's maybe 1600 regions... 16k file handles. If more than just 3 column
 families, then we are in danger of blowing out limits if they are 32k.

One possible solution we've thought of is multiplexing sockets between the
DFSClient and the datanode. In this case, there would be one socket per
client--datanode pair, running in async mode using select. This would
consume far fewer sockets than the current
1 socket / client / datanode / open file.

We used a socket multiplexer at Yahoo for the data store I worked on there,
the user database (or UDB), which stored all the preference data for all
Yahoo pages that could be customized. All the UDB clients had one
socket open for each machine in the UDB server cluster. Similarly, each
UDB server had one socket open to talk to all of its clients. When you
consider that each UDB server had to talk to several thousand clients, and that
each server machine ran many server processes to handle load, this was a
huge savings in OS overhead.

While the 1 socket / client / datanode / open file is a simple model,
if we are talking about scaling Hadoop or HBase to thousands of nodes,
it seems like socket multiplexing would be a big win in terms of server
overhead, especially considering that many of these connections are
more idle than in use.

Yes, multiplexing a socket is more complicated than having one socket
per file, but saving system resources seems like a way to scale.
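One way to think about the multiplexing is message framing: tag every message with a stream id, so a single connection per client--datanode pair can carry traffic for all open files, and the demux side routes each frame by its id. The frame format below is invented purely for illustration:

```java
import java.nio.ByteBuffer;

/**
 * Sketch of framing many logical streams over one socket.
 * Frame layout (invented): [int streamId][int length][payload].
 */
public class MuxFrame {

    /** Encode one logical-stream message into a frame. */
    static ByteBuffer encode(int streamId, byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(8 + payload.length);
        b.putInt(streamId).putInt(payload.length).put(payload);
        b.flip();
        return b;
    }

    /** Read the stream id so the demux side can route the payload. */
    static int streamId(ByteBuffer frame) {
        return frame.getInt(0);  // absolute read, does not move position
    }

    /** Extract the payload bytes. */
    static byte[] payload(ByteBuffer frame) {
        int len = frame.getInt(4);
        byte[] out = new byte[len];
        ByteBuffer dup = frame.duplicate();
        dup.position(8);
        dup.get(out);
        return out;
    }
}
```

The transport side would then be a single non-blocking socket per peer driven by a selector; the framing above is what lets many open files share it.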

Questions? Comments? Opinions? Flames?

---
Jim Kellerman, Senior Engineer; Powerset




[jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs without considering the max memcache size

2008-01-16 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559660#action_12559660
 ] 

Jim Kellerman commented on HADOOP-2621:
---

You can configure the memcache flush size by setting the config parameter 
hbase.hregion.memcache.flush.size; the default is 64MB.

When a HRegion reaches this threshold, it will call for a cache flush.

If the cache is flushed, a request is queued to determine if a compaction is 
necessary.

If a compaction is done, then a request is queued to determine if the region 
needs to be split.
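That flush -> compaction -> split chain can be sketched as a set of threshold checks. Thresholds and method names are illustrative, not HRegion's code:

```java
/** Sketch of the flush -> compaction -> split decision chain. */
public class RegionLifecycle {
    // hbase.hregion.memcache.flush.size default, per the comment above
    static final long FLUSH_SIZE = 64L * 1024 * 1024;
    // hbase.hstore.compactionThreshold default
    static final int COMPACTION_THRESHOLD = 3;

    /** Step 1: when the memcache reaches the flush size, flush it. */
    static boolean needsFlush(long memcacheBytes) {
        return memcacheBytes >= FLUSH_SIZE;
    }

    /** Step 2: after a flush adds a mapfile, maybe queue a compaction. */
    static boolean needsCompaction(int mapFileCount) {
        return mapFileCount >= COMPACTION_THRESHOLD;
    }

    /** Step 3: after a compaction, maybe split (size > 1.5 * max). */
    static boolean needsSplit(long largestStoreBytes, long maxFileSize) {
        return largestStoreBytes * 2 > maxFileSize * 3;
    }
}
```

Each step only queues the next check rather than performing it inline, which matches the request-queueing Jim describes.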

 Memcache flush flushing every 60 secs without considering the max memcache 
 size
 

 Key: HADOOP-2621
 URL: https://issues.apache.org/jira/browse/HADOOP-2621
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: Billy Pearson
 Fix For: 0.16.0


 looks like hbase is flushing all memcache to disk every 60 secs, causing a lot 
 of work for the compactor to keep up, because each column gets its own mapfile 
 and every region is flushed at one time. This could be a very large number of 
 mapfiles to write if a region server is hosting 100 regions, all with multiple 
 columns.
 Idea for memcache flush:
 Keep all data in memory until the memcache gets larger than the size configured 
 with hbase.hregion.memcache.flush.size.
 When we reach this size we should flush the largest regions 
 first, stopping once we drop back below the memcache max size, maybe 20% below 
 the max. This will flush only as needed, as each flush takes time to 
 compact when compaction runs on a region. While we are flushing a region we 
 should also block new updates on that region so the 
 region server does not get overrun when a high update load hits a region 
 server. By only blocking on the region we are flushing at that time, other 
 regions will still be able to accept updates.
 If we still want to use hbase.regionserver.optionalcacheflushinterval, we 
 should set it to run once an hour or something like that, so we can recover 
 memory from the memcache on regions that do not have a lot of updates in 
 memory. But running at the current default of 60 secs is not so good for the 
 compactor if it has many regions to handle, and also not good for a scanner 
 that has to scan many small files vs a few larger ones.
 Example: a compactor may take 15 mins to compact a region; in that time we will 
 flush 15 times, causing all other regions to get a new mapfile to compact 
 when it becomes their turn to get compacted. If many regions were getting 
 compacted, the last one on a list of say 10 regions would have 10 regions * 
 15 mins each = 150 mapfiles for each column in the last region written before 
 the compactor can get to it.
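The flush-the-largest-regions-first policy proposed above could be sketched like this (an illustration of the proposal, not HBase code; the 20% headroom figure comes from the suggestion above):

```java
import java.util.*;

public class FlushPolicyDemo {
    // Sketch of the proposal: flush the largest regions first, stopping once
    // total memcache usage drops ~20% below the configured max.
    static List<String> regionsToFlush(Map<String, Long> memcacheSizes, long maxBytes) {
        long total = memcacheSizes.values().stream().mapToLong(Long::longValue).sum();
        long target = (long) (maxBytes * 0.8);   // 20% headroom below the max
        List<Map.Entry<String, Long>> bySize = new ArrayList<>(memcacheSizes.entrySet());
        bySize.sort((a, b) -> Long.compare(b.getValue(), a.getValue())); // largest first
        List<String> flush = new ArrayList<>();
        for (Map.Entry<String, Long> e : bySize) {
            if (total <= target) break;
            flush.add(e.getKey());
            total -= e.getValue();
        }
        return flush;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("regionA", 40L << 20);  // 40MB
        sizes.put("regionB", 25L << 20);  // 25MB
        sizes.put("regionC", 10L << 20);  // 10MB
        // 75MB total vs a 64MB max: flushing regionA (40MB) leaves 35MB,
        // below the 51.2MB target, so a single flush suffices.
        System.out.println(regionsToFlush(sizes, 64L << 20));
    }
}
```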

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2356) Set memcache flush size per column

2008-01-16 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2356:
--

Summary: Set memcache flush size per column  (was: Set memcache flush size 
per table)

 Set memcache flush size per column
 --

 Key: HADOOP-2356
 URL: https://issues.apache.org/jira/browse/HADOOP-2356
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: Paul Saab
Priority: Minor

 The amount of memory taken by the memcache before a flush is currently a 
 global parameter.  It should be configurable per-table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tests passed, patch verified by Billy Pearson (who reported the problem). 
Committed.

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt


 The below is cut out of one of my region server's logs; full log attached.
 What is happening is there is one region on this region server and it is 
 under heavy insert load, so compactions are back to back: when one finishes a 
 new one starts. The problem starts when it is time to split the region. 
 A compaction starts just millisecs before the split starts, blocking the split, 
 but the split closes the region before the compaction is finished, causing 
 the region to be offline until the compaction is done. Once the compaction is 
 done the split finishes and all is returned to normal, but this is a big 
 problem for production if the region is offline for 10-15 mins.
 The solution would be not to let the split thread issue the below line 
 while a compaction on that region is happening.
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because if there is more than one, the compaction happens to 
 the other region(s) after the first one is done compacting, and the split can 
 do what it needs on the first region without getting blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata

[jira] Resolved: (HADOOP-2348) [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman resolved HADOOP-2348.
---

Resolution: Won't Fix

 [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and 
 useless
 

 Key: HADOOP-2348
 URL: https://issues.apache.org/jira/browse/HADOOP-2348
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Bryan Duxbury
Assignee: Jim Kellerman
Priority: Minor

 In the past, the lock id returned by HTable.startUpdate was a real lock id 
 from a remote server. However, that has been superseded by the BatchUpdate 
 process, so now the lock id is just an arbitrary value. Moreover, it doesn't 
 actually add any value, because while it implies that you could start two 
 updates on the same HTable and commit them separately, this is in fact not 
 the case. Any attempt to do a second startUpdate throws an 
 IllegalStateException. 
 Since there is no added functionality afforded by the presence of this 
 parameter, I suggest that we overload all methods that use it to ignore it 
 and print a deprecation notice. startUpdate can just return a constant like 1 
 and eventually turn into a boolean or some other useful value.
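The suggested compatibility shim might look roughly like this (a hypothetical sketch of the suggestion, not the actual HTable source):

```java
public class HTableSketch {
    // Hypothetical sketch: keep the lock-id methods for source compatibility,
    // but ignore the value and warn. Not the real HTable API.
    private boolean updateInProgress = false;

    /** @deprecated the lock id no longer carries meaning */
    @Deprecated
    public long startUpdate(String row) {
        if (updateInProgress) {
            throw new IllegalStateException("update already in progress");
        }
        updateInProgress = true;
        System.err.println("WARNING: lock ids are deprecated and ignored");
        return 1L;  // constant placeholder, as the issue suggests
    }

    public void commit(long ignoredLockId) {
        updateInProgress = false;  // the lock id is accepted but unused
    }

    public static void main(String[] args) {
        HTableSketch t = new HTableSketch();
        long id = t.startUpdate("row1");
        t.commit(id);
        System.out.println(id);  // always the constant 1
    }
}
```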

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2138) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2138:
--

Priority: Minor  (was: Major)
 Summary: [hbase] Master should allocate regions to regionservers based 
upon data locality and rack awareness  (was: [hbase] Master should allocate 
regions to the regionserver hosting the region data where possible)

Downgrading priority because we should leverage Hadoop's rack awareness where 
possible, and there is a lot of work left to do (in Hadoop) before we can

 [hbase] Master should allocate regions to regionservers based upon data 
 locality and rack awareness
 ---

 Key: HADOOP-2138
 URL: https://issues.apache.org/jira/browse/HADOOP-2138
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Priority: Minor

 Currently, regions are assigned to regionservers based on a basic loading 
 attribute.  A factor to include in the assignment calculation is the location 
 of the region in hdfs; i.e. servers hosting region replicas.  If the cluster 
 is such that regionservers are being run on the same nodes as those running 
 hdfs, then ideally the regionserver for a particular region should be running 
 on the same server as one that hosts a region replica.
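The locality-weighted assignment being proposed could be sketched as follows (an illustration only; the locality weight is an assumed value, not anything from HBase):

```java
import java.util.*;

public class AssignDemo {
    // Hypothetical sketch of the proposal above: pick a regionserver mainly by
    // load, but prefer servers that host a replica of the region's data.
    static String assign(Map<String, Integer> regionCount, Set<String> replicaHosts) {
        String best = null;
        double bestScore = Double.MAX_VALUE;
        for (Map.Entry<String, Integer> e : regionCount.entrySet()) {
            // A hosted replica is worth a few regions' worth of load here;
            // the weight of 5 is an illustrative assumption, not a tuned value.
            double score = e.getValue() - (replicaHosts.contains(e.getKey()) ? 5 : 0);
            if (score < bestScore) { bestScore = score; best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> load = new LinkedHashMap<>();
        load.put("rs1", 70);
        load.put("rs2", 68);  // least loaded overall
        load.put("rs3", 72);
        // rs3 holds a replica of the region's files; locality outweighs
        // its slightly higher region count.
        System.out.println(assign(load, Set.of("rs3")));
    }
}
```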

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2291) [hbase] Add row count estimator

2008-01-15 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559202#action_12559202
 ] 

Jim Kellerman commented on HADOOP-2291:
---

What is the status of this issue?

 [hbase] Add row count estimator
 ---

 Key: HADOOP-2291
 URL: https://issues.apache.org/jira/browse/HADOOP-2291
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Reporter: stack
Assignee: Edward Yoon
Priority: Minor
 Attachments: 2291_v01.patch, Keying.java


 Internally we have a little tool that will do a rough estimate of how many 
 rows there are in an hbase.  It keeps getting larger and larger partitions, 
 running scanners, until it turns up > N occupied rows.  Once it has a number > 
 N, it multiplies by the partition size to get an approximate row count.  
 This issue is about generalizing this feature so it could sit in the general 
 hbase install.  It would look something like:
 {code}
 long getApproximateRowCount(final Text startRow, final Text endRow, final 
 long minimumCountPerPartition, final long maximumPartitionSize)
 {code}
 Larger minimumCountPerPartition and maximumPartitionSize values would make 
 the count more accurate but would mean the method ran longer.
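The doubling-partition estimation described above can be sketched against a simulated table (an illustration of the approach, not the internal tool; `scan` is a stand-in for a real scanner that counts occupied rows in a key range):

```java
import java.util.function.LongUnaryOperator;

public class RowCountEstimateDemo {
    // Sketch: scan ever-larger key partitions until at least `minCount`
    // occupied rows turn up, then extrapolate across the whole key space.
    static long approximateRowCount(LongUnaryOperator scan, long keySpace,
                                    long minCount, long maxSpan) {
        long span = 16;                       // initial partition size
        long found = scan.applyAsLong(span);
        while (found < minCount && span < maxSpan) {
            span *= 2;                        // grow the partition and rescan
            found = scan.applyAsLong(span);
        }
        // Scale the sampled density up to the full key space.
        return (long) (found * ((double) keySpace / span));
    }

    public static void main(String[] args) {
        long keySpace = 1_000_000;
        // Simulated table: every other key is occupied (500,000 rows total),
        // so a scan of [0, span) finds span/2 rows.
        LongUnaryOperator scan = span -> span / 2;
        long est = approximateRowCount(scan, keySpace, 1000, keySpace);
        System.out.println(est);  // 500000
    }
}
```

As the description notes, a larger minimum count tightens the estimate at the cost of scanning more rows.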

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2343:
--

Priority: Trivial  (was: Minor)

 [hbase] Stuck regionserver?
 ---

 Key: HADOOP-2343
 URL: https://issues.apache.org/jira/browse/HADOOP-2343
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Trivial

 Looking in logs, a regionserver went down because it could not contact the 
 master after 60 seconds.  Watching logging, the HRS is repeatedly checking 
 all 150 loaded regions over and over again w/ a pause of about 5 seconds 
 between runs... then there is a suspicious 60+ second gap with no logging as 
 though the regionserver had hung up on something:
 {code}
 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965
 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to 
 master for 67467 milliseconds - aborting server
 2007-12-03 13:16:04,455 INFO  hbase.Leases - 
 regionserver/0:0:0:0:0:0:0:0:60020 closing leases
 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - 
 regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
 {code}
 Master seems to be running fine scanning its ~700 regions.  Then you see this 
 in log, before the HRS shuts itself down.
 {code}
 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease 
 expired 153260899/153260899
 2007-12-03 13:14:31,417 INFO  hbase.HMaster - 
 XX.XX.XX.102:60020 lease expired
 {code}
 ... and we go on to process shutdown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2343) [hbase] Stuck regionserver?

2008-01-15 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559204#action_12559204
 ] 

Jim Kellerman commented on HADOOP-2343:
---

I believe this issue was (eventually) addressed by HADOOP-2338.

Leaving open in case issue re-occurs. But will downgrade priority.

 [hbase] Stuck regionserver?
 ---

 Key: HADOOP-2343
 URL: https://issues.apache.org/jira/browse/HADOOP-2343
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor

 Looking in logs, a regionserver went down because it could not contact the 
 master after 60 seconds.  Watching logging, the HRS is repeatedly checking 
 all 150 loaded regions over and over again w/ a pause of about 5 seconds 
 between runs... then there is a suspicious 60+ second gap with no logging as 
 though the regionserver had hung up on something:
 {code}
 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965
 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() 
 determined that there was nothing to do
 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to 
 master for 67467 milliseconds - aborting server
 2007-12-03 13:16:04,455 INFO  hbase.Leases - 
 regionserver/0:0:0:0:0:0:0:0:60020 closing leases
 2007-12-03 13:16:04,455 INFO  hbase.Leases$LeaseMonitor - 
 regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
 {code}
 Master seems to be running fine scanning its ~700 regions.  Then you see this 
 in log, before the HRS shuts itself down.
 {code}
 2007-12-03 13:14:31,416 INFO  hbase.Leases - HMaster.leaseChecker lease 
 expired 153260899/153260899
 2007-12-03 13:14:31,417 INFO  hbase.HMaster - 
 XX.XX.XX.102:60020 lease expired
 {code}
 ... and we go on to process shutdown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2400:
--

  Priority: Trivial  (was: Minor)
Issue Type: Improvement  (was: Bug)

 Where hbase/mapreduce have analogous configuration parameters, they should be 
 named similarly
 -

 Key: HADOOP-2400
 URL: https://issues.apache.org/jira/browse/HADOOP-2400
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Michael Bieniosek
Priority: Trivial

 mapreduce has a configuration property called mapred.system.dir which 
 determines where in the DFS a jobtracker stores its data.  Similarly, hbase 
 has a configuration property called hbase.rootdir which does something very 
 similar.
 These should have the same naming convention, e.g. hbase.system.dir and 
 mapred.system.dir

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2136) [hbase] TestTableIndex: variable substitution depth too large: 20

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2136:
--

Priority: Trivial  (was: Minor)

Downgrading priority since it has been some time since this problem was last 
observed.

 [hbase] TestTableIndex: variable substitution depth too large: 20
 -

 Key: HADOOP-2136
 URL: https://issues.apache.org/jira/browse/HADOOP-2136
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Trivial

 See 'stack - 30/Oct/07 09:51 PM' comment over in HADOOP-2083 for description 
 of an error or see here: 
 http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/testReport/org.apache.hadoop.hbase.mapred/TestTableIndex/testTableIndex/
 Seems like it's a rare occurrence.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2527) Improve master load balancing

2008-01-15 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2527:
--

 Priority: Major  (was: Minor)
Affects Version/s: (was: 0.15.0)
   0.16.0
  Summary: Improve master load balancing  (was: Poor distribution 
of regions)

 Improve master load balancing
 -

 Key: HADOOP-2527
 URL: https://issues.apache.org/jira/browse/HADOOP-2527
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: CentOS 5
Reporter: Chris Kline

 We get poor distribution of regions when we start up HBase.  We have a total 
 of 13 nodes and 898 regions, which should yield an average of 69 regions per 
 node.  Instead, one node has 173 regions and one node has 16 regions.
 Address             Start Code     Load
 10.100.11.62:60020  1199406218912  requests: 0 regions: 63
 10.100.11.59:60020  1199406219179  requests: 0 regions: 55
 10.100.11.60:60020  1199406219062  requests: 0 regions: 90
 10.100.11.61:60020  1199406219132  requests: 1 regions: 54
 10.100.11.64:60020  1199406218817  requests: 0 regions: 173
 10.100.11.31:60020  1199406219039  requests: 1 regions: 16
 10.100.11.58:60020  1199406218895  requests: 0 regions: 89
 10.100.11.56:60020  1199406219037  requests: 0 regions: 76
 10.100.11.65:60020  1199406219135  requests: 0 regions: 56
 10.100.11.57:60020  1199406219183  requests: 1 regions: 56
 10.100.11.33:60020  1199406219174  requests: 1 regions: 56
 10.100.11.32:60020  1199406218944  requests: 0 regions: 66
 10.100.11.63:60020  1199406219182  requests: 0 regions: 48
 Total: servers: 13 requests: 4 regions: 898

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()

2008-01-15 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559207#action_12559207
 ] 

Jim Kellerman commented on HADOOP-2364:
---

Is this still a problem? When did it last occur?

 when hbase regionserver restarts, it says impossible state for createLease()
 --

 Key: HADOOP-2364
 URL: https://issues.apache.org/jira/browse/HADOOP-2364
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Michael Bieniosek
Priority: Minor

 I restarted a regionserver, and got this error in its logs:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.AssertionError: Impossible state for createLease(): Lease 
 -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at 
 org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278
 )
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at 
 org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.jav
 a:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2526) HRegionServer hangs upon exit due to DFSClient Exception

2008-01-15 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559217#action_12559217
 ] 

Jim Kellerman commented on HADOOP-2526:
---

Is this still an issue? Has it occurred since reported?

 HRegionServer hangs upon exit due to DFSClient Exception
 

 Key: HADOOP-2526
 URL: https://issues.apache.org/jira/browse/HADOOP-2526
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
Reporter: Chris Kline
Priority: Minor

 Several HRegionServers hang around indefinitely well after the HMaster has 
 exited.  This was triggered by executing $HBASE_HOME/bin/stop-hbase.sh.  The 
 HMaster exits fine, but here is what happens on one of the HRegionServers:
 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got 
 regionserver stop message
 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases: 
 regionserver/0.0.0.0:60020 closing leases
 2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor: 
 regionserver/0.0.0.0:60020.leaseChecker exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases: 
 regionserver/0.0.0.0:60020 closed leases
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on 
 60020
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 2 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 0 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 7 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 3 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 5 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 9 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 6 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 4 on 60020: exiting
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 1 on 60020: exiting
 2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
 Server listener on 60020
 2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 8 on 60020: exiting
 2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping 
 infoServer
 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping [EMAIL 
 PROTECTED]
 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing 
 ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
 2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED
 java.net.SocketException: Socket closed
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
 at java.net.ServerSocket.implAccept(ServerSocket.java:453)
 at java.net.ServerSocket.accept(ServerSocket.java:421)
 at 
 org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
 at 
 org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
 2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping 
 Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
 2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect 
 to close listener /127.0.0.1:60030
 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem 
 stopping acceptor /127.0.0.1:
 2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem 
 stopping acceptor /127.0.0.1:
 java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:519)
 at java.net.Socket.connect(Socket.java:469)
 at java.net.Socket.&lt;init&gt;(Socket.java:366)
 at java.net.Socket.&lt;init&gt;(Socket.java:209)
 at 
 org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
 at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
 at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
 at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
 at org.mortbay.util.Container.stop(Container.java:154)
 at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237

RE: [jira] Created: (HADOOP-2616) hbase not spliting when the total size of region reaches max region size * 1.5

2008-01-15 Thread Jim Kellerman
We do not need to split unless any one column is over the threshold.

---
Jim Kellerman, Senior Engineer; Powerset


 -Original Message-
 From: Billy Pearson (JIRA) [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, January 15, 2008 2:46 PM
 To: hadoop-dev@lucene.apache.org
 Subject: [jira] Created: (HADOOP-2616) hbase not spliting
 when the total size of region reaches max region size * 1.5

 hbase not spliting when the total size of region reaches max
 region size * 1.5
 --
 

  Key: HADOOP-2616
  URL:
 https://issues.apache.org/jira/browse/HADOOP-2616
  Project: Hadoop
   Issue Type: Bug
   Components: contrib/hbase
 Reporter: Billy Pearson
 Priority: Minor
  Fix For: 0.17.0


 right now a region may get larger than the max size set in the conf.
 
 HRegion.needsSplit
 
 checks the largest column to see if it's larger than max
 region size * 1.5 and then decides to split or not.
 
 But if we have more than one column the region could be very
 large. Example:
 
 Say we have 10 columns all about the same size, let's say 40MB,
 and the max file size is 64MB. We would not split even though
 the region size is 400MB, well over the 96MB needed to trip a
 split.
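The gap between the current largest-column check and an aggregate-size check can be shown numerically (a sketch mirroring the example above, not the actual HRegion.needsSplit code):

```java
public class SplitCheckDemo {
    static final long MAX = 64L << 20;            // max region size, 64MB
    static final long THRESHOLD = MAX * 3 / 2;    // split trips at 1.5x = 96MB

    // The behavior described above: only the largest column family counts.
    static boolean needsSplitLargestColumn(long[] columnSizes) {
        long largest = 0;
        for (long s : columnSizes) largest = Math.max(largest, s);
        return largest > THRESHOLD;
    }

    // The reporter's alternative: consider the aggregate of all column families.
    static boolean needsSplitAggregate(long[] columnSizes) {
        long total = 0;
        for (long s : columnSizes) total += s;
        return total > THRESHOLD;
    }

    public static void main(String[] args) {
        long[] tenColumnsOf40MB = new long[10];
        java.util.Arrays.fill(tenColumnsOf40MB, 40L << 20);
        // Reproduces the report: a 400MB region where no single column
        // exceeds the 96MB threshold.
        System.out.println(needsSplitLargestColumn(tenColumnsOf40MB)); // false
        System.out.println(needsSplitAggregate(tenColumnsOf40MB));     // true
    }
}
```

Jim's reply above takes the position that only the per-column size should matter, so this is a comparison of the two checks, not a proposed fix.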


 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.








[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2588:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tests passed. Committed.

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: n/a
Reporter: Ian Clarke
Priority: Trivial
 Fix For: 0.16.0

 Attachments: patch.txt


 The org.onelab.filter.BloomFilter uses a boolean[] to store the filter, 
 however in most Java implementations this will use a byte per bit stored, 
 meaning that 8X the actual used memory is required.  This is unfortunate as 
 the whole point of a BloomFilter is to save memory.
 As a sidebar, the implementation looks a bit shaky in other ways, such as the 
 way hashes are generated from a SHA1 digest in the Filter class, and the 
 way that it just assumes the digestBytes array will be long enough in the 
 hash() method.
 I discovered this while looking for a good Bloom Filter implementation to use 
 in my own project.  In the end I went ahead and implemented my own, its very 
 simple and pretty elegant (even if I do say so myself ;) - you are welcome to 
 use it:
 http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
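The 8X saving from bit-packing is easy to demonstrate with java.util.BitSet, a stock packed bit vector in the JDK (an illustration of the point, not the patch attached to this issue):

```java
import java.util.BitSet;

public class BitsDemo {
    public static void main(String[] args) {
        int bits = 1 << 20;  // ~one million filter slots
        // boolean[] stores each flag in (at least) one byte on typical JVMs.
        long booleanArrayBytes = bits;           // ~1MB
        // A packed bit vector needs only one bit per slot.
        long packedBytes = bits / 8;             // ~128KB
        System.out.println(booleanArrayBytes / packedBytes);  // the 8X factor

        // BitSet packs bits into longs internally:
        BitSet filter = new BitSet(bits);
        filter.set(42);
        System.out.println(filter.get(42));
        System.out.println(filter.get(43));
    }
}
```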

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2597) [hbase] Performance - add a block cache

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman resolved HADOOP-2597.
---

Resolution: Duplicate

 [hbase] Performance - add a block cache
 ---

 Key: HADOOP-2597
 URL: https://issues.apache.org/jira/browse/HADOOP-2597
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Tom White

 A block cache would cache fixed size blocks (default 64k) of data read from 
 HDFS by the MapFile. It would help read performance for data close to 
 recently read data (see Bigtable paper, section 6). It would be configurable 
 on a per-column family basis.
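A cache of fixed-size blocks with least-recently-used eviction can be sketched with LinkedHashMap's access-order mode. The class and key scheme below are hypothetical, for illustration only — not the block cache HBase eventually shipped:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache of fixed-size data blocks, keyed by e.g. "file:blockIndex".
public class BlockCache {
    private final int maxBlocks;
    private final Map<String, byte[]> blocks;

    public BlockCache(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder=true makes iteration order the LRU order, so the
        // eldest entry is the least recently accessed one.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > BlockCache.this.maxBlocks;
            }
        };
    }

    public void put(String key, byte[] block) { blocks.put(key, block); }

    public byte[] get(String key) { return blocks.get(key); } // null on miss

    public int size() { return blocks.size(); }
}
```

Reads near recently read data then hit the cache instead of going back to HDFS, which is the benefit the Bigtable paper describes.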




[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Open  (was: Patch Available)

TestTableIndex is now failing rather consistently.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt


 The below is cut out of one of my region server's logs; the full log is 
 attached. What is happening: there is one region on this region server, and 
 it is under heavy insert load, so compactions run back to back - as soon as 
 one finishes, a new one starts. The problem begins when it is time to split 
 the region.
 A compaction starts just milliseconds before the split starts, blocking the 
 split, but the split closes the region before the compaction is finished, 
 leaving the region offline until the compaction is done. Once the compaction 
 is done, the split finishes and all returns to normal, but this is a big 
 problem for production if the region is offline for 10-15 mins.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on the 
 region server, because if there is more than one, the compaction happens on 
 the other region(s) after the first one finishes compacting, and the split 
 can do what it needs on the first region without getting blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}
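The fix the reporter suggests — don't let the split thread close the region while a compaction is running — amounts to making the two operations on one region mutually exclusive. A hypothetical sketch of that guard (illustration only, not the actual HRegion code or the attached patch):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical per-region guard: split and compaction are made mutually
// exclusive, so a split can never close the region mid-compaction.
public class RegionMaintenanceGuard {
    private final ReentrantLock lock = new ReentrantLock();

    public void compact(Runnable compaction) {
        lock.lock();          // blocks splits while compacting
        try {
            compaction.run();
        } finally {
            lock.unlock();
        }
    }

    public void split(Runnable split) {
        lock.lock();          // waits for any in-flight compaction to finish
        try {
            split.run();
        } finally {
            lock.unlock();
        }
    }
}
```

With this scheme the split thread simply waits the few milliseconds it lost the race by, instead of taking the region offline for the 10-15 minute duration of the compaction.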


[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Patch Available  (was: Open)

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt




[jira] Reopened: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reopened HADOOP-2443:
---


Now that this works and has been committed, can we reduce the 'chattiness' of 
the debug level logging?

Thanks.


 [hbase] Keep lazy cache of regions in client rather than an 'authoritative' 
 list
 

 Key: HADOOP-2443
 URL: https://issues.apache.org/jira/browse/HADOOP-2443
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: stack
Assignee: Bryan Duxbury
 Fix For: 0.16.0

 Attachments: 2443-v10.patch, 2443-v3.patch, 2443-v4.patch, 
 2443-v5.patch, 2443-v6.patch, 2443-v7.patch, 2443-v8.patch, 2443-v9.patch


 Currently, when the client gets a NotServingRegionException -- usually 
 because the region is in the middle of being split, or there has been a 
 regionserver crash and the region is being moved elsewhere -- the client 
 does a complete refresh of its cache of region locations for that table.
 Chatting with Jim about a Paul Saab upload issue from Saturday night: when 
 tables are big and comprised of regions that are splitting fast (because of 
 bulk upload), it's unlikely a client will ever be able to obtain a stable 
 list of all region locations. Given that any update or scan requires that 
 the list of all regions be in place before it proceeds, this can get in the 
 way of the client succeeding when the cluster is under load.
 Chatting, we figure it is better for the client to hold a lazy region cache: 
 on NSRE, figure out where that one region has gone and update the 
 client-side cache for that entry only, rather than throw out all we know of 
 a table every time.
 Hopefully this will fix the issue PS was experiencing where, during an 
 intense upload, he was unable to get/scan/hql the same table.




[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Open  (was: Patch Available)

Thought of a better way to force cache flushes.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt




[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Open  (was: Patch Available)

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt, patch.txt




[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Attachment: patch.txt

HTable$ClientScanner.nextScanner was sleeping, but in the wrong place

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-14 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Patch Available  (was: Open)

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt




[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2440:
--

Status: Open  (was: Patch Available)

Patch was for one of the sub-issues and has been committed.

The main issue has not yet been addressed.

 [hbase] Provide a HBase checker and repair tool similar to fsck
 ---

 Key: HADOOP-2440
 URL: https://issues.apache.org/jira/browse/HADOOP-2440
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt


 We need a tool to verify (and repair) HBase much like fsck

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2416) [hbase] IOException: File does not exist

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2416:
--

Priority: Minor  (was: Major)

Downgrading to minor since this problem has not been reported since the 
original report.

 [hbase] IOException: File does not exist
 

 Key: HADOOP-2416
 URL: https://issues.apache.org/jira/browse/HADOOP-2416
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Minor

 Two fellas today on two unrelated clusters had versions of the below:
 {code}
   bryanduxbury  2007-12-12 08:28:22,235 ERROR 
 org.apache.hadoop.hbase.HRegionServer: Compaction failed for region 
 spider_pages,10_149317711,1197468834206
 [13:01]   bryanduxbury  java.io.IOException: java.io.IOException: File 
 does not exist
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
 [13:01]   bryanduxbury  at 
 sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
 [13:01]   bryanduxbury  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [13:01]   bryanduxbury  at 
 java.lang.reflect.Method.invoke(Method.java:597)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
 [13:01]   bryanduxbury  at 
 sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
 [13:01]   bryanduxbury  at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 [13:01]   bryanduxbury  at 
 java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
 [13:01]   bryanduxbury  at 
 org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:385)
 {code}
 Oddly, the name of the missing file is not cited.
 The other instance showed up in the web UI.  It seemed to be a problem with an 
 HStoreFile in the .META. region.  I was unable to select content from the .META. 
 table -- it was returning null rows.
 In both cases a restart fixed things again.
 Since all state is out in HDFS and the in-memory maps are built from the HDFS 
 state, something must not be getting updated on compaction, split, or flush.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558442#action_12558442
 ] 

Jim Kellerman commented on HADOOP-2587:
---

The reason for this is that a region split needs to close the parent region 
briefly. However, HRegion.close needs to acquire a number of locks before it 
can proceed. Because splitRegion was calling RegionListener.closing before 
calling close, the region was taken offline before close had acquired any 
locks. If compactions, scanners, or updates were in progress, they all had to 
finish before the region could actually close, resulting in long periods 
where the region was unavailable.

The solution is to have HRegion.close call listener.closing only after all 
the locks have been acquired and the close is really about to proceed. For 
symmetry, HRegion.close should also call listener.closed once the close 
completes.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log


 The excerpt below is cut from one of my region servers' logs; the full log is 
 attached.
 There is only one region on this region server, and it is under heavy insert 
 load, so compactions run back to back: as soon as one finishes, a new one 
 starts. The problem begins when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, leaving 
 the region offline until the compaction is done. Once the compaction 
 completes, the split finishes and everything returns to normal, but a region 
 being offline for 10-15 minutes is a big problem for production.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because with more than one region the compaction moves on to 
 the other region(s) once the first is done, and the split can do what it 
 needs on the first region without being blocked.
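The guard the report proposes — have the split thread check for an in-progress compaction before taking the region offline — could be sketched roughly as follows. SplitGuard and its method names are hypothetical, not HBase API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the proposed guard: a split defers rather than
// closing the region out from under an active compaction.
class SplitGuard {
    private final AtomicBoolean compacting = new AtomicBoolean(false);

    // Compaction thread marks itself active; returns false if one is running.
    boolean beginCompaction() { return compacting.compareAndSet(false, true); }

    void endCompaction() { compacting.set(false); }

    // Split thread: only safe to take the region offline when no compaction
    // is in progress; otherwise retry later instead of blocking while offline.
    boolean safeToSplit() { return !compacting.get(); }
}
```

A real fix also has to avoid the race between the check and the close, which is why the committed change instead moved the closing notification inside HRegion.close after lock acquisition (see HADOOP-2587's resolution).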
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1

[jira] Assigned: (HADOOP-2500) [HBase] Unreadable region kills region servers

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2500:
-

Assignee: Jim Kellerman

 [HBase] Unreadable region kills region servers
 --

 Key: HADOOP-2500
 URL: https://issues.apache.org/jira/browse/HADOOP-2500
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Critical

 Background: the name node (also a DataNode and RegionServer) in our cluster 
 ran out of disk space.  I created some space, restarted HDFS, and fsck 
 reported corruption in an HBase file.  I cleared up that corruption and 
 restarted HBase.  I was still unable to read anything from HBase even though 
 HDFS was now healthy.
 The following was gathered from the log files.  When HMaster starts up, it 
 finds a region that is no good (Key: 17_125736271):
 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current 
 assignment of spider_pages,17_125736271,1198286140018 is no good
 HMaster then assigns this region to RegionServer X.60:
 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning 
 region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 The RegionServer has trouble reading that region (from the RegionServer log 
 on X.60); note that the worker thread exits:
 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting 
 spider_pages,17_125736271,1198286140018/meta (2062710340/meta with 
 reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum 
 sequence id for hstore spider_pages,17_125736271,1198286140018/meta 
 (2062710340/meta) is 4549496
 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error 
 opening region spider_pages,17_125736271,1198286140018
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
 at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
 at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
 at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: 
 Unhandled exception
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker 
 thread exiting
 The HMaster then tries to assign the same region to X.60 again and fails.  
 The HMaster tries to assign the region to X.31 with the same result (X.31 
 worker thread exits).
 The file it is complaining about, 
 /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in 
 HDFS.  After deleting that file and restarting HBase, HBase appears to be 
 back to normal.
 One thing I can't figure out is that the HMaster log shows several entries 
 after the worker thread on X.60 has exited, suggesting that the RegionServer 
 is still talking with HMaster:
 2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 There is no corresponding entry in the RegionServer's log.
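The workaround found here (deleting the zero-length oldlogfile.log) suggests a defensive fix: skip replaying a reconstruction log that is missing or empty instead of handing it to SequenceFile.Reader and dying on EOFException. A minimal illustration using plain java.io, not the actual HStore code:

```java
import java.io.File;

// Hypothetical check, not HBase source: decide whether a reconstruction log
// is worth replaying before constructing a reader over it.
class ReconstructionLog {
    // A zero-length or missing log has nothing to replay; opening it as a
    // SequenceFile would fail with EOFException, as in this report.
    static boolean shouldReplay(File oldLog) {
        return oldLog.exists() && oldLog.length() > 0;
    }
}
```

In the real code the equivalent check would consult the FileSystem/FileStatus APIs for the HDFS path before opening the reader.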

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2468) [hbase] TestRegionServerExit failed in Hadoop-Nightly #338

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2468:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving issue. Issue not seen in recent builds.

 [hbase] TestRegionServerExit failed in Hadoop-Nightly #338
 --

 Key: HADOOP-2468
 URL: https://issues.apache.org/jira/browse/HADOOP-2468
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Priority: Minor
 Fix For: 0.16.0

 Attachments: patch.txt


 TestRegionServerExit failed in Hadoop-Nightly #338
 From the logs it appears that the client gave up before the mini HBase 
 cluster could recover from a region server failure. Adjusting the timeout and 
 retry configuration parameters should make this more reliable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Attachment: patch.txt

This patch addresses HADOOP-2587 (this issue), HADOOP-2500 and a newly found 
issue with TestTimestamp.testTimestamps() which was creating two mini dfs 
clusters.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, patch.txt


 The excerpt below is cut from one of my region servers' logs; the full log is 
 attached.
 There is only one region on this region server, and it is under heavy insert 
 load, so compactions run back to back: as soon as one finishes, a new one 
 starts. The problem begins when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, leaving 
 the region offline until the compaction is done. Once the compaction 
 completes, the split finishes and everything returns to normal, but a region 
 being offline for 10-15 minutes is a big problem for production.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because with more than one region the compaction moves on to 
 the other region(s) once the first is done, and the split can do what it 
 needs on the first region without being blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Patch Available  (was: Open)

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, patch.txt


 The excerpt below is cut from one of my region servers' logs; the full log is 
 attached.
 There is only one region on this region server, and it is under heavy insert 
 load, so compactions run back to back: as soon as one finishes, a new one 
 starts. The problem begins when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, leaving 
 the region offline until the compaction is done. Once the compaction 
 completes, the split finishes and everything returns to normal, but a region 
 being offline for 10-15 minutes is a big problem for production.
 The solution would be to not let the split thread issue the line below while 
 a compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because with more than one region the compaction moves on to 
 the other region(s) once the first is done, and the split can do what it 
 needs on the first region without being blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2440:
--

Fix Version/s: (was: 0.16.0)
   0.17.0

Pushing fix out to 0.17 since adding the referential integrity needed to make 
this tool really work will require another migration tool.

 [hbase] Provide a HBase checker and repair tool similar to fsck
 ---

 Key: HADOOP-2440
 URL: https://issues.apache.org/jira/browse/HADOOP-2440
 Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.17.0

 Attachments: patch.txt


 We need a tool to verify (and repair) HBase much like fsck

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HADOOP-2500) [HBase] Unreadable region kills region servers

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman resolved HADOOP-2500.
---

   Resolution: Fixed
Fix Version/s: 0.16.0

Patch submitted for HADOOP-2587 incorporated fix for this issue. Tests passed. 
Committed.

 [HBase] Unreadable region kills region servers
 --

 Key: HADOOP-2500
 URL: https://issues.apache.org/jira/browse/HADOOP-2500
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: CentOS 5
Reporter: Chris Kline
Assignee: Jim Kellerman
Priority: Critical
 Fix For: 0.16.0


 Background: the name node (also a DataNode and RegionServer) in our cluster 
 ran out of disk space.  I created some space, restarted HDFS, and fsck 
 reported corruption in an HBase file.  I cleared up that corruption and 
 restarted HBase.  I was still unable to read anything from HBase even though 
 HDFS was now healthy.
 The following was gathered from the log files.  When HMaster starts up, it 
 finds a region that is no good (Key: 17_125736271):
 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current 
 assignment of spider_pages,17_125736271,1198286140018 is no good
 HMaster then assigns this region to RegionServer X.60:
 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning 
 region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 The RegionServer has trouble reading that region (from the RegionServer log 
 on X.60); note that the worker thread exits:
 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting 
 spider_pages,17_125736271,1198286140018/meta (2062710340/meta with 
 reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum 
 sequence id for hstore spider_pages,17_125736271,1198286140018/meta 
 (2062710340/meta) is 4549496
 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error 
 opening region spider_pages,17_125736271,1198286140018
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
 at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
 at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
 at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: 
 Unhandled exception
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker 
 thread exiting
 The HMaster then tries to assign the same region to X.60 again and fails.  
 The HMaster tries to assign the region to X.31 with the same result (X.31 
 worker thread exits).
 The file it is complaining about, 
 /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in 
 HDFS.  After deleting that file and restarting HBase, HBase appears to be 
 back to normal.
 One thing I can't figure out is that the HMaster log shows several entries 
 after the worker thread on X.60 has exited, suggesting that the RegionServer 
 is still talking with HMaster:
 2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 There is no corresponding entry in the RegionServer's log.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tests passed. Committed.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, patch.txt


 The below is cut out of one of my region server's logs; the full log is attached.
 What is happening: there is one region on this region server and it is under heavy insert load, so compactions run back to back; as soon as one finishes, a new one starts. The problem starts when it is time to split the region.
 A compaction starts just milliseconds before the split, blocking the split, but the split closes the region before the compaction is finished, causing the region to be offline until the compaction is done. Once the compaction is done the split finishes and all returns to normal, but this is a big problem for production if the region is offline for 10-15 minutes.
 The solution would be to not let the split thread issue the line below while a compaction on that region is in progress.
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a region server, because with more than one region the compaction happens on the other region(s) after the first one finishes compacting, and the split can do what it needs on the first region without getting blocked.
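The guard the reporter asks for can be sketched as a simple wait/notify handshake between the compaction thread and the split thread. This is illustrative only; `RegionGuard`, its flag, and its method names are hypothetical, not HBase's actual HRegion internals:

```java
// Illustrative sketch: the split thread waits for any in-progress
// compaction to finish before closing the region, instead of closing it
// out from under the compaction and leaving the region offline.
class RegionGuard {
    private boolean compactionInProgress = false;

    public synchronized void startCompaction() {
        compactionInProgress = true;
    }

    public synchronized void finishCompaction() {
        compactionInProgress = false;
        notifyAll(); // wake a split thread blocked in closeForSplit()
    }

    // Called by the split thread; blocks until no compaction is running.
    public synchronized void closeForSplit() {
        while (compactionInProgress) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        // now safe to close the region and proceed with the split
    }

    public synchronized boolean isCompacting() {
        return compactionInProgress;
    }
}
```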
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2588:
--

Issue Type: Improvement  (was: Bug)

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
 Environment: n/a
Reporter: Ian Clarke
Priority: Trivial

 The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, in most Java implementations this uses a full byte per bit stored, meaning 8X the necessary memory is required. This is unfortunate, as the whole point of a Bloom filter is to save memory.
 As a sidebar, the implementation looks a bit shaky in other ways, such as the way hashes are generated from a SHA-1 digest in the Filter class, and the way the hash() method just assumes the digestBytes array will be long enough.
 I discovered this while looking for a good Bloom filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it:
 http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
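The fix eventually attached to this issue replaces the boolean[] with java.util.BitSet, which packs 8 bits per byte. A minimal sketch of a BitSet-backed filter follows; the class and its simple string hashing are hypothetical illustrations, not the org.onelab code:

```java
import java.util.BitSet;

// Minimal Bloom-filter sketch backed by java.util.BitSet. BitSet stores
// 8 bits per byte, whereas a boolean[] element typically occupies a whole
// byte. The hashing below is deliberately simple (derived from
// String.hashCode()); a real filter would use independent hash functions.
class BitSetBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BitSetBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit index for a key from its hash code.
    private int index(String key, int i) {
        int h = key.hashCode() * (2 * i + 1) + i;
        return Math.abs(h % size);
    }

    public void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    // May return a false positive, but never a false negative.
    public boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```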




[jira] Reopened: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reopened HADOOP-2587:
---


Times reported for splits are inaccurate.

Investigate why other operations are blocked during compaction.

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt




[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2588:
--

Attachment: patch.txt

Replace vector of boolean with BitSet

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: n/a
Reporter: Ian Clarke
Priority: Trivial
 Fix For: 0.16.0

 Attachments: patch.txt






[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2588:
--

Fix Version/s: 0.16.0
Affects Version/s: 0.16.0
   Status: Patch Available  (was: Open)

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: n/a
Reporter: Ian Clarke
Priority: Trivial
 Fix For: 0.16.0

 Attachments: patch.txt






[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558486#action_12558486
 ] 

Jim Kellerman commented on HADOOP-2587:
---

Updates prevent:
 - cache flushes
 - closing a region (and consequently, splits)
 - final stage of compaction

Scanners prevent
 - final stage of compaction
 - closing a region (and consequently, splits)

During the final stage of compaction
 - no new scanners may be created
 - updates are prohibited

Cache flushes prevent
 - closing a region (and consequently, splits)
 - updates
 - rolling the HLog

Rolling the HLog prevents
 - cache flushes
 - updates

A region split must close the old region. Consequently before it can start it 
must:
 - wait for any compactions or cache flushes to complete
 - lock the region to prevent new updates
 - wait for active scanners to terminate
 - wait for updates in progress to finish

Once a split is in progress, the actual process is quick. However, even after 
the region server reports that the split has completed, clients must wait until 
the master assigns the new regions and the region server(s) report to the 
master that the new regions are being served.

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Attachment: patch.txt

It turns out that scanners and updates were being locked out for the duration 
of a compaction because of the order in which locks were acquired. This has 
been changed. Other methods that use these locks have also had their 
acquisition order changed to prevent deadlocks.
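A consistent global acquisition order is the standard way to make such lock reordering deadlock-free: every code path that needs both locks takes them in the same order. A minimal sketch of the idea (the lock and method names below are illustrative, not HRegion's actual fields):

```java
import java.util.concurrent.locks.ReentrantLock;

// Deadlock avoidance by global lock ordering: every method that needs
// both locks acquires updatesLock before scannerLock. If one path took
// them in the opposite order, two threads could each hold one lock and
// wait forever on the other.
class LockOrdering {
    private final ReentrantLock updatesLock = new ReentrantLock();
    private final ReentrantLock scannerLock = new ReentrantLock();
    private int operations = 0;

    // Cache flush: updatesLock first, then scannerLock.
    public void flush() {
        updatesLock.lock();
        try {
            scannerLock.lock();
            try {
                operations++; // stand-in for the real flush work
            } finally {
                scannerLock.unlock();
            }
        } finally {
            updatesLock.unlock();
        }
    }

    // Final stage of compaction: MUST use the same order. Taking
    // scannerLock first here is what would allow a deadlock.
    public void compactFinalStage() {
        updatesLock.lock();
        try {
            scannerLock.lock();
            try {
                operations++; // stand-in for the real compaction work
            } finally {
                scannerLock.unlock();
            }
        } finally {
            updatesLock.unlock();
        }
    }

    public int operations() {
        return operations;
    }
}
```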

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt



[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Affects Version/s: 0.16.0
   Status: Patch Available  (was: Reopened)

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt




[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Open  (was: Patch Available)

Won't apply anymore.

 Splits getting blocked by compactions causeing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Attachment: patch.txt

New version applies and resolves conflicts.

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt


 The excerpt below is cut from one of my region server logs; the full log is 
 attached.
 What is happening: there is one region on this region server and it is under 
 heavy insert load, so compactions run back to back; as soon as one finishes, a 
 new one starts. The problem begins when it is time to split the region.
 A compaction starts just milliseconds before the split. The compaction blocks 
 the split, but the split closes the region before the compaction is finished, 
 leaving the region offline until the compaction is done. Once the compaction 
 completes, the split finishes and everything returns to normal, but this is a 
 big problem for production if the region is offline for 10-15 minutes.
 The solution would be to not let the split thread issue the line below while a 
 compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on the 
 region server: with more than one, the compaction moves on to the other 
 region(s) after the first is done, and the split can do what it needs on the 
 first region without getting blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionExceptions
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}
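The race described above amounts to missing mutual exclusion between the compaction thread and the split thread. Below is a minimal sketch of one way to serialize them with a plain ReentrantLock; the class and method names are hypothetical, not the actual HRegion/HRegionServer code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: compaction and split share one lock, so a split can
// never close the region while a compaction on it is still running.
public class RegionMaintenance {
    private final ReentrantLock maintenanceLock = new ReentrantLock();
    private volatile boolean compacting = false;

    public void compact() {
        maintenanceLock.lock();          // a pending split waits here
        try {
            compacting = true;
            // ... merge store files ...
        } finally {
            compacting = false;
            maintenanceLock.unlock();
        }
    }

    public boolean trySplit() {
        // Skip this split attempt instead of closing the region while a
        // compaction holds the lock; the next maintenance cycle can retry.
        if (!maintenanceLock.tryLock()) {
            return false;
        }
        try {
            // ... close region, write the two daughter regions ...
            return true;
        } finally {
            maintenanceLock.unlock();
        }
    }

    public boolean isCompacting() {
        return compacting;
    }
}
```

Using tryLock rather than lock in trySplit keeps the split thread from blocking for the 10-15 minutes a compaction can take; the split is simply deferred.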

-- 
This message is automatically generated by JIRA.

[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-13 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2587:
--

Status: Patch Available  (was: Open)

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, 
 patch.txt, patch.txt, patch.txt


 The excerpt below is cut from one of my region server logs; the full log is 
 attached.
 What is happening: there is one region on this region server and it is under 
 heavy insert load, so compactions run back to back; as soon as one finishes, a 
 new one starts. The problem begins when it is time to split the region.
 A compaction starts just milliseconds before the split. The compaction blocks 
 the split, but the split closes the region before the compaction is finished, 
 leaving the region offline until the compaction is done. Once the compaction 
 completes, the split finishes and everything returns to normal, but this is a 
 big problem for production if the region is offline for 10-15 minutes.
 The solution would be to not let the split thread issue the line below while a 
 compaction on that region is in progress:
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on the 
 region server: with more than one, the compaction moves on to the other 
 region(s) after the first is done, and the split can do what it needs on the 
 first region without getting blocked.
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionExceptions
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-12 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Status: Open  (was: Patch Available)

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.
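 As a sketch of the proposed scheme, a region path would carry the table name, 
 so the table is always recoverable from the path. The class and method names 
 below are illustrative, not the actual HBase path-building code:

```java
// Sketch of the proposed on-disk layout:
// /root-directory/table-name/encoded-region-name/column-family
public class RegionPaths {
    static String regionDir(String rootDir, String tableName,
                            String encodedRegionName, String family) {
        return rootDir + "/" + tableName + "/" + encodedRegionName + "/" + family;
    }

    public static void main(String[] args) {
        // Under the flat scheme, /hbase/hregion_70236052/info does not reveal
        // the table; the proposed scheme keeps the table name in the path.
        System.out.println(regionDir("/hbase", "webdata", "70236052", "info"));
        // prints /hbase/webdata/70236052/info
    }
}
```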

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-12 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Attachment: patch.txt

New patch starts a mini DFS for the two tests that failed.

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-12 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Status: Patch Available  (was: Open)

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-12 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2588:
--

Component/s: (was: util)
 contrib/hbase
   Priority: Trivial  (was: Minor)

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: n/a
Reporter: Ian Clarke
Priority: Trivial

 The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; 
 however, in most Java implementations this will use a byte per bit stored, 
 meaning that 8X the memory actually needed is required. This is unfortunate, 
 as the whole point of a Bloom filter is to save memory.
 As a sidebar, the implementation looks a bit shaky in other ways: for example, 
 the way hashes are generated from a SHA1 digest in the Filter class simply 
 assumes the digestBytes array will be long enough in the hash() method.
 I discovered this while looking for a good Bloom filter implementation to use 
 in my own project. In the end I went ahead and implemented my own; it's very 
 simple and pretty elegant (even if I do say so myself ;) - you are welcome to 
 use it:
 http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using

2008-01-12 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558272#action_12558272
 ] 

Jim Kellerman commented on HADOOP-2588:
---

You must be looking at an older version than what is in trunk.

The current implementation uses a Jenkins hash rather than SHA-1.

You are correct that there is no guarantee how JVMs implement an array of 
boolean.
Perhaps using a java.util.BitSet would be better.
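The 8X figure and the BitSet suggestion can be made concrete with a little arithmetic. A sketch, assuming the common (but JVM-dependent) layout of one byte per boolean[] element:

```java
import java.util.BitSet;

// Rough memory comparison for an n-bit filter: boolean[] typically costs one
// byte per slot, while java.util.BitSet packs 64 bits into each backing long.
public class FilterBits {
    static long boolArrayBytes(int nbits) {
        return nbits;                       // ~1 byte per boolean (JVM-dependent)
    }

    static long bitSetBytes(int nbits) {
        return ((nbits + 63) / 64) * 8L;    // longs backing the BitSet
    }

    public static void main(String[] args) {
        int nbits = 8 * 1024 * 1024;        // an 8-megabit filter
        BitSet filter = new BitSet(nbits);  // same set/get API a filter needs
        filter.set(12345);
        System.out.println(boolArrayBytes(nbits) / bitSetBytes(nbits));
        // prints 8
    }
}
```

BitSet also gives the filter set/get/clear operations directly, so the swap would be close to mechanical.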

 org.onelab.filter.BloomFilter class uses 8X the memory it should be using
 -

 Key: HADOOP-2588
 URL: https://issues.apache.org/jira/browse/HADOOP-2588
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: n/a
Reporter: Ian Clarke
Priority: Minor

 The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; 
 however, in most Java implementations this will use a byte per bit stored, 
 meaning that 8X the memory actually needed is required. This is unfortunate, 
 as the whole point of a Bloom filter is to save memory.
 As a sidebar, the implementation looks a bit shaky in other ways: for example, 
 the way hashes are generated from a SHA1 digest in the Filter class simply 
 assumes the digestBytes array will be long enough in the hash() method.
 I discovered this while looking for a good Bloom filter implementation to use 
 in my own project. In the end I went ahead and implemented my own; it's very 
 simple and pretty elegant (even if I do say so myself ;) - you are welcome to 
 use it:
 http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-12 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Tests passed. Committed. Resolving issue.

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart

2008-01-11 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2311:
--

Priority: Trivial  (was: Critical)

Dropping priority since this bug has not recurred.

 [hbase] Could not complete hdfs write out to flush file forcing regionserver 
 restart
 

 Key: HADOOP-2311
 URL: https://issues.apache.org/jira/browse/HADOOP-2311
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Trivial
 Attachments: delete-logging.patch


 I've spent some time looking into this issue, but there are not enough clues 
 in the logs to tell where the problem is. Here's what I know.
 Two region servers went down last night, a minute apart, during Paul Saab's 
 6-hour run inserting 300 million rows into hbase. The regionservers went down 
 to force a rerun of the hlog and avoid possible data loss after a failure 
 writing memory flushes to hdfs.
 Here is the lead-up to the failed flush:
 ...
 2007-11-28 22:40:02,231 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: 
 img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: 
 {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
 max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
 compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
 none}}}
 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no 
 reconstruction log)
 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/cookie is 29077708
 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no 
 reconstruction log)
 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/ip is 29077708
 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegion - region 
 postlog,img149/4699/133lm0.jpg,1196318393738 available
 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 
 21357623 to 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, 
 startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, 
 families: {cookie:={name: cookie, max versions: 1, compression: NONE, in 
 memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, 
 max versions: 1, compression: NONE, in memory: false, max length: 2147483647, 
 bloom filter: none}}}
 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no 
 reconstruction log)
 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/cookie is 29077708
 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no 
 reconstruction log)
 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/ip is 29077708
 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
 2007-11-28 22:40:04,701 INFO  hbase.HRegion - region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region 
 postlog,img143/1310/yashrk3.jpg,1196317258704
 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined 
 that there was nothing to do
 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703
 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() 
 determined that there was nothing to do
 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region 
 postlog,img149/4699/133lm0.jpg,1196318393738
 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 
 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 
 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, 
 sequenceid=29081563
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore 
 postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
 2007-11

[jira] Commented: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558025#action_12558025
 ] 

Jim Kellerman commented on HADOOP-2311:
---

Have we seen any more occurrences of this problem?

If not, should we close this issue as not reproducible and open a new one if it 
happens again?

 [hbase] Could not complete hdfs write out to flush file forcing regionserver 
 restart
 

 Key: HADOOP-2311
 URL: https://issues.apache.org/jira/browse/HADOOP-2311
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Reporter: stack
Priority: Critical
 Attachments: delete-logging.patch


 I've spent some time looking into this issue, but there are not enough clues 
 in the logs to tell where the problem is. Here's what I know.
 Two region servers went down last night, a minute apart, during Paul Saab's 
 6-hour run inserting 300 million rows into hbase. The regionservers went down 
 to force a rerun of the hlog and avoid possible data loss after a failure 
 writing memory flushes to hdfs.
 Here is the lead-up to the failed flush:
 ...
 2007-11-28 22:40:02,231 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: 
 img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: 
 {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, 
 max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, 
 compression: NONE, in memory: false, max length: 2147483647, bloom filter: 
 none}}}
 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no 
 reconstruction log)
 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/cookie is 29077708
 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no 
 reconstruction log)
 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 
 1703405830/ip is 29077708
 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegion - region 
 postlog,img149/4699/133lm0.jpg,1196318393738 available
 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 
 21357623 to 29077709
 2007-11-28 22:40:04,141 INFO  hbase.HRegionServer - MSG_REGION_OPEN : 
 regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, 
 startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, 
 families: {cookie:={name: cookie, max versions: 1, compression: NONE, in 
 memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, 
 max versions: 1, compression: NONE, in memory: false, max length: 2147483647, 
 bloom filter: none}}}
 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no 
 reconstruction log)
 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/cookie is 29077708
 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no 
 reconstruction log)
 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 
 376748222/ip is 29077708
 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
 2007-11-28 22:40:04,701 INFO  hbase.HRegion - region 
 postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region 
 postlog,img143/1310/yashrk3.jpg,1196317258704
 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined 
 that there was nothing to do
 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703
 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region 
 postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() 
 determined that there was nothing to do
 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region 
 postlog,img149/4699/133lm0.jpg,1196318393738
 2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 
 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 
 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for 
 region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, 
 sequenceid=29081563
 2007-11-28 22:41

[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558037#action_12558037
 ] 

Jim Kellerman commented on HADOOP-2394:
---

stack wrote:
 I ain't too invested in our supporting reverse migrations, but it's worth 
 noting that any migration system worth its salt - 
 systems I've worked on in the past, and Ruby on Rails - goes both ways, if only 
 to facilitate testing of the forward migration 
 (inevitably there's a bug when you try to migrate real data).

That's what backups are for :)

More importantly though, HADOOP-2478 incorporates a migration tool. The 
specifics of what the tool does will have to be
rewritten for each upgrade, but I think the framework is good.

 Add support for migrating between hbase versions
 ---

 Key: HADOOP-2394
 URL: https://issues.apache.org/jira/browse/HADOOP-2394
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Reporter: Johan Oskarsson

 If Hbase is to be used to serve data to live systems we would need a way to 
 upgrade both the underlying hadoop installation and hbase to newer versions 
 with minimal downtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers

2008-01-11 Thread Jim Kellerman (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558032#action_12558032
 ] 

Jim Kellerman commented on HADOOP-2500:
---

Bryan Duxbury wrote:
 At the very least, we should not assign a region to a region server if it is 
 detected as no good.

That is an unfortunate wording of a log message in the Master. It is saying 
that the current 
assignment of the region is no good because the information it read from the 
meta region
had a server or start code that did not match a known server. It does not mean 
that the
master thinks the region itself is no good.

 Also, if a RegionServer tries to access a region and it has difficulties, it 
 should report to the
 master that it can't read the region, and the master should stop trying to 
 serve it.
 From a more general standpoint, maybe when a bad region is detected, its 
 files should be 
 moved to a different location and generally excluded from the cluster. This 
 would allow you to 
 recover from problems better.

Yes, we absolutely need to do something, just not sure exactly what yet.

One thing is certain: zero-length files should be ignored/deleted.
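A minimal sketch of that guard, using plain java.nio rather than Hadoop's FileSystem API (the class and method names here are illustrative, not HBase's): check the reconstruction log's length before handing it to a SequenceFile reader, so an empty log never triggers the EOFException seen in the trace above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch: skip (and delete) a zero-length reconstruction log
// instead of trying to read it, since an empty log carries no edits and
// reading its missing header would fail with EOFException.
public class ReconstructionLogGuard {
  // Returns true only when the log exists and has data worth replaying.
  static boolean shouldReplay(Path log) throws IOException {
    if (!Files.exists(log)) {
      return false;                 // nothing to reconstruct
    }
    if (Files.size(log) == 0) {
      Files.delete(log);            // ignore/delete the empty log
      return false;
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path empty = Files.createTempFile("oldlogfile", ".log");
    System.out.println(shouldReplay(empty));   // empty: deleted, not replayed

    Path nonEmpty = Files.createTempFile("oldlogfile", ".log");
    Files.write(nonEmpty, new byte[] {1, 2, 3});
    System.out.println(shouldReplay(nonEmpty)); // has data: replay it
    Files.delete(nonEmpty);
  }
}
```

In the real code path the same check would sit in front of the reconstruction-log replay, so a region with an empty oldlogfile.log opens normally instead of killing the worker thread.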


 [HBase] Unreadable region kills region servers
 --

 Key: HADOOP-2500
 URL: https://issues.apache.org/jira/browse/HADOOP-2500
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: CentOS 5
Reporter: Chris Kline
Priority: Critical

 Background: The name node (also a DataNode and RegionServer) in our cluster 
 ran out of disk space.  I created some space, restarted HDFS and fsck 
 reported corruption with an HBase file.  I cleared up that corruption and 
 restarted HBase.  I was still unable to read anything from HBase even though 
 HDFS was now healthy.
 The following was gathered from the log files.  When HMaster starts up, it 
 finds a region that is no good (Key: 17_125736271):
 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current 
 assignment of spider_pages,17_125736271,1198286140018 is no good
 HMaster then assigns this region to RegionServer X.60:
 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning 
 region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received 
 MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 
 10.100.11.60:60020
 The RegionServer has trouble reading that region (from the RegionServer log 
 on X.60); Note that the worker thread exits
 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting 
 spider_pages,17_125736271,1198286140018/meta (2062710340/meta with 
 reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum 
 sequence id for hstore spider_pages,17_125736271,1198286140018/meta 
 (2062710340/meta) is 4549496
 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error 
 opening region spider_pages,17_125736271,1198286140018
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1383)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1360)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1349)
 at 
 org.apache.hadoop.io.SequenceFile$Reader.&lt;init&gt;(SequenceFile.java:1344)
 at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
 at org.apache.hadoop.hbase.HStore.&lt;init&gt;(HStore.java:632)
 at org.apache.hadoop.hbase.HRegion.&lt;init&gt;(HRegion.java:288)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: 
 Unhandled exception
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
 at 
 org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
 at 
 org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
 at java.lang.Thread.run(Thread.java:619)
 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker 
 thread exiting
 The HMaster then tries to assign the same region to X.60 again and fails.  
 The HMaster tries to assign the region to X.31 with the same result (X.31 
 worker thread exits).
 The file it is complaining about, 
 /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file.

[jira] Assigned: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins

2008-01-11 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman reassigned HADOOP-2587:
-

Assignee: Jim Kellerman

 Splits getting blocked by compactions causing region to be offline for the 
 length of the compaction 10-15 mins
 ---

 Key: HADOOP-2587
 URL: https://issues.apache.org/jira/browse/HADOOP-2587
 Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: hbase-root-regionserver-PE1750-3.log


 The text below is cut out of one of my region server logs; the full log is 
 attached.
 What is happening is that there is one region on this region server, and it 
 is under heavy insert load, so compactions are back to back: as soon as one 
 finishes, a new one starts. The problem begins when it is time to split the 
 region.
 A compaction starts just milliseconds before the split, blocking the split, 
 but the split closes the region before the compaction is finished, causing 
 the region to be offline until the compaction is done. Once the compaction is 
 done, the split finishes and all returns to normal, but this is a big 
 problem for production if the region is offline for 10-15 mins.
 The solution would be not to let the split thread issue the line below 
 while a compaction on that region is in progress.
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 The only time I have seen this bug is when there is only one region on a 
 region server, because if there is more than one, the compaction moves on to 
 the other region(s) after the first one finishes compacting, and the split 
 can do what it needs on the first region without getting blocked.
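The coordination being proposed can be sketched with a simple per-region monitor. This is a hypothetical illustration (the class and method names are ours, not HBase's): the split thread waits for a running compaction to finish instead of closing the region underneath it.

```java
// Hypothetical sketch of the suggested fix: track whether a compaction is
// in progress on the region and make the split thread wait, rather than
// close the region, until the compaction completes.
public class RegionSplitGate {
  private boolean compacting = false;

  synchronized void beginCompaction() { compacting = true; }

  synchronized void endCompaction() {
    compacting = false;
    notifyAll();  // wake any split thread waiting on this region
  }

  // Non-blocking check: may the split close the region right now?
  synchronized boolean trySplit() { return !compacting; }

  // Blocking variant for the split thread: wait out a running compaction.
  synchronized void awaitSplitPermission() throws InterruptedException {
    while (compacting) {
      wait();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final RegionSplitGate gate = new RegionSplitGate();
    gate.beginCompaction();
    Thread splitter = new Thread(() -> {
      try {
        gate.awaitSplitPermission();
        System.out.println("split proceeds only after the compaction ends");
      } catch (InterruptedException ignored) {
        // demo only
      }
    });
    splitter.start();
    Thread.sleep(100);      // give the split thread time to block
    gate.endCompaction();   // compaction done; split may close the region
    splitter.join();
  }
}
```

With a gate like this, the region is closed only once no compaction holds it, so the 10-15 minute offline window in the log below would shrink to the split itself.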
 {code}
 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 16mins, 10sec
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for 
 HStore webdata,,1200085987488/size needed.
 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 
 1773667150/size needs compaction
 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting 
 compaction on region webdata,,1200085987488
 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started 
 compaction of 14 files using 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size
  for webdata,,1200085987488/size
 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started 
 memcache flush for region webdata,,1200085987488. Size 31.2m
 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting 
 webdata,,1200085987488 because largest aggregate size is 100.7m and desired 
 size is 64.0m
 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 webdata,,1200085987488 closing (Adding to retiringRegions)
 ...
 lots of NotServingRegionException's
 ...
 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction 
 completed on region webdata,,1200085987488. Took 10mins, 58sec
 ...
 2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up 
 /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of 
 webdata,,1200085987488 complete; new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split 
 took 11mins, 0sec
 2008-01-11 16:33:02,227 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for 
 .META.. Doing a find...
 2008-01-11 16:33:02,283 DEBUG 
 org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) 
 for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, 
 startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: 
 {info:={name: info, max versions: 1, compression: NONE, in memory: false, max 
 length: 2147483647, bloom filter: none}}}
 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating 
 .META. with region split info
 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: 
 Reporting region split to master
 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region 
 split, META update, and report to master all successful. Old 
 region=webdata,,1200085987488, new regions: webdata,,1200090121237, 
 webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-10 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Fix Version/s: (was: 0.17.0)
   0.16.0
Affects Version/s: (was: 0.17.0)
   0.16.0
   Status: Open  (was: Patch Available)

Cancelling patch to make a new one that will apply to trunk.

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.
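 The difference between the two layouts can be sketched as path-building
 helpers (a hypothetical illustration, not the patch's actual code). Note how
 the table name is recoverable from a path only in the new layout:

```java
// Sketch of the old flat layout vs. the proposed layout. With the flat
// layout, an encoded region name alone does not tell you the table; the
// proposed layout puts the table name in the path.
public class RegionPaths {
  static String oldLayout(String encodedRegion, String family) {
    return "/hbase/hregion_" + encodedRegion + "/" + family;
  }

  static String newLayout(String root, String table,
                          String encodedRegion, String family) {
    return root + "/" + table + "/" + encodedRegion + "/" + family;
  }

  public static void main(String[] args) {
    // Matches the flat example paths quoted in the issue description.
    System.out.println(oldLayout("70236052", "info"));
    // Under the proposed layout the same region is clearly a -ROOT- region.
    System.out.println(newLayout("/hbase", "-ROOT-", "70236052", "info"));
  }
}
```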

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-10 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Attachment: patch.txt

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-10 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Status: Patch Available  (was: Open)

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt, patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-09 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Attachment: patch.txt

Although this won't go in until 0.17, let's get Hudson used to running it. He 
doesn't like most patches.

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.16.0

 Attachments: patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-09 Thread Jim Kellerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Kellerman updated HADOOP-2478:
--

Fix Version/s: (was: 0.16.0)
   0.17.0
Affects Version/s: (was: 0.16.0)
   0.17.0
   Status: Patch Available  (was: In Progress)

See what Hudson thinks.

 [hbase] restructure how HBase lays out files in the file system
 ---

 Key: HADOOP-2478
 URL: https://issues.apache.org/jira/browse/HADOOP-2478
 Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.17.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
 Fix For: 0.17.0

 Attachments: patch.txt


 Currently HBase has a pretty flat directory structure. For example:
 {code}
  /hbase/hregion_70236052/info
 /hbase/hregion_70236052/info/info/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
 {code}
 All the region directories are under the root directory, and with encoded 
 region names, it is impossible to determine what table a region belongs to. 
 This should be restructured to:
 {code}
 /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
 {code}
 It will be necessary to provide a migration script from current trunk to the 
 new structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Hadoop Patch Builds

2008-01-09 Thread Jim Kellerman
That rocks dude!

Awesome fix!

---
Jim Kellerman, Senior Engineer; Powerset


 -Original Message-
 From: Nigel Daley [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 09, 2008 5:08 PM
 To: hadoop-dev@lucene.apache.org
 Subject: Hadoop Patch Builds

 Until now, the order that our Hadoop-Patch build would test
 the patches has been essentially random.  Also, there has
 been no way to see the list of pending patches.

 Drum roll

 These 2 pain points are now fixed.

 I have created a new Hudson job, Hadoop-Patch-Admin, that
 does 2 things:
 a) triggers the Hadoop-Patch build when there are waiting
 patches; the order that patches are now submitted for testing
 is FIFO :-)
 b) exposes the current patch queue

 To see the queue, go to http://lucene.zones.apache.org:8080/hudson/
 job/Hadoop-Patch/ and click on the link QUEUE OF PENDING
 PATCHES (you may want to bookmark the linked page since it
 won't change).

 The Hadoop-Patch-Admin build attempts to run every minute and
 updates the queue information that you see.  The build,
 however, will get stuck behind any other builds (Hadoop,
 Lucene, etc) that are currently running so the queue
 information may not always be completely up-to-date.

 Hope this helps!

 Nige

 PS updated wiki documentation to follow

 No virus found in this incoming message.
 Checked by AVG Free Edition.
 Version: 7.5.516 / Virus Database: 269.19.0/1216 - Release
 Date: 1/9/2008 10:16 AM






RE: [jira] Commented: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system

2008-01-09 Thread Jim Kellerman
Stack tells me that code freeze for 0.16 is either late this week or early 
next. So no refactoring yet.

---
Jim Kellerman, Senior Engineer; Powerset


 -Original Message-
 From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, January 09, 2008 5:38 PM
 To: hadoop-dev@lucene.apache.org
 Subject: [jira] Commented: (HADOOP-2478) [hbase] restructure
 how HBase lays out files in the file system


 [
 https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atl
assian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12557514#action_12557514
  ]

 Bryan Duxbury commented on HADOOP-2478:
 ---

 If this won't be fixed until 0.17, which is months away,
 should we apply my HStore refactor patch in the meantime?

  [hbase] restructure how HBase lays out files in the file system
  ---
 
  Key: HADOOP-2478
  URL:
 https://issues.apache.org/jira/browse/HADOOP-2478
  Project: Hadoop
   Issue Type: Improvement
   Components: contrib/hbase
 Affects Versions: 0.17.0
 Reporter: Jim Kellerman
 Assignee: Jim Kellerman
  Fix For: 0.17.0
 
  Attachments: patch.txt
 
 
  Currently HBase has a pretty flat directory structure. For example:
  {code}
   /hbase/hregion_70236052/info
  /hbase/hregion_70236052/info/info/4328260619704027575
  /hbase/hregion_70236052/info/mapfiles/4328260619704027575
  /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
  /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
  {code}
  All the region directories are under the root directory,
 and with encoded region names, it is impossible to determine
 what table a region belongs to. This should be restructured to:
  {code}
 
 /root-directory/table-name/encoded-region-name/column-family/{info,map
  files}
  {code}
  It will be necessary to provide a migration script from
 current trunk to the new structure.

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.





RE: [hbase] table HRegionServer affinity

2008-01-08 Thread Jim Kellerman
-1 on this idea as suggested. Even Google does not distribute
DFS or Bigtable across data centers (see the Bigtable paper at
http://labs.google.com/papers/bigtable.html ). What the paper
does not mention is that they can replicate a table to multiple
data centers for business continuity and backup. This is on the
road map for HBase but is still quite a way down the road.

In addition, we do want to add 'rack awareness' within a data
center for fault tolerance and load balancing. This is also
not going to happen in the immediate future.

We are currently focusing on making what we have more fault
tolerant and are starting to work on performance issues.

Answers to your two questions inline below.

---
Jim Kellerman, Senior Engineer; Powerset


 -Original Message-
 From: Andrew Purtell [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 07, 2008 8:49 PM
 To: hadoop-dev
 Subject: [hbase] table HRegionServer affinity

 Hello,

 Consider the case of a global federation of Hadoop clusters,
 with a single global HBase master, divided into a number of
 geographic regions each with a local DFS, local workload, and
 region server backed by that DFS. This setup allows for a
 global HBase space, where any region may retrieve rows stored
 by any other region -- which is quite useful -- but, in
 addition to this, it would also be useful to be able to
 specify constraints on data mobility and also to be able to
 scope queries to a particular region.

 To be a bit more specific, I have three things in mind:

 1) The ability to fix a given key range to a region. This

I assume here you mean geographic region and not an HBase
table region.

 would both assign a range to a given region, and also disable
 splitting over that range. Aside from API changes, ideally
 there would be a HBase shell command to support this.

 2) Syntactic support in HBase shell for table affinity to a
 given region server:

  CREATE TABLE ... REGION=10.10.10.10

 (or similar) This would fix an entire table to a region.

 3) Query support for scoping the result set based on region
 server:

  SELECT ... WHERE @REGION=10.10.10.10 AND ...

 (or similar)

 Given the inflexibility of IP or hostnames to name regions,
 perhaps a mechanism for assigning logical labels to a region
 server (or even group of region servers, where a prohibition
 on splitting may be relaxed to allow splitting over the
 group) would also be useful.

 As I am still coming up to speed on Hadoop and HBase and the
 code base, I kindly ask for the answers to two questions.

 First: How invasive to the HBase master/region model is the
 concept of specifying constraints on data mobility?

It would be very disruptive. The current model is that you
run one or more HBase clusters per HDFS cluster. An HBase
cluster does not span HDFS clusters.

As far as I know HDFS clusters do not span data centers.
Latency and network partitioning would be big problems for
a system that requires sub-second response times.

 Second: How difficult would the modifications may be to accomplish?

A change such as this would require major changes to the
architecture and our vision of the model going forward.
(replication between data centers and a single table residing
in multiple data centers being served by separate HBase
instances running on separate HDFS clusters).

 I believe these questions to be related. :-)

 Thanks,

 Andrew Purtell
 Advanced Threats Research
 Trend Micro, Inc., Pasadena, CA, USA
 (personal mail)




 __
 __
 Looking for last minute shopping deals?
 Find them fast with Yahoo! Search.
 http://tools.search.yahoo.com/newsearch/category.php?category=shopping




RE: [jira] Commented: (HADOOP-2405) [hbase] Merge region tool exposed in shell and/or in UI

2008-01-07 Thread Jim Kellerman
Google does dynamic splitting and merging of regions to deal with hot spots.
They had to be careful that they did not oscillate between splitting and
merging when the load pattern changed.

Right now, manual merges are ok because we only do splits when regions grow and 
the only reason
to merge is if many rows are deleted.

When we get to doing more sophisticated load balancing, we will want the 
capability of both
splitting and merging based on load.
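The oscillation concern can be illustrated with a simple hysteresis policy (all names and thresholds here are illustrative, not HBase's actual configuration): if the split and merge thresholds were the same, a region near the boundary would ping-pong between the two operations; keeping the merge threshold well below half the split threshold means a freshly split region does not immediately merge back.

```java
// Hedged sketch of split/merge hysteresis. A region splits when it grows
// past SPLIT_ABOVE and merges only when it shrinks below MERGE_BELOW;
// the gap between the thresholds prevents oscillation.
public class SplitMergePolicy {
  static final long SPLIT_ABOVE = 64L << 20;  // 64 MB (illustrative)
  static final long MERGE_BELOW = 16L << 20;  // 16 MB, well under half of 64

  static String decide(long regionSize) {
    if (regionSize > SPLIT_ABOVE) return "split";
    if (regionSize < MERGE_BELOW) return "merge";
    return "leave";
  }

  public static void main(String[] args) {
    // A 100 MB region splits into two roughly 50 MB halves...
    System.out.println(decide(100L << 20));
    // ...and each half lands between the thresholds, so no immediate merge.
    System.out.println(decide(50L << 20));
    // Only a region shrunk by heavy deletion falls below the merge line.
    System.out.println(decide(8L << 20));
  }
}
```

A load-based policy would replace the size test with a load metric, but the same hysteresis gap applies.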

 -Original Message-
 From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED]
 Sent: Monday, January 07, 2008 1:10 PM
 To: hadoop-dev@lucene.apache.org
 Subject: [jira] Commented: (HADOOP-2405) [hbase] Merge region
 tool exposed in shell and/or in UI


 [
 https://issues.apache.org/jira/browse/HADOOP-2405?page=com.atl
assian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId= 
12556694#action_12556694 ]

 Bryan Duxbury commented on HADOOP-2405:
 ---

 So, you envision the merge operation to not only require
 manual triggering but to require manual targeting? Shouldn't
 the point of merging regions be to maintain the equilibrium
 of size of regions? Under what circumstances will you have to
 manually intervene to keep regions appropriately sized?

 It seems like this should really only happen after a
 substantial number of deletions has occurred, right? That
 would cause a compacted region to shrink below a healthy
 size, and if it could be merged with a neighbor, it would be
 nice. This logic should be built in and automatic, otherwise
 it would require constant monitoring of region sizes by an
 administrator.

 Other than this sort of automatic merging, when would you
 want to manually merge two regions? Doesn't that expose a
 somewhat dangerous amount of functionality to the end user?

  [hbase] Merge region tool exposed in shell and/or in UI
  ---
 
  Key: HADOOP-2405
  URL:
 https://issues.apache.org/jira/browse/HADOOP-2405
  Project: Hadoop
   Issue Type: New Feature
   Components: contrib/hbase
 Reporter: stack
 Priority: Minor
 
  hbase has support for merging regions.  Expose a merge
 trigger in the shell or in the UI (Can only merge adjacent
 features so perhaps only makes sense in UI in the regionserver UI).

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.




