RE: [hbase] Suggestions on hbase APIs.
-----Original Message-----
From: Mafish Liu [mailto:[EMAIL PROTECTED]]
Sent: Monday, January 21, 2008 12:23 AM
To: hadoop-dev@lucene.apache.org
Subject: [hbase] Suggestions on hbase APIs.

> Hi:
> I'm recently using HBase (included in the Hadoop 0.15.2 release) to manage
> spatial data, and I found two flaws that I think could be improved.
>
> First, if you fetch the column names in an HBase table using
>
>   Set<Text> columns = tableDes.families().keySet();
>
> you get a set of column names that end with a colon, which I think should
> be gotten rid of.

The name that ends with a colon is the name of the column family, and you can
create multiple family members in an ad hoc fashion. For example, say you have
a column family named 'meta:' in which you store data about web pages. You can
create multiple family members in the same row, such as 'meta:mime-type',
'meta:crawl-date', 'meta:encoding', etc. Example:

  HTable table = new HTable(conf, tableName);
  long id = table.startUpdate(row);
  // enter data in column family meta:
  table.put(id, new Text("meta:mime-type"), data);
  table.put(id, new Text("meta:crawl-date"), data);
  table.put(id, new Text("meta:encoding"), data);
  // enter data in column family contents:
  table.put(id, new Text("contents:"), data);
  table.commit(id);

> Second, if you read all the contents of an HBase table via the
> HScannerInterface.next method, you get a TreeMap<Text, byte[]> every time
> you call it. Returning the column names every time is a waste of memory and
> network bandwidth, and there should be a more efficient way to do such work.

Well, you can retrieve multiple columns with a scanner, so if the column name
were not passed back, how would you determine which column goes with which
data? Scanning the table in the example above:

  HScannerInterface scanner = table.obtainScanner(
      new Text[] {new Text("contents:"), new Text("meta:")},
      new Text()); // empty start row = start at beginning

Now when you call scanner.next, you need the map to find the value for
contents: and the (multiple) values for meta:.

> The above two APIs are used in my program and also in the HBase shell
> program. I don't know if there are alternative APIs that provide these
> improvements.
>
> Best regards.
> Mafish
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
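For completeness, here is a rough sketch of a loop that drains that scanner,
continuing from the snippet above. It assumes the 0.15-era signature
HScannerInterface.next(HStoreKey, TreeMap<Text, byte[]>), so treat the
details as approximate:

  HStoreKey key = new HStoreKey();
  TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
  try {
    // next() fills 'results' with one row's columns; the map keys are the
    // full column names, which is how callers tell the values apart.
    while (scanner.next(key, results)) {
      for (Map.Entry<Text, byte[]> e : results.entrySet()) {
        System.out.println(key.getRow() + " " + e.getKey() + ": "
            + e.getValue().length + " bytes");
      }
      results.clear(); // reuse the map for the next row
    }
  } finally {
    scanner.close();
  }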
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Status: Open  (was: Patch Available)

It appears that Hudson lost this patch when it went down. Resubmitting.

         Key: HADOOP-2668
         URL: https://issues.apache.org/jira/browse/HADOOP-2668
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: stack
    Assignee: Jim Kellerman
    Priority: Blocker
     Fix For: 0.16.0
 Attachments: migrate.patch, migration.patch, patch.txt

HBase now checks for a version file. If there is none, it reports a version
mismatch. There will be no version file if the file system was created by a
version older than r613469.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Status: Patch Available  (was: Open)
[jira] Assigned: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2668:
-------------------------------------
Assignee: Jim Kellerman  (was: stack)
[jira] Commented: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560893#action_12560893 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

OK, there is definitely some work to do here. I'll work on fixing Migrate.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Affects Version/s: 0.16.0
           Status: Patch Available  (was: Open)

Works locally; trying Hudson. Stack, please review the patch.
[jira] Updated: (HADOOP-2668) [hbase] Documentation and improved logging so fact that hbase now requires migration comes as less of a surprise
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2668:
----------------------------------
Attachment: patch.txt

Lots more checking, cleaned up several bugs, new read-only mode, usage, etc.
[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.
[ https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2643:
----------------------------------
   Resolution: Fixed
Fix Version/s: 0.16.0
       Status: Resolved  (was: Patch Available)

Committed. Ignoring one unrelated core test failure.

         Key: HADOOP-2643
         URL: https://issues.apache.org/jira/browse/HADOOP-2643
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Assignee: Jim Kellerman
     Fix For: 0.16.0
 Attachments: patch.txt

The migration tool that handles the changes to how HBase lays out files in
the file system needs to be smarter:
- don't try to migrate old region directories in which the region name is
  part of the directory name
- add a version number to the file system
[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560748#action_12560748 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

If you run the migrate tool as the exception suggests, it will write the
version file and then the system will start.
[jira] Commented: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

Jim Kellerman commented on HADOOP-2668:
---------------------------------------

> It didn't occur to me that migration was the way to fix the missing version
> file. From HMaster.java (894, 5):

{code}
throw new IOException(
    "file system not correct version. Run hbase.util.Migrate");
{code}

> I also figured we should just auto-migrate this one case of a missing
> version file (if, in the future, the version file goes missing, I'd think
> it the job of an hbase fsck to recreate it, rather than migration?).

Suppose you have a file system that has not been migrated (i.e., regions are
stored in /hbase/hregion_nnn)? The master would start up, write the version
file, and then proceed to recreate the root and meta regions because they
aren't under /hbase/-ROOT- and /hbase/.META. respectively. Additionally, the
first thing the migrate tool does is look for the version file. If it finds
it and the version number matches, it figures that the file system has
already been upgraded and does nothing.

> But I'm fine w/ forcing users to run the migration. It needs to be better
> documented and added to the bin/hbase script with verb 'migrate', I'd say.

Agreed. How about changing this patch to update bin/hbase and add
documentation (where?)?

> I tried to run the migration but it wants to connect to an HMaster. That
> ain't going to work (cluster won't start because no version file... can't
> migrate because cluster ain't up...).

It tries to connect to the master to ensure it isn't running (it uses
HBaseAdmin.isMasterRunning()). We wouldn't want to do an upgrade with the
cluster running.
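To make the startup check concrete, here is a rough sketch of the kind of
version-file test being discussed. The path, version constant, and helper
name are illustrative assumptions, not the actual HMaster code:

{code}
// Illustrative sketch only; constants and names are assumptions.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionCheckSketch {
  static final String FILE_SYSTEM_VERSION = "2";                 // assumed
  static final Path VERSION_FILE = new Path("/hbase", "hbase.version");

  static void checkVersion(FileSystem fs) throws IOException {
    if (!fs.exists(VERSION_FILE)) {
      // No version file: either a pre-r613469 layout or a damaged install.
      // Refuse to start instead of auto-migrating, per the discussion above.
      throw new IOException(
          "file system not correct version. Run hbase.util.Migrate");
    }
    // A real implementation would also read the file and compare its
    // contents against FILE_SYSTEM_VERSION before letting the master start.
  }
}
{code}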
[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

jimk edited comment on HADOOP-2668 at 1/19/08 5:05 PM. The text is unchanged
from the comment above; the edit only wrapped the /hbase path names in markup.
[jira] Issue Comment Edited: (HADOOP-2668) [hbase] After 2643, cluster won't start if FS was created by an older hbase version
[ https://issues.apache.org/jira/browse/HADOOP-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560753#action_12560753 ]

jimk edited comment on HADOOP-2668 at 1/19/08 5:06 PM. The text is unchanged
from the comment above; this edit wrapped the /hbase path names in {code}
markup instead.
[jira] Updated: (HADOOP-2643) [hbase] Make migration tool smarter.
[ https://issues.apache.org/jira/browse/HADOOP-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2643:
----------------------------------
Attachment: patch.txt
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Tests passed. Committed.

         Key: HADOOP-2525
         URL: https://issues.apache.org/jira/browse/HADOOP-2525
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.15.0
 Environment: CentOS 5
    Reporter: Chris Kline
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0
 Attachments: patch.txt

Background: we ran out of disk space on the HMaster before this issue
occurred. The sequence of events was:
1. Ran out of disk space
2. Freed up 10 GB of disk space
3. Shut down HBase

We had the following 2 lines repeated over 11 million times in the span of
10 minutes:

2007-12-24 08:50:41,851 INFO org.apache.hadoop.hbase.HMaster: process shutdown of server 10.100.11.64:60020: logSplit: true, rootChecked: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
2007-12-24 08:50:43,980 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessServerShutdown of 10.100.11.64:60020
[jira] Resolved: (HADOOP-2616) hbase not spliting when the total size of region reaches max region size * 1.5
[ https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2616.
-----------------------------------
   Resolution: Fixed
Fix Version/s: 0.16.0
               (was: 0.17.0)

Clarified documentation. Committed with changes for HADOOP-2525.

         Key: HADOOP-2616
         URL: https://issues.apache.org/jira/browse/HADOOP-2616
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0

Right now a region may get larger than the max size set in the configuration.
HRegion.needsSplit checks the largest column to see if it is larger than max
region size * 1.5 and then decides whether to split. But if we have more than
one column, the region could be very large. For example, say we have 10
columns, all about the same size (say 40MB), and the max file size is 64MB:
we would not split, even though the region size is 400MB, well over the 96MB
needed to trip a split.
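For reference, a simplified sketch of the split check this issue describes;
the names are illustrative, not HBase's actual fields:

{code}
// Sketch of the reported split behavior, for illustration only.
public class SplitCheckSketch {
  static final long MAX_FILE_SIZE = 64L * 1024 * 1024;   // 64MB default

  // Reported behavior: split only when the single largest store (column
  // family) passes 1.5 x max, i.e. 96MB.
  static boolean needsSplitByLargestStore(long[] storeSizes) {
    long largest = 0;
    for (long size : storeSizes) {
      largest = Math.max(largest, size);
    }
    return largest > MAX_FILE_SIZE * 3 / 2;
  }

  // With ten 40MB stores, largest = 40MB, so no split is triggered even
  // though the region as a whole holds 400MB.
}
{code}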
[jira] Assigned: (HADOOP-2636) [hbase] Make flusher less dumb
[ https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2636:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2636
         URL: https://issues.apache.org/jira/browse/HADOOP-2636
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: stack
    Assignee: Jim Kellerman
    Priority: Minor

When the flusher runs (it is triggered when the sum of all Stores in a Region
exceeds a configurable max size), we flush all Stores, though a Store
memcache might have but a few bytes. I would think Stores should only dump
their memcache to disk if they have some substance. The problem becomes more
acute the more families you have in a Region. Possible behaviors would be to
dump the biggest Store only, or only those Stores over 50% of the max
memcache size. Behavior would vary depending on the prompt that provoked the
flush. Would also log why the flush is running: optional or max size. This
issue comes out of HADOOP-2621.
[jira] Commented: (HADOOP-2636) [hbase] Make flusher less dumb
[ https://issues.apache.org/jira/browse/HADOOP-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559938#action_12559938 ]

Jim Kellerman commented on HADOOP-2636:
---------------------------------------

Better yet, move the triggering of cache flushes to the store level instead
of the region level. Same for compactions. A split still has to happen at the
region level, because it is the region that embodies the concept of a row
range; however, the split could be triggered by a single store reaching the
split threshold.
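A hypothetical sketch of what store-level flush triggering might look like;
the interface and names are assumptions for illustration, not HBase code:

{code}
import java.util.List;

class StoreLevelFlusher {
  static final long FLUSH_SIZE = 64L * 1024 * 1024; // assumed per-store limit

  interface Store {
    long memcacheSize();
    void flushMemcache();  // writes this store's memcache to a new mapfile
  }

  // Flush only the stores that have accumulated enough edits to be worth a
  // mapfile, instead of flushing every store in the region at once.
  static void maybeFlush(List<Store> stores) {
    for (Store store : stores) {
      if (store.memcacheSize() > FLUSH_SIZE) {
        store.flushMemcache();
      }
    }
  }
}
{code}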
[jira] Updated: (HADOOP-2496) Snapshot of table
[ https://issues.apache.org/jira/browse/HADOOP-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2496:
----------------------------------
Issue Type: New Feature  (was: Bug)

         Key: HADOOP-2496
         URL: https://issues.apache.org/jira/browse/HADOOP-2496
     Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
    Reporter: Billy Pearson
    Priority: Minor
     Fix For: 0.17.0

Having an option to take a snapshot of a table would be very useful in
production. What I would like this option to do is a merge of all the data
into one or more files stored in the same folder on the DFS. This way we
could save data in case of a software bug in Hadoop or user code. The other
advantage would be the ability to export a table to multiple locations. Say I
had a read-only table that must be online: I could take a snapshot of it when
needed and export it to a separate data center, have it loaded there, and
then I would have it online at multiple data centers for load balancing and
failover. I understand that Hadoop takes away the need for backups to protect
from failed servers, but this does not protect us from software bugs that
might delete or alter data in ways we did not plan. We should have a way to
roll back a dataset.
[jira] Assigned: (HADOOP-2619) Compaction errors after a region splits
[ https://issues.apache.org/jira/browse/HADOOP-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2619:
-------------------------------------
Assignee: stack

         Key: HADOOP-2619
         URL: https://issues.apache.org/jira/browse/HADOOP-2619
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
 Environment: hadoop svn 612165
    Reporter: Billy Pearson
    Assignee: stack
     Fix For: 0.16.0
 Attachments: compactiondir-v4.patch, hbase-root-regionserver-PE1750-4.log

I am getting compaction errors from regions after they split. Not all of them
have this problem, but some do. I attached a log and picked out one region,
webdata,com.technorati/tag/potiron:http,1200430376177:
- it is loaded, then splits at 2008-01-15 14:52:56,116
- the split is finished at 2008-01-15 14:53:01,653
- the first compaction for the new top-half region starts at
  2008-01-15 14:54:07,612 and ends successfully at 2008-01-15 14:54:30,229
- the next compaction starts at 2008-01-15 14:56:16,315 and ends with an
  error at 2008-01-15 14:56:40,246

{code}
2008-01-15 14:57:53,002 ERROR org.apache.hadoop.hbase.HRegionServer: Compaction failed for region webdata,com.technorati/tag/potiron:http,1200430376177
org.apache.hadoop.dfs.LeaseExpiredException: org.apache.hadoop.dfs.LeaseExpiredException: No lease on /gfs_storage/hadoop-root/hbase/webdata/compaction.dir/1438658724/in_rank/mapfiles/8222904438849251562/data
  at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1123)
  at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1061)
  at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:303)
  at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:908)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:418)
{code}

All other compactions for this region fail after this one, with the same
error. I will have to keep testing to see if it ever finishes successfully;
maybe after a restart it will successfully finish a compaction.
[jira] Assigned: (HADOOP-2624) [hbase] memory management
[ https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2624:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2624
         URL: https://issues.apache.org/jira/browse/HADOOP-2624
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: stack
    Assignee: Jim Kellerman

Each Store has a Memcache of edits that is flushed on a fixed period (it used
to be flushed when it grew beyond a limit). A Region can be made up of N
Stores. A regionserver currently has no upper bound on the number of regions
that can be deployed to it. Add to this that, per mapfile, we have read the
index into memory. We're also talking about adding caching of blocks and
cells. We need a means of keeping an account of memory usage, adjusting cache
sizes and flush rates (or sizes) dynamically, using References where
possible, to accommodate deployment of added regions. If memory is strained,
we should reject regions proffered by the master with a resource-constrained,
or some such, message. The manual sizing we currently do ain't going to cut
it for clusters of any decent size.
[jira] Assigned: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2525:
-------------------------------------
Assignee: Jim Kellerman  (was: stack)
[jira] Assigned: (HADOOP-2615) Add max number of mapfiles to compact at one time giveing us a minor major compaction
[ https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2615:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2615
         URL: https://issues.apache.org/jira/browse/HADOOP-2615
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: Jim Kellerman
    Priority: Minor
     Fix For: 0.17.0

Currently we do compaction on a region when hbase.hstore.compactionThreshold
is reached (default 3). I think we should configure a max number of mapfiles
to compact at one time, similar to doing a minor compaction in Bigtable. This
keeps compactions from getting tied up in one region too long, which lets
other regions get way too many memcache flushes, making compaction take
longer and longer for each region. If we did that, then when a region's
updates start to slack off, the max number will eventually include all
mapfiles, causing a major compaction on that region. Unlike Bigtable, this
would leave the master out of the process and let the region server handle
the major compaction when it has time. When doing a minor compaction on a few
files, I think we should compact the newest mapfiles first, leaving the
larger/older ones for when we have low updates to a region. A sketch of this
selection follows below.
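A hypothetical sketch of capping a minor compaction at the newest N mapfiles,
as the issue proposes; the types and field names are assumptions:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CompactionSelector {
  static class MapFileInfo {
    final String path;
    final long sequenceId;  // higher = newer edits
    MapFileInfo(String path, long sequenceId) {
      this.path = path;
      this.sequenceId = sequenceId;
    }
  }

  // Pick at most maxFiles of the newest mapfiles for a minor compaction.
  // When a quiet store has fewer than maxFiles files, this selects all of
  // them, which amounts to a major compaction of that store.
  static List<MapFileInfo> select(List<MapFileInfo> files, int maxFiles) {
    List<MapFileInfo> sorted = new ArrayList<MapFileInfo>(files);
    sorted.sort(Comparator.comparingLong((MapFileInfo f) -> f.sequenceId)
        .reversed());
    return sorted.subList(0, Math.min(maxFiles, sorted.size()));
  }
}
{code}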
[jira] Assigned: (HADOOP-2616) hbase not spliting when the total size of region reaches max region size * 1.5
[ https://issues.apache.org/jira/browse/HADOOP-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2616:
-------------------------------------
Assignee: Jim Kellerman
[jira] Assigned: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size
[ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2621:
-------------------------------------
Assignee: stack

         Key: HADOOP-2621
         URL: https://issues.apache.org/jira/browse/HADOOP-2621
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
    Reporter: Billy Pearson
    Assignee: stack
     Fix For: 0.16.0
 Attachments: optionalcacheflushinterval.patch

It looks like HBase is flushing all memcaches to disk every 60 secs, causing
a lot of work for the compactor to keep up, because each column gets its own
mapfile and every region is flushed at one time. This could be a very large
number of mapfiles to write if a region server is hosting 100 regions, all
with multiple columns.

Idea: the memcache flush should keep all data in memory until the memcache
gets larger than the size configured with hbase.hregion.memcache.flush.size.
When we reach this size, we should flush the largest regions first, stopping
once we drop back below the memcache max size, maybe 20% below the max. This
will flush only as needed, as each flush takes time to compact when
compaction runs on a region. While we are flushing a region, we should also
be blocking new updates to that region, so the region server does not get
overrun when a high update load hits it. By only blocking on the region we
are flushing at that time, other regions will still be able to take updates.

If we still want to use hbase.regionserver.optionalcacheflushinterval, we
should set it to run once an hour or something like that, so we can recover
memory from the memcaches of regions that do not have a lot of updates in
memory. But running at the current default of 60 secs is not so good for the
compactor if it has many regions to handle, and it is also not good for a
scanner to have to scan many small files vs. a few larger ones.

Example: a compactor may take 15 mins to compact a region. In that time we
will flush 15 times, causing all other regions to get new mapfiles to compact
when their turn to be compacted comes. If many regions were getting
compacted, the last one on a list of, say, 10 regions would have 10 regions *
15 mins each = 150 mapfiles for each column written before the compactor
could get to it.
[jira] Created: (HADOOP-2643) [hbase] Make migration tool smarter.
[hbase] Make migration tool smarter.
------------------------------------
         Key: HADOOP-2643
         URL: https://issues.apache.org/jira/browse/HADOOP-2643
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Assignee: Jim Kellerman

The migration tool that handles the changes to how HBase lays out files in
the file system needs to be smarter:
- don't try to migrate old region directories in which the region name is
  part of the directory name
- add a version number to the file system
[jira] Commented: (HADOOP-1398) Add in-memory caching of data
[ https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560099#action_12560099 ]

Jim Kellerman commented on HADOOP-1398:
---------------------------------------

Tom, yes, we need to start versioning everything that goes out to disk. And
if we make an incompatible change, we either need to correct for it on the
fly or augment the migration tool (hbase.util.Migrate.java).

         Key: HADOOP-1398
         URL: https://issues.apache.org/jira/browse/HADOOP-1398
     Project: Hadoop
  Issue Type: New Feature
  Components: contrib/hbase
    Reporter: Jim Kellerman
    Priority: Trivial
 Attachments: hadoop-blockcache-v2.patch, hadoop-blockcache.patch

Bigtable provides two in-memory caches: one for row/column data and one for
disk blocks. The size of each cache should be configurable, data should be
loaded lazily, and the cache managed by an LRU mechanism. One complication of
the block cache is that all data is read through a SequenceFile.Reader which
ultimately reads data off of disk via an RPC proxy for ClientProtocol. This
would imply that the block caching would have to be pushed down to either the
DFSClient or SequenceFile.Reader.
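As a concrete illustration of the LRU mechanism mentioned in the issue, a
minimal block cache built on LinkedHashMap; this is a sketch, not the
eventual HBase implementation:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

class LruBlockCache extends LinkedHashMap<Long, byte[]> {
  private final int maxBlocks;

  LruBlockCache(int maxBlocks) {
    // accessOrder=true makes iteration order follow recency of use.
    super(16, 0.75f, true);
    this.maxBlocks = maxBlocks;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
    // Evict the least recently used block once the cache is full.
    return size() > maxBlocks;
  }
}
{code}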
[jira] Assigned: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
[ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2334:
-------------------------------------
Assignee: (was: Jim Kellerman)

         Key: HADOOP-2334
         URL: https://issues.apache.org/jira/browse/HADOOP-2334
     Project: Hadoop
  Issue Type: Wish
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Priority: Minor
     Fix For: 0.16.0

I have heard from several people that row keys in HBase should be less
restricted than hadoop.io.Text. What do you think? At the very least, a row
key has to be a WritableComparable. This would lead to the most general case
being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The
primary difference between these two classes is that hadoop.io.BytesWritable
by default allocates 100 bytes, and if you do not pay attention to the length
(BytesWritable.getSize()), converting a String to a BytesWritable and vice
versa can become problematic. hbase.io.ImmutableBytesWritable, in contrast,
only allocates as many bytes as you pass in and then does not allow the size
to be changed. If we were to change from Text to a non-text key, my
preference would be for ImmutableBytesWritable, because it has a fixed size
once set, and operations like get, etc. do not have to do something like
System.arraycopy where you specify the number of bytes to copy. Your comments
and questions are welcome on this issue. If we receive enough feedback that
Text is too restrictive, we are willing to change it, but we need to hear
what would be the most useful thing to change it to as well.
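To illustrate the BytesWritable pitfall described above, a short sketch; it
uses the getSize() accessor named in the issue, and the rest is an assumption
for illustration:

{code}
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

class RowKeyExample {
  static String keyToString(BytesWritable key) {
    // new String(key.getBytes()) would be wrong: the backing buffer can be
    // larger than the valid data, so trim it to getSize() first.
    byte[] trimmed = Arrays.copyOf(key.getBytes(), key.getSize());
    return new String(trimmed);
  }
}
{code}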
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Fix Version/s: 0.16.0
       Status: Patch Available  (was: Open)
[jira] Updated: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-2525:
----------------------------------
Attachment: patch.txt
[jira] Assigned: (HADOOP-2624) [hbase] memory management
[ https://issues.apache.org/jira/browse/HADOOP-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2624:
-------------------------------------
Assignee: (was: Jim Kellerman)
[jira] Assigned: (HADOOP-2039) [hbase] When a get or scan request spans multiple columns, execute the reads in parallel
[ https://issues.apache.org/jira/browse/HADOOP-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2039:
-------------------------------------
Assignee: (was: Jim Kellerman)

         Key: HADOOP-2039
         URL: https://issues.apache.org/jira/browse/HADOOP-2039
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Jim Kellerman
    Priority: Trivial
     Fix For: 0.16.0

When a get or scan request spans multiple columns, execute the reads in
parallel and use a CountDownLatch to wait for them to complete before
returning the results.
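A sketch of the parallel-read idea, one thread per column with a
CountDownLatch gathering the results; the reader interface is a stand-in, not
the actual HBase internals:

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ParallelColumnReader {
  interface ColumnReader {
    byte[] read(String column);  // stand-in for a per-column get
  }

  static Map<String, byte[]> get(final ColumnReader reader,
      List<String> columns) throws InterruptedException {
    final Map<String, byte[]> results =
        new ConcurrentHashMap<String, byte[]>();
    final CountDownLatch latch = new CountDownLatch(columns.size());
    for (final String column : columns) {
      new Thread(new Runnable() {
        public void run() {
          try {
            results.put(column, reader.read(column));
          } finally {
            latch.countDown();  // always count down so the caller can't hang
          }
        }
      }).start();
    }
    latch.await();  // wait for every column read to finish
    return results;
  }
}
{code}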
[jira] Assigned: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2364:
-------------------------------------
Assignee: Jim Kellerman

         Key: HADOOP-2364
         URL: https://issues.apache.org/jira/browse/HADOOP-2364
     Project: Hadoop
  Issue Type: Bug
  Components: contrib/hbase
Affects Versions: 0.16.0
    Reporter: Michael Bieniosek
    Assignee: Jim Kellerman
    Priority: Minor

I restarted a regionserver, and got this error in its logs:

{code}
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
  at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
  at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
  at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.lang.reflect.Method.invoke(Unknown Source)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

  at org.apache.hadoop.ipc.Client.call(Client.java:482)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
  at $Proxy0.regionServerStartup(Unknown Source)
  at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
  at java.lang.Thread.run(Unknown Source)
{code}
[jira] Issue Comment Edited: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560143#action_12560143 ]

jimk edited comment on HADOOP-2525 at 1/17/08 3:14 PM:

> Otherwise patch looks good. How you think it fixes the issue?

The crux of the patch is the following change:

{code}
-    for (RegionServerOperation op = null; !closed.get(); ) {
+    while (!closed.get()) {
+      RegionServerOperation op = null;
{code}

The old code only declared and nulled out 'op' for the first iteration. If op
was set non-null and control went back to the top of the loop, it would fall
through and just re-execute op again, rather than polling the queues and
waiting.
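To see why that one-time initialization can re-execute the same operation, a
toy reconstruction of the bug; the queue and operation types are stand-ins,
not the real HMaster internals:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class LoopBugDemo {
  static final AtomicBoolean closed = new AtomicBoolean(false);
  static final BlockingQueue<Runnable> queue =
      new LinkedBlockingQueue<Runnable>();

  // BUG: 'op' is initialized once, not per iteration. After the first op is
  // polled and run, 'op' is still non-null at the top of the loop, so the
  // same op is re-executed forever instead of polling the queue again.
  static void buggyLoop() throws InterruptedException {
    for (Runnable op = null; !closed.get(); ) {
      if (op == null) {
        op = queue.poll(1, TimeUnit.SECONDS);
        if (op == null) continue;
      }
      op.run();
    }
  }

  // FIX: declare 'op' inside the loop so every iteration polls afresh.
  static void fixedLoop() throws InterruptedException {
    while (!closed.get()) {
      Runnable op = queue.poll(1, TimeUnit.SECONDS);
      if (op == null) continue;
      op.run();
    }
  }
}
{code}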
[jira] Commented: (HADOOP-2525) Same 2 lines repeated 11 million times in HMaster log upon HMaster shutdown
[ https://issues.apache.org/jira/browse/HADOOP-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560143#action_12560143 ]

Jim Kellerman commented on HADOOP-2525. This is the original version of the
edited comment above; the text is the same apart from the {code} markup.
[jira] Resolved: (HADOOP-2651) [Hbase] Caching for read performance
[ https://issues.apache.org/jira/browse/HADOOP-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2651.
-----------------------------------
Resolution: Duplicate

         Key: HADOOP-2651
         URL: https://issues.apache.org/jira/browse/HADOOP-2651
     Project: Hadoop
  Issue Type: Improvement
  Components: contrib/hbase
    Reporter: Edward Yoon
    Assignee: Edward Yoon

Use two levels of caching to improve read performance:
* Scan cache
** Higher-level cache
*** Caches the <key, value> pairs returned by the SSTable (HStore?) interface
    to the region server code
** Most useful for applications that tend to read the same data repeatedly
* Block cache
** Lower-level cache
*** Caches SSTable blocks that were read from HDFS
** Useful for applications that read data close to the data they recently
   read, e.g. a sequential read, or a random read of a different column in
   the same locality group within a hot row
Subclassing SequenceFile and MapFile
HBase has several subclasses of MapFile already:

  org.apache.hadoop.hbase.HStoreFile$HbaseMapFile
  org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile
  org.apache.hadoop.hbase.HStoreFile$HalfMapFileReader

If MapFile were more subclassable (had protected members instead of private,
or accessor methods), we would probably add client-side caching and bloom
filters (to determine whether a key exists in a map file; different from
BloomFilterMapFile above, which is a mix-in of MapFile and BloomFilter).

Tom White said (in https://issues.apache.org/jira/browse/HADOOP-2604):

> If MapFile.Reader were an interface (or an abstract class with a no-args
> constructor) then BloomFilterMapFile.Reader, HalfMapFileReader and caching
> Readers could be implemented as wrappers instead of in a static hierarchy.
> This would make it easier to mix and match readers (e.g. with or without
> caching) without passing all possible parameters in the constructor.

So we'd like to make MapFile (and probably SequenceFile) subclassable by
providing accessors and/or making members protected instead of private. If
these classes should not be subclassed, they should be declared as final
classes.

Thoughts? Opinions? Comments?

---
Jim Kellerman, Senior Engineer; Powerset
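To illustrate the wrapper style Tom describes, here is a toy decorator over a
simplified reader interface; it is a sketch of the pattern only, not Hadoop's
actual MapFile API:

  import java.io.IOException;

  interface Reader {
    byte[] get(byte[] key) throws IOException;
  }

  // Wraps any Reader and short-circuits lookups for keys the bloom filter
  // says are definitely absent. Wrappers like this compose: a caching
  // wrapper could wrap a bloom-filter wrapper around a base mapfile reader.
  class BloomFilterReader implements Reader {
    interface BloomFilter { boolean mightContain(byte[] key); }

    private final Reader delegate;
    private final BloomFilter filter;

    BloomFilterReader(Reader delegate, BloomFilter filter) {
      this.delegate = delegate;
      this.filter = filter;
    }

    public byte[] get(byte[] key) throws IOException {
      if (!filter.mightContain(key)) {
        return null;  // definitely not present; skip the disk read
      }
      return delegate.get(key);
    }
  }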
Multiplexing sockets in DFSClient/datanodes?
HBase has a problem with running out of file handles on machines that act as region servers. From https://issues.apache.org/jira/browse/HADOOP-2577: Today rapleaf gave me an lsof listing from a regionserver. It had thousands of open sockets to datanodes, all in ESTABLISHED and CLOSE_WAIT state. On average they seem to have about ten file descriptors/sockets open per region (they have 3 column families IIRC; per family there can be between 1-5 or so mapfiles open, 3 is the max, but compacting we open a new one, etc.). They have thousands of regions. 400 regions, roughly 100G, which is not that much, takes about 4k open file handles. If they want a regionserver to serve a decent disk's worth, 300-400G, then that is maybe 1600 regions and 16k file handles. With more than just 3 column families, we are in danger of blowing out limits if they are 32k. One possible solution we've thought of is multiplexing sockets between the DFSClient and the datanode. In this case, there would be one socket per client-datanode pair, running in async mode using select. This would consume far fewer sockets than the current 1 socket / client / datanode / open file. We used a socket multiplexer at Yahoo for the data store I worked on there, the user data base (or UDB), which stored all the preference data for all Yahoo pages that could be customized. All the UDB clients each had one socket open for each machine in the UDB server cluster. Similarly, each UDB server had one socket open to talk to all of its clients. When you consider that each UDB server had to talk to several thousand clients, and that each server machine ran many server processes to handle load, this was a huge savings in OS overhead. While 1 socket / client / datanode / open file is a simple model, if we are talking about scaling Hadoop or HBase to thousands of nodes, socket multiplexing seems like a big win in terms of server overhead, especially considering that many of these connections are more idle than in use. Yes, multiplexing a socket is more complicated than having one socket per file, but saving system resources seems like the way to scale. Questions? Comments? Opinions? Flames? --- Jim Kellerman, Senior Engineer; Powerset
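As a rough illustration of the idea, here is a sketch of framed multiplexing over a single connection. The framing protocol, class, and method names are all hypothetical, not Hadoop's actual wire format: each frame carries a stream id and a length, so one client-datanode socket can serve many open files instead of one socket per file.

{code}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class MuxConnection {
  private final DataOutputStream out;
  private final DataInputStream in;

  MuxConnection(DataOutputStream out, DataInputStream in) {
    this.out = out;
    this.in = in;
  }

  // Interleave writes for different logical streams on the one socket.
  synchronized void send(int streamId, byte[] payload) throws IOException {
    out.writeInt(streamId);       // which open file this frame belongs to
    out.writeInt(payload.length); // frame length
    out.write(payload);
    out.flush();
  }

  // Read one frame; the caller dispatches it to the right stream by id.
  Frame receive() throws IOException {
    int streamId = in.readInt();
    int len = in.readInt();
    byte[] payload = new byte[len];
    in.readFully(payload);
    return new Frame(streamId, payload);
  }

  static class Frame {
    final int streamId;
    final byte[] payload;
    Frame(int streamId, byte[] payload) {
      this.streamId = streamId;
      this.payload = payload;
    }
  }
}
{code}

A production version would run the socket in non-blocking mode behind a selector, as the UDB example describes, but the framing is the essential ingredient: it is what lets many idle file streams share one established connection.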
[jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs without considering the max memcache size
[ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559660#action_12559660 ] Jim Kellerman commented on HADOOP-2621: --- You can configure the memcache flush size by setting the config parameter hbase.hregion.memcache.flush.size; the default is 64MB. When an HRegion reaches this threshold, it will call for a cache flush. If the cache is flushed, a request is queued to determine if a compaction is necessary. If a compaction is done, then a request is queued to determine if the region needs to be split.

Memcache flush flushing every 60 secs without considering the max memcache size
Key: HADOOP-2621 URL: https://issues.apache.org/jira/browse/HADOOP-2621 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Billy Pearson Fix For: 0.16.0

It looks like hbase is flushing all memcaches to disk every 60 secs, causing a lot of work for the compactor to keep up, because each column gets its own mapfile and every region is flushed at one time. This could be a very large number of mapfiles to write if a region server is hosting 100 regions, all with multiple columns. Idea: the memcache flush should keep all data in memory until the memcache gets larger than the size configured with hbase.hregion.memcache.flush.size. When we reach this size we should flush the largest regions first, stopping once we drop back below the memcache max size, maybe 20% below the max. This flushes only as needed, since each flush costs compaction time when compaction runs on a region. While we are flushing a region we should also block new updates on that region, so the region server does not get overrun when a high update load hits it. By blocking only the region being flushed at that time, other regions will still be able to take updates. If we still want to use hbase.regionserver.optionalcacheflushinterval, we should set it to run once an hour or something like that, so we can recover memory from the memcache on regions that do not have a lot of updates in memory. But running at the current default of 60 secs is not good for the compactor if it has many regions to handle, and also not good for a scanner that has to scan many small files instead of a few larger ones. Example: a compactor may take 15 mins to compact a region; in that time we will flush 15 times, causing all other regions to get a new mapfile to compact when their turn comes. If, say, 10 regions were queued for compaction, the last one would have 10 regions * 15 mins each = 150 mapfiles for each column written before the compactor could get to it.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
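For instance, to raise the flush threshold to 128MB you could set the parameter in hbase-site.xml or programmatically. A minimal sketch, assuming the HBaseConfiguration class of this era is on the classpath; the 128MB value is just an example:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushSizeExample {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Default is 64MB; a region calls for a flush once its memcache passes this.
    conf.set("hbase.hregion.memcache.flush.size",
             String.valueOf(128 * 1024 * 1024));
  }
}
{code}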
[jira] Updated: (HADOOP-2356) Set memcache flush size per column
[ https://issues.apache.org/jira/browse/HADOOP-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2356: -- Summary: Set memcache flush size per column (was: Set memcache flush size per table) Set memcache flush size per column -- Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed, patch verified by Billy Pearson (who reported the problem). Committed.

Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
---
Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Environment: hadoop subversion 611087 Reporter: Billy Pearson Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt

The excerpt below is cut from one of my region server logs (full log attached). There is one region on this region server and it is under heavy insert load, so compactions run back to back: as one finishes, a new one starts. The problem starts when it is time to split the region. A compaction starts just milliseconds before the split, blocking the split, but the split closes the region before the compaction is finished, leaving the region offline until the compaction is done. Once the compaction is done the split finishes and all returns to normal, but this is a big problem for production if the region is offline for 10-15 mins. The solution would be to not let the split thread issue the line below while a compaction on that region is happening.

2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)

The only time I have seen this bug is when there is only one region on a region server, because with more than one, the compaction moves on to the other region(s) after the first one is done, and the split can do what it needs on the first region without getting blocked.

{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
... lots of NotServingRegionException's ...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
...
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
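The general shape of the fix is to make split and compaction mutually exclusive per region, so a split cannot close the region while a compaction still holds it. A hedged sketch of that idea (hypothetical class and method names, not the committed patch):

{code}
import java.util.concurrent.locks.ReentrantLock;

class RegionMaintenance {
  private final ReentrantLock workLock = new ReentrantLock();

  void compact() {
    workLock.lock();  // block splits while compacting
    try {
      // ... rewrite store files ...
    } finally {
      workLock.unlock();
    }
  }

  void split() {
    workLock.lock();  // wait for any in-flight compaction to finish first
    try {
      // ... close region, write daughter regions, update META ...
    } finally {
      workLock.unlock();
    }
  }
}
{code}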
[jira] Resolved: (HADOOP-2348) [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless
[ https://issues.apache.org/jira/browse/HADOOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2348. --- Resolution: Won't Fix [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless Key: HADOOP-2348 URL: https://issues.apache.org/jira/browse/HADOOP-2348 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Bryan Duxbury Assignee: Jim Kellerman Priority: Minor In the past, the lock id returned by HTable.startUpdate was a real lock id from a remote server. However, that has been superseded by the BatchUpdate process, so now the lock id is just an arbitrary value. Moreover, it does not actually add any value, because while it implies that you could start two updates on the same HTable and commit them separately, this is in fact not the case: any attempt to do a second startUpdate throws an IllegalStateException. Since there is no added functionality afforded by the presence of this parameter, I suggest that we overload all methods that use it to ignore it and print a deprecation notice. startUpdate can just return a constant like 1 and eventually turn into a boolean or some other useful value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2138) [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness
[ https://issues.apache.org/jira/browse/HADOOP-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2138: -- Priority: Minor (was: Major) Summary: [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness (was: [hbase] Master should allocate regions to the regionserver hosting the region data where possible) Downgrading priority because we should leverage Hadoop's rack awareness where possible, and there is a lot of work left to do (in Hadoop) before we can. [hbase] Master should allocate regions to regionservers based upon data locality and rack awareness --- Key: HADOOP-2138 URL: https://issues.apache.org/jira/browse/HADOOP-2138 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Currently, regions are assigned to regionservers based on a basic load attribute. A factor to include in the assignment calculation is the location of the region in hdfs, i.e. the servers hosting region replicas. If the cluster is such that regionservers run on the same nodes as hdfs datanodes, then ideally the regionserver for a particular region should run on a server that hosts a region replica. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
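A hypothetical scoring sketch of the idea (none of these names are HMaster's real API): prefer regionservers that host a replica of the region's data, then break ties by current load.

{code}
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LocalityAwareAssigner {
  // Pick the server for a region: replica hosts sort first, then lightest load.
  static String pickServer(List<String> servers,
                           Set<String> hostsWithReplica,
                           Map<String, Integer> regionCounts) {
    return servers.stream()
        .min(Comparator
            // locality first: servers holding a replica sort ahead
            .comparing((String s) -> hostsWithReplica.contains(s) ? 0 : 1)
            // then fewest regions currently assigned
            .thenComparing(regionCounts::get))
        .orElseThrow(IllegalStateException::new);
  }
}
{code}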
[jira] Commented: (HADOOP-2291) [hbase] Add row count estimator
[ https://issues.apache.org/jira/browse/HADOOP-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559202#action_12559202 ] Jim Kellerman commented on HADOOP-2291: --- What is the status of this issue? [hbase] Add row count estimator --- Key: HADOOP-2291 URL: https://issues.apache.org/jira/browse/HADOOP-2291 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Assignee: Edward Yoon Priority: Minor Attachments: 2291_v01.patch, Keying.java Internally we have a little tool that does a rough estimate of how many rows there are in a table. It runs scanners over larger and larger partitions until it turns up N occupied rows. Once it has a number N, it multiplies by the partition size to get an approximate row count. This issue is about generalizing this feature so it could sit in the general hbase install. It would look something like: {code} long getApproximateRowCount(final Text startRow, final Text endRow, final long minimumCountPerPartition, final long maximumPartitionSize) {code} Larger minimumCountPerPartition and maximumPartitionSize values would make the count more accurate but would mean the method ran longer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
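One plausible reading of the estimator, as a hedged sketch (the Partitioner interface and all names are hypothetical, not the attached patch): grow the sampled partition until it yields the minimum row count, then extrapolate that density across the key space.

{code}
class RowCountEstimator {
  interface Partitioner {
    // number of occupied rows found in a partition of 'partitionSize' keys
    long countRows(long offset, long partitionSize);
    // total size of the key space being estimated
    long keySpaceSize();
  }

  static long approximateRowCount(Partitioner p,
                                  long minimumCountPerPartition,
                                  long maximumPartitionSize) {
    long partitionSize = 1024;
    long found = 0;
    // Grow the sampled partition until it holds enough rows to trust.
    while (partitionSize < maximumPartitionSize) {
      found = p.countRows(0, partitionSize);
      if (found >= minimumCountPerPartition) {
        break;
      }
      partitionSize *= 2;
    }
    // Extrapolate the observed density across the whole key space.
    return found * (p.keySpaceSize() / partitionSize);
  }
}
{code}

Larger minimums buy accuracy at the price of scanning more of the table, which is exactly the trade-off the issue describes.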
[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2343: -- Priority: Trivial (was: Minor) [hbase] Stuck regionserver? --- Key: HADOOP-2343 URL: https://issues.apache.org/jira/browse/HADOOP-2343 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Priority: Trivial Looking in logs, a regionserver went down because it could not contact the master after 60 seconds. Watching logging, the HRS is repeatedly checking all 150 loaded regions over and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+ second gap with no logging as though the regionserver had hung up on something:

{code}
2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635
2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do
2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965
2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do
2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server
2007-12-03 13:16:04,455 INFO hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases
2007-12-03 13:16:04,455 INFO hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting
{code}

Master seems to be running fine, scanning its ~700 regions. Then you see this in the log, before the HRS shuts itself down:

{code}
2007-12-03 13:14:31,416 INFO hbase.Leases - HMaster.leaseChecker lease expired 153260899/153260899
2007-12-03 13:14:31,417 INFO hbase.HMaster - XX.XX.XX.102:60020 lease expired
{code}

... and we go on to process shutdown. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559204#action_12559204 ] Jim Kellerman commented on HADOOP-2343: --- I believe this issue was (eventually) addressed by HADOOP-2338. Leaving open in case the issue re-occurs, but will downgrade priority. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2400) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly
[ https://issues.apache.org/jira/browse/HADOOP-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2400: -- Priority: Trivial (was: Minor) Issue Type: Improvement (was: Bug) Where hbase/mapreduce have analogous configuration parameters, they should be named similarly - Key: HADOOP-2400 URL: https://issues.apache.org/jira/browse/HADOOP-2400 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Trivial mapreduce has a configuration property called mapred.system.dir which determines where in the DFS a jobtracker stores its data. Similarly, hbase has a configuration property called hbase.rootdir which does something very similar. These should have the same name, eg. hbase.system.dir and mapred.system.dir -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2136) [hbase] TestTableIndex: variable substitution depth too large: 20
[ https://issues.apache.org/jira/browse/HADOOP-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2136: -- Priority: Trivial (was: Minor) Downgrading priority since it has been some time since this problem was last observed. [hbase] TestTableIndex: variable substitution depth too large: 20 - Key: HADOOP-2136 URL: https://issues.apache.org/jira/browse/HADOOP-2136 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Trivial See the 'stack - 30/Oct/07 09:51 PM' comment over in HADOOP-2083 for a description of the error, or see here: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/970/testReport/org.apache.hadoop.hbase.mapred/TestTableIndex/testTableIndex/ Seems like it's a rare occurrence. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2527) Improve master load balancing
[ https://issues.apache.org/jira/browse/HADOOP-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2527: -- Priority: Major (was: Minor) Affects Version/s: (was: 0.15.0) 0.16.0 Summary: Improve master load balancing (was: Poor distribution of regions) Improve master load balancing - Key: HADOOP-2527 URL: https://issues.apache.org/jira/browse/HADOOP-2527 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Environment: CentOS 5 Reporter: Chris Kline We get poor distribution of regions when we start up HBase. We have a total of 13 nodes and 898 regions, which should yield an average of 69 regions per node. Instead, one node has 173 regions and one node has 16 regions.

Address             Start Code     Load
10.100.11.62:60020  1199406218912  requests: 0 regions: 63
10.100.11.59:60020  1199406219179  requests: 0 regions: 55
10.100.11.60:60020  1199406219062  requests: 0 regions: 90
10.100.11.61:60020  1199406219132  requests: 1 regions: 54
10.100.11.64:60020  1199406218817  requests: 0 regions: 173
10.100.11.31:60020  1199406219039  requests: 1 regions: 16
10.100.11.58:60020  1199406218895  requests: 0 regions: 89
10.100.11.56:60020  1199406219037  requests: 0 regions: 76
10.100.11.65:60020  1199406219135  requests: 0 regions: 56
10.100.11.57:60020  1199406219183  requests: 1 regions: 56
10.100.11.33:60020  1199406219174  requests: 1 regions: 56
10.100.11.32:60020  1199406218944  requests: 0 regions: 66
10.100.11.63:60020  1199406219182  requests: 0 regions: 48
Total: servers: 13 requests: 4 regions: 898

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
[ https://issues.apache.org/jira/browse/HADOOP-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559207#action_12559207 ] Jim Kellerman commented on HADOOP-2364: --- Is this still a problem? When did it last occur? when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held.
 at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145)
 at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
 at org.apache.hadoop.ipc.Client.call(Client.java:482)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
 at $Proxy0.regionServerStartup(Unknown Source)
 at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025)
 at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659)
 at java.lang.Thread.run(Unknown Source)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2526) HRegionServer hangs upon exit due to DFSClient Exception
[ https://issues.apache.org/jira/browse/HADOOP-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12559217#action_12559217 ] Jim Kellerman commented on HADOOP-2526: --- Is this still an issue? Has it occurred since reported? HRegionServer hangs upon exit due to DFSClient Exception Key: HADOOP-2526 URL: https://issues.apache.org/jira/browse/HADOOP-2526 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.15.0 Environment: CentOS 5 Reporter: Chris Kline Priority: Minor Several HRegionServers hang around indefinitely, well after the HMaster has exited. This was triggered by executing $HBASE_HOME/bin/stop-hbase.sh. The HMaster exits fine, but here is what happens on one of the HRegionServers:

2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.HRegionServer: Got regionserver stop message
2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closing leases
2008-01-02 18:54:01,907 INFO org.apache.hadoop.hbase.Leases$LeaseMonitor: regionserver/0.0.0.0:60020.leaseChecker exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.hbase.Leases: regionserver/0.0.0.0:60020 closed leases
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: Stopping server on 60020
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020: exiting
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 60020: exiting
2008-01-02 18:54:01,909 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 60020
2008-01-02 18:54:01,908 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 60020: exiting
2008-01-02 18:54:01,909 INFO org.apache.hadoop.hbase.HRegionServer: Stopping infoServer
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.Container: Stopping [EMAIL PROTECTED]
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: closing ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-01-02 18:54:01,909 DEBUG org.mortbay.util.ThreadedServer: IGNORED java.net.SocketException: Socket closed
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
 at java.net.ServerSocket.implAccept(ServerSocket.java:453)
 at java.net.ServerSocket.accept(ServerSocket.java:421)
 at org.mortbay.util.ThreadedServer.acceptSocket(ThreadedServer.java:432)
 at org.mortbay.util.ThreadedServer$Acceptor.run(ThreadedServer.java:631)
2008-01-02 18:54:01,910 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-01-02 18:54:01,910 DEBUG org.mortbay.util.ThreadedServer: Self connect to close listener /127.0.0.1:60030
2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1:
2008-01-02 18:54:01,911 DEBUG org.mortbay.util.ThreadedServer: problem stopping acceptor /127.0.0.1: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:519)
 at java.net.Socket.connect(Socket.java:469)
 at java.net.Socket.init(Socket.java:366)
 at java.net.Socket.init(Socket.java:209)
 at org.mortbay.util.ThreadedServer$Acceptor.forceStop(ThreadedServer.java:682)
 at org.mortbay.util.ThreadedServer.stop(ThreadedServer.java:557)
 at org.mortbay.http.SocketListener.stop(SocketListener.java:211)
 at org.mortbay.http.HttpServer.doStop(HttpServer.java:781)
 at org.mortbay.util.Container.stop(Container.java:154)
 at org.apache.hadoop.hbase.util.InfoServer.stop(InfoServer.java:237
RE: [jira] Created: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5
We do not need to split unless any one column is over the threshold. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Billy Pearson (JIRA) [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 15, 2008 2:46 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Created: (HADOOP-2616) hbase not splitting when the total size of region reaches max region size * 1.5 hbase not splitting when the total size of region reaches max region size * 1.5 -- Key: HADOOP-2616 URL: https://issues.apache.org/jira/browse/HADOOP-2616 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Billy Pearson Priority: Minor Fix For: 0.17.0 Right now a region may get larger than the max size set in the conf. HRegion.needsSplit checks the largest column to see if it is larger than max region size * 1.5 and then decides whether to split or not. But if we have more than one column, the region could be very large. Example: say we have 10 columns, all about the same size, say 40MB each, and the max file size is 64MB. We would not split even though the region size is 400MB, well over the 96MB needed to trigger a split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
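A hedged sketch contrasting the two positions (hypothetical names and constants, not the actual HRegion code): the current check consults only the largest column family, while Billy's proposal would trigger on the aggregate size.

{code}
class SplitPolicy {
  static final long MAX_REGION_SIZE = 64L * 1024 * 1024;  // 64MB example

  // Behavior as described in the report: only the largest family is
  // consulted, so a region with ten 40MB families (400MB total) never splits.
  static boolean needsSplitLargestOnly(long[] familySizes) {
    long largest = 0;
    for (long s : familySizes) {
      largest = Math.max(largest, s);
    }
    return largest > MAX_REGION_SIZE * 3 / 2;  // the 1.5x trip point
  }

  // The proposed alternative: split once the region's total size crosses
  // the same 1.5x threshold, regardless of how it is spread across families.
  static boolean needsSplitByTotal(long[] familySizes) {
    long total = 0;
    for (long s : familySizes) {
      total += s;
    }
    return total > MAX_REGION_SIZE * 3 / 2;
  }
}
{code}

Jim's reply above defends the first check: since splits happen along row boundaries within each family, a region only needs to split when some single column family is over the threshold.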
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed. Committed. org.onelab.filter.BloomFilter class uses 8X the memory it should be using - Key: HADOOP-2588 URL: https://issues.apache.org/jira/browse/HADOOP-2588 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Environment: n/a Reporter: Ian Clarke Priority: Trivial Fix For: 0.16.0 Attachments: patch.txt The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, in most Java implementations this will use a byte per bit stored, meaning that 8X the necessary memory is required. This is unfortunate, as the whole point of a BloomFilter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, such as the way hashes are generated from a SHA1 digest in the Filter class, and the way the hash() method just assumes the digestBytes array will be long enough. I discovered this while looking for a good Bloom Filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
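A minimal sketch of the fix's idea (not the committed patch): back the filter with a packed bit vector such as java.util.BitSet instead of boolean[], cutting memory roughly 8x, since boolean[] typically costs a byte per flag in most JVMs.

{code}
import java.util.BitSet;

class PackedBloomFilter {
  private final BitSet bits;
  private final int size;

  PackedBloomFilter(int size) {
    this.size = size;
    this.bits = new BitSet(size);  // ~size/8 bytes instead of ~size bytes
  }

  // Set one bit per hash function for the key being added.
  void add(int[] hashes) {
    for (int h : hashes) {
      bits.set(Math.floorMod(h, size));
    }
  }

  // If any bit is clear, the key was definitely never added.
  boolean mightContain(int[] hashes) {
    for (int h : hashes) {
      if (!bits.get(Math.floorMod(h, size))) {
        return false;
      }
    }
    return true;
  }
}
{code}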
[jira] Resolved: (HADOOP-2597) [hbase] Performance - add a block cache
[ https://issues.apache.org/jira/browse/HADOOP-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2597. --- Resolution: Duplicate [hbase] Performance - add a block cache --- Key: HADOOP-2597 URL: https://issues.apache.org/jira/browse/HADOOP-2597 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Tom White A block cache would cache fixed-size blocks (default 64k) of data read from HDFS by the MapFile. It would help read performance for data close to recently read data (see the Bigtable paper, section 6). It would be configurable on a per-column-family basis. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
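A hedged sketch of a block cache along the lines described (hypothetical structure, not the HBase implementation): fixed-size blocks keyed by (file, block index), evicted in least-recently-used order via LinkedHashMap's access-order mode.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

class BlockCache {
  static final int BLOCK_SIZE = 64 * 1024;  // 64k blocks, as in the issue
  private final int maxBlocks;
  private final Map<String, byte[]> blocks;

  BlockCache(int maxBlocks) {
    this.maxBlocks = maxBlocks;
    // access-order=true makes iteration order LRU; the eldest entry is
    // dropped whenever the cache grows past its block budget.
    this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
        return size() > BlockCache.this.maxBlocks;
      }
    };
  }

  byte[] get(String file, long offset) {
    return blocks.get(file + "@" + (offset / BLOCK_SIZE));
  }

  void put(String file, long offset, byte[] block) {
    blocks.put(file + "@" + (offset / BLOCK_SIZE), block);
  }
}
{code}

Reads near recently read data land in already-cached blocks, which is exactly the sequential-read and hot-row access pattern the issue (and the Bigtable paper) targets.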
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) TestTableIndex is now failing rather consistently. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (HADOOP-2443) [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list
[ https://issues.apache.org/jira/browse/HADOOP-2443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reopened HADOOP-2443: --- Now that this works and has been committed, can we reduce the 'chattiness' of the debug-level logging? Thanks. [hbase] Keep lazy cache of regions in client rather than an 'authoritative' list Key: HADOOP-2443 URL: https://issues.apache.org/jira/browse/HADOOP-2443 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury Fix For: 0.16.0 Attachments: 2443-v10.patch, 2443-v3.patch, 2443-v4.patch, 2443-v5.patch, 2443-v6.patch, 2443-v7.patch, 2443-v8.patch, 2443-v9.patch Currently, when the client gets a NotServingRegionException (usually because the region is in the middle of being split, or there has been a regionserver crash and the region is being moved elsewhere), the client does a complete refresh of its cache of region locations for the table. Chatting with Jim about a Paul Saab upload issue from Saturday night: when tables are big, comprised of regions that are splitting fast (because of bulk upload), it is unlikely a client will ever be able to obtain a stable list of all region locations. Given that any update or scan requires the list of all regions to be in place before it proceeds, this can get in the way of the client succeeding when the cluster is under load. Chatting, we figure it is better for the client to hold a lazy region cache: on NSRE, figure out where that region alone has gone and update only that entry in the client-side cache, rather than throwing out everything we know of a table every time. Hopefully this will fix the issue PS was experiencing where, during intense upload, he was unable to get/scan/hql the same table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
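A hedged sketch of the lazy-cache behavior (hypothetical client-side structure, not HTable's actual code): on a NotServingRegionException, evict only the stale entry and relocate that one region, instead of discarding every cached location for the table. A real cache would key entries by the region's start key; for brevity this sketch stores the row that triggered the lookup.

{code}
import java.util.SortedMap;
import java.util.TreeMap;

class RegionLocationCache {
  // key -> server address for regions of one table
  private final SortedMap<String, String> locations =
      new TreeMap<String, String>();

  String locate(String row) {
    SortedMap<String, String> head = locations.headMap(row + "\0");
    if (!head.isEmpty()) {
      return head.get(head.lastKey());  // cached entry covering 'row'
    }
    String server = lookupInMeta(row);  // cache miss: one META lookup
    locations.put(row, server);
    return server;
  }

  // Called when a server throws NotServingRegionException for 'row':
  // drop just the stale entry; the next locate() refetches only it,
  // leaving every other cached location for the table intact.
  void invalidate(String row) {
    SortedMap<String, String> head = locations.headMap(row + "\0");
    if (!head.isEmpty()) {
      locations.remove(head.lastKey());
    }
  }

  private String lookupInMeta(String row) {
    return "regionserver:60020";  // stand-in for a .META. scan
  }
}
{code}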
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) Thought of a better way to force cache flushes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Attachment: patch.txt HTable$ClientScanner.nextScanner was sleeping, but in the wrong place Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins --- Key: HADOOP-2587 URL: https://issues.apache.org/jira/browse/HADOOP-2587 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Environment: hadoop subversion 611087 Reporter: Billy Pearson Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt, patch.txt The below is cut out of one of my region servers logs full log attached What is happening is there is one region on a this region server and its is under heave insert load so compaction are back to back one one finishes a new one starts the problem starts when its time to split the region. A compaction starts just millsecs before the split starts blocking the split but the split closes the region before the compaction is finished. Causing the region to be offline until the compaction is done. Once the compaction is done the split finishes and all is returned to normal but this is a big problem for production if the region is offline for 10-15 mins. The solution would be not to let the split thread to issue the below line while a compaction on that region is happening. 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) The only time I have seen this bug is when there is only one region on a region server because if more then one then the compaction happens to the other region(s) after the first one is done compaction and the split can do what it needs on the first region with out getting blocked. {code} 2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed. 2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction 2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488 2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size 2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m 2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m 2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions) ... lots of NotServingRegionException's ... 2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec ... 
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true 2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec 2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find... 2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info 2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master 2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239 {code
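For readers unfamiliar with the pattern behind the one-line comment above, here is a minimal sketch of sleep placement in a retry loop. Everything in it is an assumption for illustration: the method shapes and names do not match the real HTable$ClientScanner.nextScanner.
{code}
// Illustrative only: where a retry loop should sleep. Hypothetical names.
public class RetryPlacement {
  interface Attempt {
    boolean tryOnce() throws Exception;
  }

  // Wrong place: sleeping unconditionally, even before the first attempt
  // and after a success, wastes time on every call.
  static boolean retrySleepFirst(Attempt a, int maxRetries, long pauseMs) throws Exception {
    for (int tries = 0; tries < maxRetries; tries++) {
      Thread.sleep(pauseMs);
      if (a.tryOnce()) {
        return true;
      }
    }
    return false;
  }

  // Right place: sleep only between failed attempts.
  static boolean retrySleepBetween(Attempt a, int maxRetries, long pauseMs) throws Exception {
    for (int tries = 0; tries < maxRetries; tries++) {
      if (a.tryOnce()) {
        return true;
      }
      Thread.sleep(pauseMs);
    }
    return false;
  }
}
{code}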
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck
[ https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2440: -- Status: Open (was: Patch Available) Patch was for one of the sub-issues and has been committed. The main issue has not yet been addressed. [hbase] Provide a HBase checker and repair tool similar to fsck --- Key: HADOOP-2440 URL: https://issues.apache.org/jira/browse/HADOOP-2440 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: patch.txt We need a tool to verify (and repair) HBase much like fsck -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2416) [hbase] IOException: File does not exist
[ https://issues.apache.org/jira/browse/HADOOP-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2416: -- Priority: Minor (was: Major) Downgrading to minor since this problem has not been reported since the original report.
[hbase] IOException: File does not exist Key: HADOOP-2416 URL: https://issues.apache.org/jira/browse/HADOOP-2416 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor
Two fellas today on two unrelated clusters had versions of the below:
{code}
2007-12-12 08:28:22,235 ERROR org.apache.hadoop.hbase.HRegionServer: Compaction failed for region spider_pages,10_149317711,1197468834206
java.io.IOException: java.io.IOException: File does not exist
at org.apache.hadoop.dfs.FSDirectory.getFileInfo(FSDirectory.java:489)
at org.apache.hadoop.dfs.FSNamesystem.getFileInfo(FSNamesystem.java:1360)
at org.apache.hadoop.dfs.NameNode.getFileInfo(NameNode.java:428)
at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:385)
{code}
Oddly, the name of the missing file is not cited. The other instance showed in the webui. Seemed to be a problem with an HStoreFile in the .META. region. I was unable to select content from the .META. table -- it was returning null rows. In both cases a restart fixed things again. Since all state is out in hdfs and the in-memory maps are made from the hdfs state, something must not be getting updated on compaction/split or flush.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558442#action_12558442 ] Jim Kellerman commented on HADOOP-2587: --- The reason for this is that a region split needs to close the parent region for a bit. However, HRegion.close needs to acquire a number of locks before it can proceed. Because splitRegion was calling RegionListener.closing before calling close, the region would be taken offline before close had acquired any locks. If there were compactions, scanners or updates in progress, these would all have to finish before the region could actually close, resulting in long periods where the region was unavailable. The solution is to have HRegion.close call listener.closing only after all the locks have been acquired and the close is really about to proceed. For consistency, HRegion.close should also call listener.closed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
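As a rough illustration of the reordering described in that comment, here is a minimal sketch. It is not the actual HBase source: the RegionListener shape and the single read/write lock are simplified stand-ins for the 0.16-era internals.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model of the fix: announce "closing" only once the locks are
// held, i.e. only when close is really about to proceed.
class RegionCloseSketch {
  interface RegionListener {
    void closing();
    void closed();
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final RegionListener listener;

  RegionCloseSketch(RegionListener listener) {
    this.listener = listener;
  }

  void close() {
    // Before the fix, closing() was signalled up here, taking the region
    // offline while compactions/scanners/updates still held locks.
    lock.writeLock().lock(); // waits for compactions, flushes, scanners, updates
    try {
      listener.closing();    // after the fix: region goes offline only now
      // ... flush caches and close stores ...
      listener.closed();     // and, for consistency, announce completion
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}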
[jira] Assigned: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reassigned HADOOP-2500: - Assignee: Jim Kellerman
[HBase] Unreadable region kills region servers -- Key: HADOOP-2500 URL: https://issues.apache.org/jira/browse/HADOOP-2500 Project: Hadoop Issue Type: Bug Components: contrib/hbase Environment: CentOS 5 Reporter: Chris Kline Assignee: Jim Kellerman Priority: Critical
Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS, and fsck reported corruption in an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
When HMaster starts up, it finds a region that is no good (Key: 17_125736271):
2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of spider_pages,17_125736271,1198286140018 is no good
HMaster then assigns this region to RegionServer X.60:
2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
The RegionServer has trouble reading that region (from the RegionServer log on X.60); note that the worker thread exits:
2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta (2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,17_125736271,1198286140018 java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception java.lang.NullPointerException
at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting
The HMaster then tries to assign the same region to X.60 again and fails. The HMaster tries to assign the region to X.31 with the same result (X.31 worker thread exits). The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero-length file in HDFS. After deleting that file and restarting HBase, HBase appears to be back to normal. One thing I can't figure out is that the HMaster log shows several entries after the worker thread on X.60 has exited, suggesting that the RegionServer is still talking with HMaster:
2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
There is no corresponding entry in the RegionServer's log.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
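The failure mode above suggests a simple defensive check before replaying a reconstruction log. The following is a hedged sketch, not the actual HStore.doReconstructionLog code; it assumes the FileStatus-era FileSystem API and an illustrative method name.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class ReconstructionLogCheck {
  // Skip a reconstruction log that exists but is empty, instead of letting
  // SequenceFile.Reader throw EOFException on the zero-length file.
  static void replayIfNonEmpty(FileSystem fs, Path oldLogFile, Configuration conf)
      throws IOException {
    if (oldLogFile == null || !fs.exists(oldLogFile)) {
      return; // nothing to replay
    }
    if (fs.getFileStatus(oldLogFile).getLen() <= 0) {
      return; // zero-length log: nothing recoverable in it
    }
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, oldLogFile, conf);
    try {
      // ... replay the logged edits ...
    } finally {
      reader.close();
    }
  }
}
{code}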
[jira] Updated: (HADOOP-2468) [hbase] TestRegionServerExit failed in Hadoop-Nightly #338
[ https://issues.apache.org/jira/browse/HADOOP-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2468: -- Resolution: Fixed Status: Resolved (was: Patch Available) Resolving issue. Issue not seen in recent builds.
[hbase] TestRegionServerExit failed in Hadoop-Nightly #338 -- Key: HADOOP-2468 URL: https://issues.apache.org/jira/browse/HADOOP-2468 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Priority: Minor Fix For: 0.16.0 Attachments: patch.txt
TestRegionServerExit failed in Hadoop-Nightly #338. From the logs it appears that the client gave up before the mini hbase cluster could recover from a region server failing. Adjusting the timeout and retry configuration parameters should make this more reliable.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
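For context, tuning of that sort would look roughly like the sketch below. The property names and values are assumptions for illustration; the issue does not say which parameters the fix actually changed.
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientRetryTuning {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Hypothetical values: give the client more retries and a longer pause
    // so it outlasts a mini-cluster's recovery from a region server failure.
    conf.setInt("hbase.client.retries.number", 10);
    conf.setLong("hbase.client.pause", 10 * 1000L); // milliseconds between retries
    System.out.println("retries = " + conf.getInt("hbase.client.retries.number", -1));
  }
}
{code}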
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Attachment: patch.txt This patch addresses HADOOP-2587 (this issue), HADOOP-2500, and a newly found issue with TestTimestamp.testTimestamps(), which was creating two mini dfs clusters.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
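On the TestTimestamp point, the usual shape of that mistake is a test that spins up a second MiniDFSCluster instead of reusing the one from setUp(). The sketch below is written under that assumption (JUnit 3 style, 0.16-era org.apache.hadoop.dfs package); it is not the actual test source.
{code}
import junit.framework.TestCase;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.MiniDFSCluster;

public class TimestampTestSketch extends TestCase {
  private MiniDFSCluster cluster;

  protected void setUp() throws Exception {
    super.setUp();
    // One mini DFS cluster for the whole test.
    cluster = new MiniDFSCluster(new Configuration(), 2, true, null);
  }

  public void testTimestamps() throws Exception {
    // The bug pattern: constructing a second MiniDFSCluster here rather
    // than using the field initialized in setUp().
  }

  protected void tearDown() throws Exception {
    if (cluster != null) {
      cluster.shutdown();
    }
    super.tearDown();
  }
}
{code}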
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2440) [hbase] Provide a HBase checker and repair tool similar to fsck
[ https://issues.apache.org/jira/browse/HADOOP-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2440: -- Fix Version/s: (was: 0.16.0) 0.17.0 Pushing fix out to 0.17 since adding the referential integrity needed to make this tool really work will require another migration tool. [hbase] Provide a HBase checker and repair tool similar to fsck --- Key: HADOOP-2440 URL: https://issues.apache.org/jira/browse/HADOOP-2440 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.17.0 Attachments: patch.txt We need a tool to verify (and repair) HBase much like fsck -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman resolved HADOOP-2500. --- Resolution: Fixed Fix Version/s: 0.16.0 Patch submitted for HADOOP-2587 incorporated the fix for this issue. Tests passed. Committed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587: -- Resolution: Fixed Status: Resolved (was: Patch Available) Tests passed. Committed.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Issue Type: Improvement (was: Bug)
org.onelab.filter.BloomFilter class uses 8X the memory it should be using - Key: HADOOP-2588 URL: https://issues.apache.org/jira/browse/HADOOP-2588 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Environment: n/a Reporter: Ian Clarke Priority: Trivial
The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, in most Java implementations a boolean[] takes a whole byte per element, so the filter uses 8X the memory it actually needs. This is unfortunate, as the whole point of a Bloom filter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, such as the way hashes are generated from a SHA1 digest in the Filter class, and the way the hash() method just assumes the digestBytes array will be long enough. I discovered this while looking for a good Bloom filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
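The space fix is the one the later patch comment names: back the filter with java.util.BitSet, which packs one bit per slot, instead of boolean[], which typically takes a byte per slot. Below is a minimal self-contained sketch; the hashing is a simple stand-in, not the SHA1-based scheme in org.onelab.filter.
{code}
import java.util.BitSet;

public class CompactBloomFilter {
  private final BitSet bits;   // ~size/8 bytes, versus ~size bytes for boolean[]
  private final int size;
  private final int hashCount;

  public CompactBloomFilter(int size, int hashCount) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  public void add(byte[] key) {
    for (int i = 0; i < hashCount; i++) {
      bits.set(indexFor(key, i));
    }
  }

  // No false negatives; false positives at a rate set by size and hashCount.
  public boolean mightContain(byte[] key) {
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(indexFor(key, i))) {
        return false;
      }
    }
    return true;
  }

  // Derive the i-th hash from a per-round seed; a stand-in for a real scheme.
  private int indexFor(byte[] key, int i) {
    int h = (i + 1) * 0x9e3779b1;
    for (byte b : key) {
      h = h * 31 + b;
    }
    return ((h % size) + size) % size; // non-negative index
  }
}
{code}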
[jira] Reopened: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reopened HADOOP-2587: --- Times reported for splits are inaccurate. Investigate why other operations are blocked during compaction.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Attachment: patch.txt Replace vector of boolean with BitSet.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588: -- Fix Version/s: 0.16.0 Affects Version/s: 0.16.0 Status: Patch Available (was: Open)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558486#action_12558486 ] Jim Kellerman commented on HADOOP-2587: ---
Updates prevent:
- cache flushes
- closing a region (and consequently, splits)
- the final stage of compaction
Scanners prevent:
- the final stage of compaction
- closing a region (and consequently, splits)
During the final stage of compaction:
- no new scanners may be created
- updates are prohibited
Cache flushes prevent:
- closing a region (and consequently, splits)
- updates
- rolling the HLog
Rolling the HLog prevents:
- cache flushes
- updates
A region split must close the old region. Consequently, before it can start it must:
- wait for any compactions or cache flushes to complete
- lock the region to prevent new updates
- wait for active scanners to terminate
- wait for updates in progress to finish
Once a split is in progress, the actual process is quick. However, even after the region server reports that the split has completed, clients must wait until the master assigns the new regions and the region server(s) report to the master that the new regions are being served.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
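Read as lock acquisitions, the interactions in that comment compress into something like the following model. It is a deliberately simplified sketch with a single read/write lock; the real HRegion of that era used several finer-grained locks.
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockModel {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Updates (and, analogously, scanners and cache flushes) share the region.
  public void update() {
    lock.readLock().lock();
    try {
      // ... apply a batch update ...
    } finally {
      lock.readLock().unlock();
    }
  }

  // The final stage of compaction needs a brief exclusive window, which is
  // why no new scanners may start and updates are prohibited during it.
  public void finishCompaction() {
    lock.writeLock().lock();
    try {
      // ... swap in the compacted store file ...
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Closing for a split is exclusive too: it drains compactions, flushes,
  // scanners and in-flight updates before proceeding, then runs quickly.
  public void closeForSplit() {
    lock.writeLock().lock();
    try {
      // ... take the region offline and split it ...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}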
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causeing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Attachment: patch.txt

It turns out that scanners and updates were being locked out for the duration of a compaction because of the order in which locks were taken out. This has been modified. Other methods that used these locks have also had their ordering changed to prevent deadlocks.

Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
---
Key: HADOOP-2587
URL: https://issues.apache.org/jira/browse/HADOOP-2587
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Affects Versions: 0.16.0
Environment: hadoop subversion 611087
Reporter: Billy Pearson
Assignee: Jim Kellerman
Fix For: 0.16.0
Attachments: hbase-root-regionserver-PE1750-3.log, log.log, patch.txt, patch.txt

The excerpt below is cut out of one of my region server logs; the full log is attached. What is happening is that there is one region on this region server and it is under heavy insert load, so compactions run back to back: as soon as one finishes, a new one starts. The problem starts when it is time to split the region. A compaction starts just milliseconds before the split, blocking the split, but the split closes the region before the compaction is finished, leaving the region offline until the compaction is done. Once the compaction is done the split finishes and everything returns to normal, but this is a big problem for production if the region is offline for 10-15 mins. The solution would be to not let the split thread issue the line below while a compaction on that region is in progress.

2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)

The only time I have seen this bug is when there is only one region on a region server, because if there is more than one, the compaction moves on to the other region(s) after the first one finishes compacting, and the split can do what it needs on the first region without getting blocked.

{code}
2008-01-11 16:22:01,020 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 16mins, 10sec
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HStore: compaction for HStore webdata,,1200085987488/size needed.
2008-01-11 16:22:01,020 DEBUG org.apache.hadoop.hbase.HRegion: 1773667150/size needs compaction
2008-01-11 16:22:01,021 INFO org.apache.hadoop.hbase.HRegion: starting compaction on region webdata,,1200085987488
2008-01-11 16:22:01,021 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 14 files using /gfs_storage/hadoop-root/hbase/hregion_1773667150/compaction.dir/hregion_1773667150/size for webdata,,1200085987488/size
2008-01-11 16:22:01,123 DEBUG org.apache.hadoop.hbase.HRegion: Started memcache flush for region webdata,,1200085987488. Size 31.2m
2008-01-11 16:22:01,232 INFO org.apache.hadoop.hbase.HRegion: Splitting webdata,,1200085987488 because largest aggregate size is 100.7m and desired size is 64.0m
2008-01-11 16:22:01,247 DEBUG org.apache.hadoop.hbase.HRegionServer: webdata,,1200085987488 closing (Adding to retiringRegions)
... lots of NotServingRegionException's ...
2008-01-11 16:32:59,876 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region webdata,,1200085987488. Took 10mins, 58sec
...
2008-01-11 16:33:02,193 DEBUG org.apache.hadoop.hbase.HRegion: Cleaned up /gfs_storage/hadoop-root/hbase/hregion_1773667150/splits true
2008-01-11 16:33:02,194 INFO org.apache.hadoop.hbase.HRegion: Region split of webdata,,1200085987488 complete; new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239. Split took 11mins, 0sec
2008-01-11 16:33:02,227 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: No servers for .META.. Doing a find...
2008-01-11 16:33:02,283 DEBUG org.apache.hadoop.hbase.HConnectionManager$TableServers: Found 1 region(s) for .META. at address: 10.0.0.4:60020, regioninfo: regionname: -ROOT-,,0, startKey: , encodedName(70236052) tableDesc: {name: -ROOT-, families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2008-01-11 16:33:02,284 INFO org.apache.hadoop.hbase.HRegionServer: Updating .META. with region split info
2008-01-11 16:33:02,290 DEBUG org.apache.hadoop.hbase.HRegionServer: Reporting region split to master
2008-01-11 16:33:02,291 INFO org.apache.hadoop.hbase.HRegionServer: region split, META update, and report to master all successful. Old region=webdata,,1200085987488, new regions: webdata,,1200090121237, webdata,com.tom.ent/2008-01-04/0PGM/09034104.html:http,1200090121239
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
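The fix Jim describes is the classic remedy for lock-ordering deadlocks: every code path must acquire the shared locks in one global order, and long-running work should hold shared rather than exclusive locks wherever possible. The sketch below illustrates the principle only; it is not the actual HRegion code, and the class and lock names are invented.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy region showing one global lock-acquisition order. */
class ToyRegion {
    // Lock A guards open/close; lock B guards updates and scans.
    private final ReentrantReadWriteLock closeLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock updateLock = new ReentrantReadWriteLock();

    /** Every path takes closeLock before updateLock, never the reverse. */
    void update() {
        closeLock.readLock().lock();      // 1st: region must stay open
        try {
            updateLock.readLock().lock(); // 2nd: concurrent updates allowed
            try {
                // ... apply the edit ...
            } finally {
                updateLock.readLock().unlock();
            }
        } finally {
            closeLock.readLock().unlock();
        }
    }

    /** Compaction takes the same locks in the same order, and holds the
     *  exclusive lock only for the short file swap at the end, so scanners
     *  and updates are no longer shut out for the whole compaction. */
    void compact() {
        closeLock.readLock().lock();
        try {
            // ... long-running merge of store files, no exclusive lock held ...
            updateLock.writeLock().lock();
            try {
                // ... swap compacted files into place ...
            } finally {
                updateLock.writeLock().unlock();
            }
        } finally {
            closeLock.readLock().unlock();
        }
    }
}
{code}

With a single acquisition order no cycle can form in the waits-for graph, which is what "ordering changed to prevent deadlocks" comes down to.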
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Affects Version/s: 0.16.0
Status: Patch Available (was: Reopened)
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Status: Open (was: Patch Available)

Won't apply anymore.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Attachment: patch.txt

New version applies and resolves conflicts.
[jira] Updated: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2587:
--
Status: Patch Available (was: Open)
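Billy's proposed fix in the issue description above, making the split thread wait for any in-flight compaction on the region to finish before it closes the region, could be sketched as follows. This is a hypothetical illustration, not the committed patch; the class, field, and method names are all invented.

{code}
/** Toy per-region state: a split waits for any in-flight compaction
 *  to finish before it closes the region. */
class ToyRegionState {
    private boolean compacting = false;

    synchronized void startCompaction() throws InterruptedException {
        while (compacting) {
            wait();              // one compaction at a time per region
        }
        compacting = true;
    }

    synchronized void finishCompaction() {
        compacting = false;
        notifyAll();             // wake a waiting split or compaction
    }

    /** Called by the split thread before it issues
     *  "closing (Adding to retiringRegions)". */
    synchronized void waitForCompactionToComplete() throws InterruptedException {
        while (compacting) {
            wait();              // block the split, not the region
        }
        // Safe to close and split now; the region kept serving reads and
        // writes the whole time the compaction was running.
    }
}
{code}

The point of the design is that the waiting happens in the split thread, so the region stays online instead of sitting closed for the 10-15 minutes the compaction takes.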
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Open (was: Patch Available)

[hbase] restructure how HBase lays out files in the file system
---
Key: HADOOP-2478
URL: https://issues.apache.org/jira/browse/HADOOP-2478
Project: Hadoop
Issue Type: Improvement
Components: contrib/hbase
Affects Versions: 0.16.0
Reporter: Jim Kellerman
Assignee: Jim Kellerman
Fix For: 0.16.0
Attachments: patch.txt, patch.txt

Currently HBase has a pretty flat directory structure. For example:

{code}
/hbase/hregion_70236052/info
/hbase/hregion_70236052/info/info/4328260619704027575
/hbase/hregion_70236052/info/mapfiles/4328260619704027575
/hbase/hregion_70236052/info/mapfiles/4328260619704027575/data
/hbase/hregion_70236052/info/mapfiles/4328260619704027575/index
{code}

All the region directories are under the root directory, and with encoded region names, it is impossible to determine what table a region belongs to. This should be restructured to:

{code}
/root-directory/table-name/encoded-region-name/column-family/{info,mapfiles}
{code}

It will be necessary to provide a migration script from current trunk to the new structure.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
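For illustration, a helper that builds paths in the proposed layout might look like the sketch below. The class and method names are invented for this example and are not the API added by the patch.

{code}
import org.apache.hadoop.fs.Path;

/** Hypothetical helper for the proposed on-disk layout. */
class RegionPaths {
    /** root-directory/table-name/encoded-region-name/column-family */
    static Path familyDir(Path rootDir, String tableName,
                          String encodedRegionName, String family) {
        return new Path(new Path(new Path(rootDir, tableName),
                                 encodedRegionName), family);
    }

    static Path infoDir(Path familyDir)     { return new Path(familyDir, "info"); }
    static Path mapFilesDir(Path familyDir) { return new Path(familyDir, "mapfiles"); }
}
{code}

The win is that the table name becomes a path component, so given any region directory you can read the table it belongs to straight off the path.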
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt

New patch starts a mini DFS cluster for the two tests that failed.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2588:
--
Component/s: (was: util) contrib/hbase
Priority: Trivial (was: Minor)

org.onelab.filter.BloomFilter class uses 8X the memory it should be using
---
Key: HADOOP-2588
URL: https://issues.apache.org/jira/browse/HADOOP-2588
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Environment: n/a
Reporter: Ian Clarke
Priority: Trivial

The org.onelab.filter.BloomFilter uses a boolean[] to store the filter; however, most Java implementations store a full byte per boolean, so the filter uses 8X the memory it actually needs. This is unfortunate, as the whole point of a Bloom filter is to save memory. As a sidebar, the implementation looks a bit shaky in other ways, for example the way hashes are generated from a SHA1 digest in the Filter class, which simply assumes the digestBytes array will be long enough in the hash() method. I discovered this while looking for a good Bloom filter implementation to use in my own project. In the end I went ahead and implemented my own; it's very simple and pretty elegant (even if I do say so myself ;) - you are welcome to use it: http://locut.us/blog/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2588) org.onelab.filter.BloomFilter class uses 8X the memory it should be using
[ https://issues.apache.org/jira/browse/HADOOP-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558272#action_12558272 ] Jim Kellerman commented on HADOOP-2588:
---
You must be looking at an older version than what is in trunk. The current implementation uses a Jenkins hash rather than SHA-1. You are correct that there is no guarantee how JVMs implement an array of boolean. Perhaps using a java.util.BitSet would be better.
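The suggestion amounts to replacing the boolean[] bit vector with java.util.BitSet, which packs 64 bits into each long. A minimal sketch of the idea follows; hash generation is elided and this is not the org.onelab.filter code.

{code}
import java.util.BitSet;

/** Sketch of a BitSet-backed Bloom filter bit vector. */
class BitSetBloomFilter {
    private final BitSet bits;
    private final int vectorSize;

    BitSetBloomFilter(int vectorSize) {
        this.vectorSize = vectorSize;
        this.bits = new BitSet(vectorSize);   // ~vectorSize/8 bytes, not vectorSize
    }

    /** Caller supplies the k hash values for the key (hash function elided). */
    void add(int[] hashes) {
        for (int h : hashes) {
            bits.set((h & 0x7fffffff) % vectorSize);  // mask sign bit into range
        }
    }

    boolean mightContain(int[] hashes) {
        for (int h : hashes) {
            if (!bits.get((h & 0x7fffffff) % vectorSize)) {
                return false;   // definitely never added
            }
        }
        return true;            // possibly added (false positives possible)
    }
}
{code}

For a vector of n slots this needs roughly n/8 bytes instead of n, recovering the 8X overhead the report complains about.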
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)

Tests passed. Committed. Resolving issue.
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2311:
--
Priority: Trivial (was: Critical)

Dropping priority since this bug has not recurred.

[hbase] Could not complete hdfs write out to flush file forcing regionserver restart
---
Key: HADOOP-2311
URL: https://issues.apache.org/jira/browse/HADOOP-2311
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: stack
Priority: Trivial
Attachments: delete-logging.patch

I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300 million rows into hbase. The regionservers went down to force a rerun of the hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead-up to the failed flush:

...
2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log)
2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708
2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log)
2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708
2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegion - region postlog,img149/4699/133lm0.jpg,1196318393738 available
2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709
2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log)
2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708
2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log)
2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708
2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709
2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available
2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704
2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703
2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do
2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738
2007-11-28 22:41:04,144 DEBUG hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k
2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k
2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k
2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563
2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed.
2007-11
[jira] Commented: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558025#action_12558025 ] Jim Kellerman commented on HADOOP-2311:
---
Have we seen any more occurrences of this problem? If not, should we close this issue as not reproducible and open a new one if it should happen again?
[jira] Commented: (HADOOP-2394) Add support for migrating between hbase versions
[ https://issues.apache.org/jira/browse/HADOOP-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558037#action_12558037 ] Jim Kellerman commented on HADOOP-2394:
---
stack wrote: I ain't too invested in our supporting reverse migrations, but it's worth noting that any migration system worth its salt (systems I've worked on in the past, and Ruby on Rails) goes both ways, if only to facilitate testing of the forward migration; inevitably there's a bug when you try to migrate real data.

That's what backups are for :) More importantly, though, HADOOP-2478 incorporates a migration tool. The specifics of what the tool does will have to be rewritten for each upgrade, but I think the framework is good.

Add support for migrating between hbase versions
---
Key: HADOOP-2394
URL: https://issues.apache.org/jira/browse/HADOOP-2394
Project: Hadoop
Issue Type: Improvement
Components: contrib/hbase
Reporter: Johan Oskarsson

If HBase is to be used to serve data to live systems, we would need a way to upgrade both the underlying Hadoop installation and HBase to newer versions with minimal downtime.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
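A forward-only, version-checked migration framework of the kind Jim describes might be organized roughly as below. Everything here, the interface, the class, and the version numbers, is invented for illustration; it is not the HADOOP-2478 tool.

{code}
import java.util.List;

/** One forward migration step. */
interface Migration {
    int fromVersion();               // file-system version this step upgrades from
    void migrate() throws Exception; // rewrite the on-disk layout in place
}

class Migrator {
    static final int CURRENT_VERSION = 2;  // invented value

    /** Reads the stored version and applies each applicable step in order. */
    static void run(int storedVersion, List<Migration> steps) throws Exception {
        if (storedVersion == CURRENT_VERSION) {
            return;  // already up to date
        }
        for (Migration step : steps) {
            // apply every step from the stored version forward, in order
            if (step.fromVersion() >= storedVersion) {
                step.migrate();
            }
        }
        // ... stamp CURRENT_VERSION into the version file only after success,
        // so a failed run can be restarted (or restored from backup) ...
    }
}
{code}

Only the step bodies change between releases; the version check, ordering, and final version stamp stay fixed, which is the "framework" part of the comment.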
[jira] Commented: (HADOOP-2500) [HBase] Unreadable region kills region servers
[ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12558032#action_12558032 ] Jim Kellerman commented on HADOOP-2500:
---
Bryan Duxbury wrote: At the very least, we should not assign a region to a region server if it is detected as no good.

That is an unfortunate wording of a log message in the Master. It is saying that the current assignment of the region is no good because the information it read from the meta region had a server or start code that did not match a known server. It does not mean that the master thinks the region itself is no good.

Also, if a RegionServer tries to access a region and it has difficulties, it should report to the master that it can't read the region, and the master should stop trying to serve it. From a more general standpoint, maybe when a bad region is detected, its files should be moved to a different location and generally excluded from the cluster. This would allow you to recover from problems better.

Yes, we absolutely need to do something, just not sure exactly what yet. One thing is for certain: zero-length files should be ignored/deleted.

[HBase] Unreadable region kills region servers
--
Key: HADOOP-2500
URL: https://issues.apache.org/jira/browse/HADOOP-2500
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Environment: CentOS 5
Reporter: Chris Kline
Priority: Critical

Background: The name node (also a DataNode and RegionServer) in our cluster ran out of disk space. I created some space, restarted HDFS, and fsck reported corruption with an HBase file. I cleared up that corruption and restarted HBase. I was still unable to read anything from HBase even though HDFS was now healthy. The following was gathered from the log files.
When HMaster starts up, it finds a region that is no good (Key: 17_125736271):

2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of spider_pages,17_125736271,1198286140018 is no good

HMaster then assigns this region to RegionServer X.60:

2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018 to server 10.100.11.60:60020
2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020

The RegionServer has trouble reading that region (from the RegionServer log on X.60); note that the worker thread exits:

2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta (2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,17_125736271,1198286140018
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1360)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1349)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1344)
at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
at org.apache.hadoop.hbase.HStore.init(HStore.java:632)
at org.apache.hadoop.hbase.HRegion.init(HRegion.java:288)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception
java.lang.NullPointerException
at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
at java.lang.Thread.run(Thread.java:619)
2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting

The HMaster then tries to assign the same region to X.60 again and fails. The HMaster tries to assign the region to X.31 with the same result (X.31 worker thread exits). The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is a zero
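The zero-length-file guard Jim suggests would go where the reconstruction log is opened, before the path is handed to a SequenceFile.Reader, which throws EOFException on an empty file as in the trace above. A hedged sketch follows; the method name is invented and the FileSystem calls are approximate, since exact signatures varied across Hadoop versions.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReconstructionLogCheck {
    /** Returns true only if the log exists and has data worth replaying. */
    static boolean usableReconstructionLog(FileSystem fs, Path log)
            throws IOException {
        if (log == null || !fs.exists(log)) {
            return false;              // nothing to replay
        }
        if (fs.getFileStatus(log).getLen() <= 0) {
            fs.delete(log, false);     // empty log: discard it instead of
            return false;              // crashing the worker thread
        }
        return true;
    }
}
{code}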
[jira] Assigned: (HADOOP-2587) Splits getting blocked by compactions causing region to be offline for the length of the compaction 10-15 mins
[ https://issues.apache.org/jira/browse/HADOOP-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman reassigned HADOOP-2587:
-
Assignee: Jim Kellerman
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Fix Version/s: (was: 0.17.0) 0.16.0
Affects Version/s: (was: 0.17.0) 0.16.0
Status: Open (was: Patch Available)

Cancelling patch to make a new one that will apply to trunk.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Attachment: patch.txt

Although this won't go in until 0.17, let's get Hudson used to running it. He doesn't like most patches.
[jira] Updated: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
[ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Kellerman updated HADOOP-2478:
--
Fix Version/s: (was: 0.16.0) 0.17.0
Affects Version/s: (was: 0.16.0) 0.17.0
Status: Patch Available (was: In Progress)

See what Hudson thinks.
RE: Hadoop Patch Builds
That rocks dude! Awesome fix! --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Nigel Daley [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 09, 2008 5:08 PM To: hadoop-dev@lucene.apache.org Subject: Hadoop Patch Builds Until now, the order in which our Hadoop-Patch build would test patches has been essentially random. Also, there has been no way to see the list of pending patches. Drum roll... These 2 pain points are now fixed. I have created a new Hudson job, Hadoop-Patch-Admin, that does 2 things: a) triggers the Hadoop-Patch build when there are waiting patches; the order in which patches are submitted for testing is now FIFO :-) b) exposes the current patch queue To see the queue, go to http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/ and click on the link QUEUE OF PENDING PATCHES (you may want to bookmark the linked page since it won't change). The Hadoop-Patch-Admin build attempts to run every minute and updates the queue information that you see. The build, however, will get stuck behind any other builds (Hadoop, Lucene, etc.) that are currently running, so the queue information may not always be completely up-to-date. Hope this helps! Nige PS updated wiki documentation to follow
RE: [jira] Commented: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system
Stack tells me that code freeze for 0.16 is either late this week or early next. So no refactoring yet. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 09, 2008 5:38 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Commented: (HADOOP-2478) [hbase] restructure how HBase lays out files in the file system [ https://issues.apache.org/jira/browse/HADOOP-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557514#action_12557514 ] Bryan Duxbury commented on HADOOP-2478: --- If this won't be fixed until 0.17, which is months away, should we apply my HStore refactor patch in the meantime? [hbase] restructure how HBase lays out files in the file system --- Key: HADOOP-2478 URL: https://issues.apache.org/jira/browse/HADOOP-2478 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.17.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.17.0 Attachments: patch.txt Currently HBase has a pretty flat directory structure. For example: {code} /hbase/hregion_70236052/info /hbase/hregion_70236052/info/info/4328260619704027575 /hbase/hregion_70236052/info/mapfiles/4328260619704027575 /hbase/hregion_70236052/info/mapfiles/4328260619704027575/data /hbase/hregion_70236052/info/mapfiles/4328260619704027575/index {code} All the region directories are under the root directory, and with encoded region names, it is impossible to determine what table a region belongs to. This should be restructured to: {code} /root-directory/table-name/encoded-region-name/column-family/{info,mapfiles} {code} It will be necessary to provide a migration script from current trunk to the new structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [hbase] table HRegionServer affinity
-1 on this idea as suggested. Even Google does not distribute DFS or Bigtable across data centers (see the Bigtable paper at http://labs.google.com/papers/bigtable.html ). What the paper does not mention is that they can replicate a table to multiple data centers for business continuity and backup. This is on the road map for HBase but is still quite a way down the road. In addition, we do want to add 'rack awareness' within a data center for fault tolerance and load balancing. This is also not going to happen in the immediate future. We are currently focusing on making what we have more fault tolerant and are starting to work on performance issues. Answers to your two questions inline below. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Andrew Purtell [mailto:[EMAIL PROTECTED] Sent: Monday, January 07, 2008 8:49 PM To: hadoop-dev Subject: [hbase] table HRegionServer affinity Hello, Consider the case of a global federation of Hadoop clusters, with a single global HBase master, divided into a number of geographic regions each with a local DFS, local workload, and region server backed by that DFS. This setup allows for a global HBase space, where any region may retrieve rows stored by any other region -- which is quite useful -- but, in addition to this, it would also be useful to be able to specify constraints on data mobility and also to be able to scope queries to a particular region. To be a bit more specific, I have three things in mind: 1) The ability to fix a given key range to a region. [I assume here you mean a geographic region and not an HBase table region. --Jim] This would both assign a range to a given region, and also disable splitting over that range. Aside from API changes, ideally there would be an HBase shell command to support this. 2) Syntactic support in HBase shell for table affinity to a given region server: CREATE TABLE ... REGION=10.10.10.10 (or similar) This would fix an entire table to a region. 3) Query support for scoping the result set based on region server: SELECT ... WHERE @REGION=10.10.10.10 AND ... (or similar) Given that IP addresses and hostnames are inflexible names for regions, perhaps a mechanism for assigning logical labels to a region server (or even a group of region servers, where the prohibition on splitting may be relaxed to allow splitting over the group) would also be useful. As I am still coming up to speed on Hadoop and HBase and the code base, I kindly ask for the answers to two questions. First: How invasive to the HBase master/region model is the concept of specifying constraints on data mobility? [It would be very disruptive. The current model is that you run one or more HBase clusters per HDFS cluster. An HBase cluster does not span HDFS clusters. As far as I know HDFS clusters do not span data centers. Latency and network partitioning would be big problems for a system that requires sub-second response times. --Jim] Second: How difficult would the modifications be to accomplish? [A change such as this would require major changes to the architecture and our vision of the model going forward (replication between data centers, and a single table residing in multiple data centers being served by separate HBase instances running on separate HDFS clusters). --Jim] I believe these questions to be related. :-) Thanks, Andrew Purtell Advanced Threats Research Trend Micro, Inc., Pasadena, CA, USA (personal mail)
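Purely to illustrate the kind of constraint being proposed, here is a hedged sketch of a key-range pin as a data structure. Nothing like this exists in HBase, and every name below is hypothetical:
{code}
// Hypothetical sketch of the proposed key-range pinning constraint.
// No such API exists in HBase; all names are invented for illustration.
public final class RangePin {
  final String startKey;    // inclusive lower bound of the pinned range
  final String endKey;      // exclusive upper bound of the pinned range
  final String regionLabel; // logical label (e.g. "us-west") instead of an IP

  RangePin(String startKey, String endKey, String regionLabel) {
    this.startKey = startKey;
    this.endKey = endKey;
    this.regionLabel = regionLabel;
  }

  /** True if the row key falls inside the pinned range. */
  boolean covers(String rowKey) {
    return rowKey.compareTo(startKey) >= 0 && rowKey.compareTo(endKey) < 0;
  }

  /** Pinning a range also implies vetoing any split point inside it. */
  boolean blocksSplitAt(String splitKey) {
    return covers(splitKey);
  }
}
{code}
Using a logical label rather than an IP, as Andrew suggests near the end of his message, would keep such a pin stable when a region server moves hosts.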
RE: [jira] Commented: (HADOOP-2405) [hbase] Merge region tool exposed in shell and/or in UI
Google does dynamic splitting and merging of regions to deal with hot spots. They had to be careful that they did not oscillate between splitting and merging when the load pattern changed. Right now, manual merges are ok because we only do splits when regions grow, and the only reason to merge is if many rows are deleted. When we get to doing more sophisticated load balancing, we will want the capability of both splitting and merging based on load. -Original Message- From: Bryan Duxbury (JIRA) [mailto:[EMAIL PROTECTED] Sent: Monday, January 07, 2008 1:10 PM To: hadoop-dev@lucene.apache.org Subject: [jira] Commented: (HADOOP-2405) [hbase] Merge region tool exposed in shell and/or in UI [ https://issues.apache.org/jira/browse/HADOOP-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556694#action_12556694 ] Bryan Duxbury commented on HADOOP-2405: --- So, you envision the merge operation to not only require manual triggering but to require manual targeting? Shouldn't the point of merging regions be to maintain an equilibrium of region sizes? Under what circumstances will you have to manually intervene to keep regions appropriately sized? It seems like this should really only happen after a substantial number of deletions have occurred, right? That would cause a compacted region to shrink below a healthy size, and if it could be merged with a neighbor, it would be nice. This logic should be built in and automatic; otherwise it would require constant monitoring of region sizes by an administrator. Other than this sort of automatic merging, when would you want to manually merge two regions? Doesn't that expose a somewhat dangerous amount of functionality to the end user? [hbase] Merge region tool exposed in shell and/or in UI --- Key: HADOOP-2405 URL: https://issues.apache.org/jira/browse/HADOOP-2405 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: stack Priority: Minor hbase has support for merging regions. Expose a merge trigger in the shell or in the UI. (Can only merge adjacent regions, so perhaps this only makes sense in the regionserver UI.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
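To make the oscillation hazard Jim describes concrete, here is an illustrative-only sketch; the thresholds and names are invented, not HBase code. The merge criterion has to leave a dead band below the split threshold, or two regions merged at the boundary would immediately split again:
{code}
// Illustrative-only split/merge hysteresis; thresholds are invented.
public final class RegionSizer {
  static final long SPLIT_BYTES = 256L << 20; // split a region above 256 MB
  static final long MERGE_BYTES = 64L << 20;  // consider merging below 64 MB

  static boolean shouldSplit(long regionBytes) {
    return regionBytes > SPLIT_BYTES;
  }

  static boolean shouldMerge(long regionBytes, long neighborBytes) {
    // Both regions must be small AND the merged result must stay well
    // below the split point; without this dead band the pair would
    // oscillate between merging and splitting as load shifts.
    return regionBytes < MERGE_BYTES
        && neighborBytes < MERGE_BYTES
        && regionBytes + neighborBytes < SPLIT_BYTES / 2;
  }
}
{code}
A deletion-heavy table would trip shouldMerge only after compaction actually shrinks a region, which matches the one case where a manual merge is useful today.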