Fwd: What should we expect from Hama Examples Rand?

2009-12-15 Thread Edward J. Yoon
Hi,

'RAND' example of hama-examples.jar is basically a simple M/R job that
creates a table filled with random numbers. So, before running Hama,
please check whether you are able to create tables via the HBase shell.
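A minimal Java sketch of the same sanity check, assuming the 0.20-era client
API (the table and column family names below are placeholders, not anything
Hama itself creates):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateTableCheck {
      public static void main(String[] args) throws Exception {
        // Uses whichever cluster the hbase-site.xml on the classpath points at.
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        HTableDescriptor desc = new HTableDescriptor("sanity_check");
        desc.addFamily(new HColumnDescriptor("cf"));   // one dummy column family
        admin.createTable(desc);                       // fails fast if HBase is unhealthy
        System.out.println("created: " + admin.tableExists("sanity_check"));
      }
    }

If this fails the same way the shell does, the problem is on the
HBase/ZooKeeper side rather than in Hama.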

Can one of the HBase developers help with this problem?

Thanks

-- Forwarded message --
From: Ratner, Alan S (IS) 
Date: Sat, Dec 12, 2009 at 1:32 AM
Subject: What should we expect from Hama Examples Rand?
To: hama-u...@incubator.apache.org


Having fixed the groomserver problem I did the following:



1) clean out /tmp files

2) format Hadoop namenode

3) start Hadoop

4) start HBase/Zookeeper

5) start Hama

6) launch Hama examples rand -m 10 -r 10 2000 2000 30.5% matrixA



The outcome is puzzling.  A 3-second long diary shows up in the Hama
log file reporting nothing unusual.  But the terminal reported
problems with HBase.  When I Googled this problem all I saw was
someone who had multiple versions of HBase on their system.  I am
using a fresh VM with Ubuntu 8.04, Hadoop 0.20.1, Zookeeper 3.2.1,
HBase 0.20.2, Hama 0.2.0 and JDK 1.6.0_17.  No older versions of
anything were ever installed.



BTW: I thought the problem might be related to my running HBase in
standalone mode so I just switched to HBase pseudo-distributed mode
but I see the same problems.



Any help appreciated.  -- Alan



Hama log

==

Fri Dec 11 09:23:03 EST 2009 Starting master on ngc

ulimit -n 1024

2009-12-11 09:23:05,512 INFO org.apache.hama.HamaMaster: STARTUP_MSG:

/

STARTUP_MSG: Starting HamaMaster

STARTUP_MSG:   host = ngc/127.0.1.1

STARTUP_MSG:   args = [start]

STARTUP_MSG:   version = 0.20.1

STARTUP_MSG:   build =
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1
-r 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009

/

2009-12-11 09:23:05,939 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=HamaMaster, port=4

2009-12-11 09:23:06,634 INFO org.apache.hama.HamaMaster: Cleaning up
the system directory

2009-12-11 09:23:06,710 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting

2009-12-11 09:23:06,716 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 4: starting

2009-12-11 09:23:06,721 INFO org.apache.hama.HamaMaster: Starting RUNNING

2009-12-11 09:23:06,721 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 4: starting



Hama terminal

/hadoop-0.20.1-test.jar:/home/ngc/Desktop/hama/bin/../lib/hbase-0.20.0.jar:/home/ngc/Desktop/hama/bin/../lib/hbase-0.20.0-test.jar:/home/ngc/Desktop/hama/bin/../lib/jasper-

compiler-5.5.12.jar:/home/ngc/Desktop/hama/bin/../lib/jasper-runtime-5.5.12.jar:/home/ngc/Desktop/hama/bin/../lib/javacc.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-util-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/jruby-complete-1.2.0.jar:/home/ngc/Desktop/hama/bin/../lib/json.jar:/home/ngc/Desktop/hama/bin/../lib/junit-3.8.1.jar:/home/ngc/Desktop/hama/bin/../lib/libthrift-r771587.jar:/home/ngc/Desktop/hama/bin/../lib/log4j-1.2.13.jar:/home/ngc/Desktop/hama/bin/../lib/log4j-1.2.15.jar:/home/ngc/Desktop/hama/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/xmlenc-0.52.jar:/home/ngc/Desktop/hama/bin/../lib/zookeeper-3.2.1.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-ext/*.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/annotations.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/ant.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-analysis-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-commons-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-tree-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-util-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-xml-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/bcel.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/dom4j-full.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/findbugs-ant.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/findbugsGUI.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/findbugs.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/jsr305.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/plugin/coreplugin.jar:/home/ngc/Desktop/hadoop-0.20.1/conf:/home/ngc/Desktop/hbase-0.20.2/conf

09/12/11 09:25:12 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/home/ngc/Desktop/jre1.6.0_17/lib/i386/client:/home/ngc/Desktop/jre1.6.0_17/lib/i386:/home/ngc/Desktop/jre1.6.0_17/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib

09/12/11 09:25:12 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp

09/12/11 09:25:12 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=

09/12/11 09:25:12 INFO zookeeper.ZooKeeper: Client environment:os.na

Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Edward Capriolo
On Tue, Dec 15, 2009 at 1:03 AM, stack  wrote:
> HBase requires java 6 (1.6) or above.
> St.Ack
>
> On Mon, Dec 14, 2009 at 7:41 PM, Paul Smith  wrote:
>
>> Just wondering if anyone knows of an existing Hbase utility library that is
>> open sourced that can assist those that have Java5 and above.  I'm starting
>> off in Hbase, and thinking it'd be great to have API calls similar to the
>> Google Collections framework.  If one doesn't exist, I think I could start
>> off a new project in Google Code (ASL it).  I think Hbase is targetted <
>> Java 5, so can't take advantage of this yet internally.
>>
>> The sorts of API functions I thought would be useful to make code more
>> readable would be something like:
>>
>>
>>        HTable hTable = new
>> TableBuilder(hbaseConfiguration).withTableName("foo")
>>                .withSimpleColumnFamilies("bar", "eek",
>> "moo").deleteAndRecreate();
>>
>> and
>>
>>        ResultScanner scanner = new
>> ResultScannerBuilder(hTable).withColumnFamilies(
>>                "family1", "family2").build();
>>
>>
>> taking advantage of varargs liberally and using nice Patterns etc.  While
>> the Bytes class is useful, I'd personally benefit from an API that can
>> easily pack arbitrary multiple ints (and other data types) together into
>> byte[] for RowKeyGoodness(tm) ala:
>>
>> byte[] rowKey = BytePacker.pack(fooId, barId, eekId, mooId);
>>
>> (behind the scenes this is a vararg method that recursively packs each into
>> into byte[] via Bytes.add(byte[] b1, byte[] b2) etc.
>>
>> If anyone knows of a library that does this, pointers please.
>>
>> cheers,
>>
>> Paul
>

I could see this being very useful. My first barrier to HBase was
trying to figure out how to turn what I knew of as an SQL SELECT clause
into a set of HBase server-side filters. Mostly, I pieced this
together with help from the list, and the Test Cases. That could be
frustrating for some.  Now that I am used to it, I notice that the
HBase way is actually much cleaner and much less code.

So, yes a helper library is a great thing.

As part of the "proof of concept" I am working on, large sections of
it are mostly descriptions of doing things like column projections in
both SQL and HBase with filters. So I think both are very helpful for
making Hbase more attractive to an end user.
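To make the SQL-to-filters point concrete, here is a hedged sketch of a column
projection plus a WHERE-style predicate. The table, family, and qualifier names
are made up, and the method names follow later HBase client releases, so the
0.20 spelling differs slightly:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ProjectionExample {
      // Roughly: SELECT info:title FROM articles WHERE info:status = 'published'
      static void printPublishedTitles(Table articles) throws IOException {
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("title"));   // projection
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("status"));  // filter must see this column too
        SingleColumnValueFilter where = new SingleColumnValueFilter(
            Bytes.toBytes("info"), Bytes.toBytes("status"),
            CompareOp.EQUAL, Bytes.toBytes("published"));
        where.setFilterIfMissing(true);   // drop rows that have no status at all
        scan.setFilter(where);
        ResultScanner rs = articles.getScanner(scan);
        try {
          for (Result r : rs) {
            // may be null if a matching row has no title cell
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("info"), Bytes.toBytes("title"))));
          }
        } finally {
          rs.close();
        }
      }
    }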


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Gary Helmling
This definitely seems to be a common initial hurdle, though I think each
project comes at it with their own specific needs.  There are a variety of
frameworks or libraries you can check out on the Supporting Projects page:
http://wiki.apache.org/hadoop/SupportingProjects

In my case, I wanted a simple object -> hbase mapping layer that would take
care of the boilerplate work of persistence and provide a slightly higher
level API for queries.  It's been open-sourced on github:

http://github.com/ghelmling/meetup.beeno

It still only really accounts for my project needs -- we're serving realtime
requests from our web site and not currently doing any MR processing.  But
if it could be of use, I could always use feedback on how to evolve it. :)

Some of the other projects listed on the wiki page are doubtless more
mature, so they may meet your needs as well.  If none of them are quite what
you're looking for, then there's always room for another!

--gh


On Tue, Dec 15, 2009 at 10:39 AM, Edward Capriolo wrote:

> On Tue, Dec 15, 2009 at 1:03 AM, stack  wrote:
> > HBase requires java 6 (1.6) or above.
> > St.Ack
> >
> > On Mon, Dec 14, 2009 at 7:41 PM, Paul Smith  wrote:
> >
> >> Just wondering if anyone knows of an existing Hbase utility library that
> is
> >> open sourced that can assist those that have Java5 and above.  I'm
> starting
> >> off in Hbase, and thinking it'd be great to have API calls similar to
> the
> >> Google Collections framework.  If one doesn't exist, I think I could
> start
> >> off a new project in Google Code (ASL it).  I think Hbase is targetted <
> >> Java 5, so can't take advantage of this yet internally.
> >>
> >> The sorts of API functions I thought would be useful to make code more
> >> readable would be something like:
> >>
> >>
> >>HTable hTable = new
> >> TableBuilder(hbaseConfiguration).withTableName("foo")
> >>.withSimpleColumnFamilies("bar", "eek",
> >> "moo").deleteAndRecreate();
> >>
> >> and
> >>
> >>ResultScanner scanner = new
> >> ResultScannerBuilder(hTable).withColumnFamilies(
> >>"family1", "family2").build();
> >>
> >>
> >> taking advantage of varargs liberally and using nice Patterns etc.
>  While
> >> the Bytes class is useful, I'd personally benefit from an API that can
> >> easily pack arbitrary multiple ints (and other data types) together into
> >> byte[] for RowKeyGoodness(tm) ala:
> >>
> >> byte[] rowKey = BytePacker.pack(fooId, barId, eekId, mooId);
> >>
> >> (behind the scenes this is a vararg method that recursively packs each
> into
> >> into byte[] via Bytes.add(byte[] b1, byte[] b2) etc.
> >>
> >> If anyone knows of a library that does this, pointers please.
> >>
> >> cheers,
> >>
> >> Paul
> >
>
> I could see this being very useful. My first barrier to hbase was
> trying to figure out how to turn what I knew of as an SQL select cause
> into a set of HBaser server side filters. Mostly, I pieced this
> together with help from the list, and the Test Cases. That could be
> frustrating for some.  Now that I am used to it, I notice that the
> HBase way is actually much cleaner and much less code.
>
> So, yes a helper library is a great thing.
>
> As part of the "proof of concept" I am working on, large sections of
> it are mostly descriptions of doing things like column projections in
> both SQL and HBase with filters. So I think both are very helpful for
> making Hbase more attractive to an end user.
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Edward Capriolo
On Tue, Dec 15, 2009 at 11:04 AM, Gary Helmling  wrote:
> This definitely seems to be a common initial hurdle, though I think each
> project comes at it with their own specific needs.  There are a variety of
> frameworks or libraries you can check out on the Supporting Projects page:
> http://wiki.apache.org/hadoop/SupportingProjects
>
> In my case, I wanted a simple object -> hbase mapping layer that would take
> care of the boilerplate work of persistence and provide a slightly higher
> level API for queries.  It's been open-sourced on github:
>
> http://github.com/ghelmling/meetup.beeno
>
> It still only really account for my project needs -- we're serving realtime
> requests from our web site and not currently doing any MR processing.  But
> if it could be of use, I could always use feedback on how to evolve it. :)
>
> Some of the other projects listed on the wiki page are doubtless more
> mature, so they may meet your needs as well.  If none of them are quite what
> you're looking for, then there's always room for another!
>
> --gh
>
>
> On Tue, Dec 15, 2009 at 10:39 AM, Edward Capriolo 
> wrote:
>
>> On Tue, Dec 15, 2009 at 1:03 AM, stack  wrote:
>> > HBase requires java 6 (1.6) or above.
>> > St.Ack
>> >
>> > On Mon, Dec 14, 2009 at 7:41 PM, Paul Smith  wrote:
>> >
>> >> Just wondering if anyone knows of an existing Hbase utility library that
>> is
>> >> open sourced that can assist those that have Java5 and above.  I'm
>> starting
>> >> off in Hbase, and thinking it'd be great to have API calls similar to
>> the
>> >> Google Collections framework.  If one doesn't exist, I think I could
>> start
>> >> off a new project in Google Code (ASL it).  I think Hbase is targetted <
>> >> Java 5, so can't take advantage of this yet internally.
>> >>
>> >> The sorts of API functions I thought would be useful to make code more
>> >> readable would be something like:
>> >>
>> >>
>> >>        HTable hTable = new
>> >> TableBuilder(hbaseConfiguration).withTableName("foo")
>> >>                .withSimpleColumnFamilies("bar", "eek",
>> >> "moo").deleteAndRecreate();
>> >>
>> >> and
>> >>
>> >>        ResultScanner scanner = new
>> >> ResultScannerBuilder(hTable).withColumnFamilies(
>> >>                "family1", "family2").build();
>> >>
>> >>
>> >> taking advantage of varargs liberally and using nice Patterns etc.
>>  While
>> >> the Bytes class is useful, I'd personally benefit from an API that can
>> >> easily pack arbitrary multiple ints (and other data types) together into
>> >> byte[] for RowKeyGoodness(tm) ala:
>> >>
>> >> byte[] rowKey = BytePacker.pack(fooId, barId, eekId, mooId);
>> >>
>> >> (behind the scenes this is a vararg method that recursively packs each
>> into
>> >> into byte[] via Bytes.add(byte[] b1, byte[] b2) etc.
>> >>
>> >> If anyone knows of a library that does this, pointers please.
>> >>
>> >> cheers,
>> >>
>> >> Paul
>> >
>>
>> I could see this being very useful. My first barrier to hbase was
>> trying to figure out how to turn what I knew of as an SQL select cause
>> into a set of HBaser server side filters. Mostly, I pieced this
>> together with help from the list, and the Test Cases. That could be
>> frustrating for some.  Now that I am used to it, I notice that the
>> HBase way is actually much cleaner and much less code.
>>
>> So, yes a helper library is a great thing.
>>
>> As part of the "proof of concept" I am working on, large sections of
>> it are mostly descriptions of doing things like column projections in
>> both SQL and HBase with filters. So I think both are very helpful for
>> making Hbase more attractive to an end user.
>>
>

All interesting. In a sense, I believe you should learn to walk before
you can run :). It is hard to troubleshoot how an ORM mapper is
working if you are basically clueless about the HBase API.

You know how it goes when lots of user tools get pulled into the mix:
q: How do I only get column X?
a: You need to get a Spring-injectable, Grails, RESTful ORM mapper
that is only found in git, but there are like 4 forks of it, so pick
this one :)


Re: Performance related question

2009-12-15 Thread Something Something
Thanks J-D & Motohiko for the tips.  Significant improvement in performance,
but there's still room for improvement.  In my local pseudo-distributed mode
the 2 map reduce jobs now run in less than 4 minutes (from 32 mins) and in a
cluster of 10 nodes + 5 zk nodes they run in 11 minutes (down from 1 hour &
30 mins).  But I would still like to get to a point where they run faster
on a cluster than on my local machine.

Here's what I did:

1)  Fixed a bug in my code that was causing unnecessary writes to HBase.
2)  Added these two lines after creating 'new HTable':
table.setAutoFlush(false);
table.setWriteBufferSize(1024*1024*12);
3)  Added this line after Put:
put.setWriteToWAL(false);
4)  Added this line (only when running on cluster):
job.setNumReduceTasks(20);

There are other 64-bit related improvements which I cannot try; mainly
because Amazon charges (way) too much for 64-bit machines.  It costs me over
$25 for 15 machines for less than 3 hours, so I switched to 'm1.small'
32-bit machines.  Of course, one of the promises of the distributed
computing is that we will be able to use "cheap commodity hardware", right
:)  So I would like to stick with 'm1.small' for now.  (But I am willing to
use about 30 machines if that's going to help.)

Anyway, I have noticed that one of my Mappers is taking too long.  If anyone
would share ideas of how to improve Mapper speed, that would be greatly
appreciated.  Basically, in this Mapper I read about 50,000 rows from a
HBase table using TableMapReduceUtil.initTableMapperJob() and do some
complex processing for "values" of each row.  I don't write anything back in
HBase, but I do write quite a few lines (context.write()) to HDFS.  Any
suggestions?
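A hedged sketch of the kind of table-scanning, map-only job described above
(the table name, key/value types, and output path are placeholders, and the API
names follow later HBase/Hadoop releases, so 0.20 spells a few of them
differently):

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RowProcessingJob {
      static class RowMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          // ... the "complex processing" of each row's values goes here ...
          context.write(new Text(row.get()), new Text("processed"));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "row-processing");
        job.setJarByClass(RowProcessingJob.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // scanner pre-fetching (see the caching advice later in this thread)
        scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan
        TableMapReduceUtil.initTableMapperJob("my_table", scan,
            RowMapper.class, Text.class, Text.class, job);
        job.setNumReduceTasks(0);    // map-only: context.write() goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }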

Thanks once again for the help.



2009/12/13 

> Hello,
>
> Something Something  wrote:
> > PS:  One thing I have noticed is that it goes to 66% very fast and then
> > slows down from there..
>
> It seems that only one reducer works. You should increase reduce tasks.
> The default reduce task's number is written on
> hadoop/docs/mapred-default.html.
> The default parameter of mapred.reduce.tasks is 1. So only one reduce task
> runs.
>
> There are two ways to increase reduce tasks:
> 1. Use Job.setNumReduceTasks(int tasks) on your MapReduce job file.
> 2. Denote more mapred.reduce.tasks on hadoop/conf/mapred-site.xml.
>
> You can get the best perfomance if you run 20 reduce tasks. The detail of
> the number
> of reduce tasks is written on
> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
> at "How many Reduces?" as J-D wrote. Notice that
> JobConf.setNumReduceTasks(int) is
> already deprecated, so you should use Job.setNumReduceTasks(int tasks)
> rather than
> JobConf.setNumReduceTasks(int).
> --
> Motohiko Mouri
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Gary Helmling
I completely agree with the need to understand both the fundamental HBase
API, and how HBase stores data at a low level.  Both are very important in
knowing how to structure your data for best performance.  Which you should
figure out before moving on to other niceties.

As far as the actual data storage, Lars George did a really informative
write-up:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

And of course there's the HBase Architecture doc:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
and the Google BigTable paper.


On Tue, Dec 15, 2009 at 11:21 AM, Edward Capriolo wrote:

> On Tue, Dec 15, 2009 at 11:04 AM, Gary Helmling 
> wrote:
> > This definitely seems to be a common initial hurdle, though I think each
> > project comes at it with their own specific needs.  There are a variety
> of
> > frameworks or libraries you can check out on the Supporting Projects
> page:
> > http://wiki.apache.org/hadoop/SupportingProjects
> >
> > In my case, I wanted a simple object -> hbase mapping layer that would
> take
> > care of the boilerplate work of persistence and provide a slightly higher
> > level API for queries.  It's been open-sourced on github:
> >
> > http://github.com/ghelmling/meetup.beeno
> >
> > It still only really account for my project needs -- we're serving
> realtime
> > requests from our web site and not currently doing any MR processing.
>  But
> > if it could be of use, I could always use feedback on how to evolve it.
> :)
> >
> > Some of the other projects listed on the wiki page are doubtless more
> > mature, so they may meet your needs as well.  If none of them are quite
> what
> > you're looking for, then there's always room for another!
> >
> > --gh
> >
> >
> > On Tue, Dec 15, 2009 at 10:39 AM, Edward Capriolo  >wrote:
> >
> >> On Tue, Dec 15, 2009 at 1:03 AM, stack  wrote:
> >> > HBase requires java 6 (1.6) or above.
> >> > St.Ack
> >> >
> >> > On Mon, Dec 14, 2009 at 7:41 PM, Paul Smith 
> wrote:
> >> >
> >> >> Just wondering if anyone knows of an existing Hbase utility library
> that
> >> is
> >> >> open sourced that can assist those that have Java5 and above.  I'm
> >> starting
> >> >> off in Hbase, and thinking it'd be great to have API calls similar to
> >> the
> >> >> Google Collections framework.  If one doesn't exist, I think I could
> >> start
> >> >> off a new project in Google Code (ASL it).  I think Hbase is
> targetted <
> >> >> Java 5, so can't take advantage of this yet internally.
> >> >>
> >> >> The sorts of API functions I thought would be useful to make code
> more
> >> >> readable would be something like:
> >> >>
> >> >>
> >> >>HTable hTable = new
> >> >> TableBuilder(hbaseConfiguration).withTableName("foo")
> >> >>.withSimpleColumnFamilies("bar", "eek",
> >> >> "moo").deleteAndRecreate();
> >> >>
> >> >> and
> >> >>
> >> >>ResultScanner scanner = new
> >> >> ResultScannerBuilder(hTable).withColumnFamilies(
> >> >>"family1", "family2").build();
> >> >>
> >> >>
> >> >> taking advantage of varargs liberally and using nice Patterns etc.
> >>  While
> >> >> the Bytes class is useful, I'd personally benefit from an API that
> can
> >> >> easily pack arbitrary multiple ints (and other data types) together
> into
> >> >> byte[] for RowKeyGoodness(tm) ala:
> >> >>
> >> >> byte[] rowKey = BytePacker.pack(fooId, barId, eekId, mooId);
> >> >>
> >> >> (behind the scenes this is a vararg method that recursively packs
> each
> >> into
> >> >> into byte[] via Bytes.add(byte[] b1, byte[] b2) etc.
> >> >>
> >> >> If anyone knows of a library that does this, pointers please.
> >> >>
> >> >> cheers,
> >> >>
> >> >> Paul
> >> >
> >>
> >> I could see this being very useful. My first barrier to hbase was
> >> trying to figure out how to turn what I knew of as an SQL select cause
> >> into a set of HBaser server side filters. Mostly, I pieced this
> >> together with help from the list, and the Test Cases. That could be
> >> frustrating for some.  Now that I am used to it, I notice that the
> >> HBase way is actually much cleaner and much less code.
> >>
> >> So, yes a helper library is a great thing.
> >>
> >> As part of the "proof of concept" I am working on, large sections of
> >> it are mostly descriptions of doing things like column projections in
> >> both SQL and HBase with filters. So I think both are very helpful for
> >> making Hbase more attractive to an end user.
> >>
> >
>
> All interesting. In a sense, I believe you should learn to walk before
> you can run :). It is hard to troubleshoot how an ORM mapper is
> working if you basically clueless on the Hbase API.
>
> You know when lots of user tools get pulled in the mix:
> q: How do I only get column X?
> a: You need to get a spring inject able, grails, restful, ORM mapper,
> that is only found in git, but there is like 4 forks of it, so pick
> this one :)
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Kevin Peterson
On Tue, Dec 15, 2009 at 9:21 AM, Gary Helmling  wrote:

> I completely agree with the need to understand both the fundamental HBase
> API, and how HBase stores data at a low level.  Both are very important in
> knowing how to structure your data for best performance.  Which you should
> figure out before moving on to other niceties.
>
On the other hand, forcing the user to understand the details of how data
is stored instead of presenting them with a well abstracted API makes the
learning curve steeper.

These kinds of cleaner APIs would be a good way to prevent the standard
situation of one engineer on the team figuring out HBase, then others say
"why is this so complicated" so they write an internal set of wrappers and
utility methods.

This wouldn't solve the problems for people who want a full ORM, but I think
there's an in-between sweet spot that abstracts away byte[] but still
exposes column families and such.
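One hedged illustration of that in-between sweet spot. This is an entirely
hypothetical helper, not an existing library, and the Put.addColumn signature
is from later client versions:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Hypothetical helper: string-typed puts that still expose family/qualifier. */
    final class StringPuts {
      private StringPuts() {}

      static Put of(String row, String family, String qualifier, String value) {
        Put p = new Put(Bytes.toBytes(row));
        p.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
        return p;
      }
    }

    // usage: table.put(StringPuts.of("row-1", "info", "title", "hello"));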


Re: running unit test based on HBaseClusterTestCase

2009-12-15 Thread Stack

Do you have hadoop jars in your eclipse classpath?
Stack



On Dec 14, 2009, at 10:58 PM, Guohua Hao  wrote:


Hello All,

In my own application, I have a unit test case which extends
HBaseClusterTestCase in order to test some of my operations over the HBase
cluster. I override the setup function in my own test case, and this setup
function begins with a super.setup() call.

When I try to run my unit test from within Eclipse, I got the following
error:

java.lang.NoSuchMethodError:
org.apache.hadoop.security.UserGroupInformation.setCurrentUser(Lorg/apache/hadoop/security/UserGroupInformation;)V
   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:236)
   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
   at org.apache.hadoop.hbase.HBaseClusterTestCase.setUp(HBaseClusterTestCase.java:123)

I included the hadoop-0.20.1-core.jar in my classpath, since this jar file
contains the org.apache.hadoop.security.UserGroupInformation class.

Could anybody give me some hint on how to solve this problem?

Thank you very much,
Guohua
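For readers following along, the setup pattern being described looks roughly
like this; a sketch only, with an invented class name, against the 0.20-era
HBaseClusterTestCase (JUnit 3 style):

    import org.apache.hadoop.hbase.HBaseClusterTestCase;

    public class MyHBaseOperationTest extends HBaseClusterTestCase {
      @Override
      protected void setUp() throws Exception {
        super.setUp();   // spins up a MiniDFSCluster plus a mini HBase cluster
        // create test tables / load fixtures here
      }

      public void testMyOperation() throws Exception {
        // exercise the HBase operations under test
      }
    }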


Re: Fwd: What should we expect from Hama Examples Rand?

2009-12-15 Thread Andrew Purtell
You are missing some supporting jar.

> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>  at java.lang.Class.searchMethods(Unknown Source)

Note that the exception is in a JVM method (java.lang.Class.searchMethods).
This is not really a HBase problem per se, but instead very likely a 
classpath issue. It looks to me like the JVM cannot (transitively) load a
class to handle the RPC.

Are all of the supporting jars for Hadoop and HBase on the classpath? It
seems at least one is not. 

   - Andy






From: Edward J. Yoon 
To: hbase-user@hadoop.apache.org
Cc: alan.rat...@ngc.com
Sent: Tue, December 15, 2009 1:07:02 AM
Subject: Fwd: What should we expect from Hama Examples Rand?

Hi,

'RAND' example of hama-examples.jar is basically a simple M/R job that
creates a table filled with random numbers. So, before running Hama,
please check whether you are able to create tables via the HBase shell.

Can one of the HBase developers help with this problem?

Thanks

-- Forwarded message --
From: Ratner, Alan S (IS) 
Date: Sat, Dec 12, 2009 at 1:32 AM
Subject: What should we expect from Hama Examples Rand?
To: hama-u...@incubator.apache.org


Having fixed the groomserver problem I did the following:



1) clean out /tmp files

2) format Hadoop namenode

3) start Hadoop

4) start HBase/Zookeeper

5) start Hama

6) launch Hama examples rand -m 10 -r 10 2000 2000 30.5% matrixA



The outcome is puzzling.  A 3-second long diary shows up in the Hama
log file reporting nothing unusual.  But the terminal reported
problems with HBase.  When I Googled this problem all I saw was
someone who had multiple versions of HBase on their system.  I am
using a fresh VM with Ubuntu 8.04, Hadoop 0.20.1, Zookeeper 3.2.1,
HBase 0.20.2, Hama 0.2.0 and JDK 1.6.0_17.  No older versions of
anything were ever installed.



BTW: I thought the problem might be related to my running HBase in
standalone mode so I just switched to HBase pseudo-distributed mode
but I see the same problems.



Any help appreciated.  -- Alan



Hama log

==

Fri Dec 11 09:23:03 EST 2009 Starting master on ngc

ulimit -n 1024

2009-12-11 09:23:05,512 INFO org.apache.hama.HamaMaster: STARTUP_MSG:

/

STARTUP_MSG: Starting HamaMaster

STARTUP_MSG:   host = ngc/127.0.1.1

STARTUP_MSG:   args = [start]

STARTUP_MSG:   version = 0.20.1

STARTUP_MSG:   build =
http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.20.1-rc1
-r 810220; compiled by 'oom' on Tue Sep  1 20:55:56 UTC 2009

/

2009-12-11 09:23:05,939 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=HamaMaster, port=4

2009-12-11 09:23:06,634 INFO org.apache.hama.HamaMaster: Cleaning up
the system directory

2009-12-11 09:23:06,710 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting

2009-12-11 09:23:06,716 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 4: starting

2009-12-11 09:23:06,721 INFO org.apache.hama.HamaMaster: Starting RUNNING

2009-12-11 09:23:06,721 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 4: starting



Hama terminal

/hadoop-0.20.1-test.jar:/home/ngc/Desktop/hama/bin/../lib/hbase-0.20.0.jar:/home/ngc/Desktop/hama/bin/../lib/hbase-0.20.0-test.jar:/home/ngc/Desktop/hama/bin/../lib/jasper-

compiler-5.5.12.jar:/home/ngc/Desktop/hama/bin/../lib/jasper-runtime-5.5.12.jar:/home/ngc/Desktop/hama/bin/../lib/javacc.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-util-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/jruby-complete-1.2.0.jar:/home/ngc/Desktop/hama/bin/../lib/json.jar:/home/ngc/Desktop/hama/bin/../lib/junit-3.8.1.jar:/home/ngc/Desktop/hama/bin/../lib/libthrift-r771587.jar:/home/ngc/Desktop/hama/bin/../lib/log4j-1.2.13.jar:/home/ngc/Desktop/hama/bin/../lib/log4j-1.2.15.jar:/home/ngc/Desktop/hama/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/ngc/Desktop/hama/bin/../lib/xmlenc-0.52.jar:/home/ngc/Desktop/hama/bin/../lib/zookeeper-3.2.1.jar:/home/ngc/Desktop/hama/bin/../lib/jetty-ext/*.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/annotations.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/ant.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/as
m-analysis-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-commons-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-tree-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-util-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/asm-xml-3.0.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/bcel.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/dom4j-full.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs/findbugs-ant.jar:/home/ngc/Desktop/hama/bin/../lib/findbugs

hlogs do not get cleared

2009-12-15 Thread Kevin Peterson
We're running a 13 node HBase cluster. We had some problems a week ago with
it being overloaded and errors related to not being able to find a block on
HDFS, but adding four more nodes and increasing max heap from 3GB to 4.5GB
on all nodes fixed any problems.

Looking at the logs now, though, we see that HLogs are not getting removed:

2009-12-15 01:45:48,426 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll
/hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260867136036,
entries=210524, calcsize=63757422, filesize=41073798. New hlog
/hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421
2009-12-15 01:45:48,427 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
many hlogs: logs=130, maxlogs=96; forcing flush of region with oldest edits:
articles-article-id,f15489ea-38a4-4127-9179-1b2dc5f3b5d4,1260083783909
2009-12-15 01:57:14,188 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
articles,\x00\x00\x01\x25\x8C\x0F\xCB\x18\xB5U\xF7\xC6\x5DoH\xB8\x98\xEBH,E\x7C\x07\x14,1260830133341
2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll
/hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421,
entries=92795, calcsize=63908073, filesize=54042783. New hlog
/hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260871037510
2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
many hlogs: logs=131, maxlogs=96; forcing flush of region with oldest edits:
articles-article-id,f1cd1b02-3d1b-453c-b44f-94ec5a1e3a46,1260007536878

From reading the log message, I interpret this as saying that every time it
rolls an hlog, if there are more than maxlogs logs, it will flush one
region. I'm assuming that a log could have edits for multiple regions, so
this seems to mean that if we have 100 regions and maxlogs set to 96, if it
flushes one region each time it rolls a log, it will create 100 logs before
it flushes all regions and is able to delete the log, so it will reach
steady state at 196 hlogs. Is this correct?

We're concerned because when we had problems last week, we saw lots of log
messages related to "Too many hlogs" and had assumed they were related to
the problems. Is this anything to worry about?


Re: hlogs do not get cleared

2009-12-15 Thread Jean-Daniel Cryans
Kevin,

Too many hlogs means that the inserts are hitting a lot of regions,
that those regions aren't filled enough to flush so that we have to
force flush them to give some room. When you added region servers, it
spread the regions load so that hlogs were getting filled at a slower
rate.

Could you tell us more about the rate of insertion, size of data, and
number of regions per region server?

Thx,

J-D

On Tue, Dec 15, 2009 at 10:34 AM, Kevin Peterson  wrote:
> We're running a 13 node HBase cluster. We had some problems a week ago with
> it being overloaded and errors related to not being able to find a block on
> HDFS, but adding four more nodes and increasing max heap from 3GB to 4.5GB
> on all nodes fixed any problems.
>
> Looking at the logs now, though, we see that HLogs are not getting removed:
>
> 2009-12-15 01:45:48,426 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260867136036,
> entries=210524, calcsize=63757422, filesize=41073798. New hlog
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421
> 2009-12-15 01:45:48,427 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=130, maxlogs=96; forcing flush of region with oldest edits:
> articles-article-id,f15489ea-38a4-4127-9179-1b2dc5f3b5d4,1260083783909
> 2009-12-15 01:57:14,188 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
> articles,\x00\x00\x01\x25\x8C\x0F\xCB\x18\xB5U\xF7\xC6\x5DoH\xB8\x98\xEBH,E\x7C\x07\x14,1260830133341
> 2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421,
> entries=92795, calcsize=63908073, filesize=54042783. New hlog
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260871037510
> 2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=131, maxlogs=96; forcing flush of region with oldest edits:
> articles-article-id,f1cd1b02-3d1b-453c-b44f-94ec5a1e3a46,1260007536878
>
> From reading the log message, I interpret this as saying that every time it
> rolls an hlog, if there are more than maxlogs logs, it will flush one
> region. I'm assuming that a log could have edits for multiple regions, so
> this seems to mean that if we have 100 regions and maxlogs set to 96, if it
> flushes one region each time it rolls a log, it will create 100 logs before
> it flushes all regions and is able to delete the log, so it will reach
> steady state at 196 hlogs. Is this correct?
>
> We're concerned because when we had problems last week, we saw lots of log
> messages related to "Too many hlogs" and had assumed they were related to
> the problems. Is this anything to worry about?
>


Re: Performance related question

2009-12-15 Thread Jean-Daniel Cryans
Given that m1.small has 1 CPU, 1.7GB of RAM and 1/8 (or less) the IO
of the host machine and counting in the fact that those machines are
networked as a whole, I expect it to be much, much slower than your local
machine. Those machines are so under-powered that the overhead of
hadoop/hbase probably overwhelms any gain from the total number of
nodes. Instead do this:

- Replace all your m1.small with m1.large in a factor of 4:1.
- Don't give ZK their own machine, in such a small environment it
doesn't make much sense. (give them their own EBS maybe)
- Use an ensemble of only 3 peers.
- Give HBase plenty of RAM like 4GB.

WRT your mappers, make sure you use scanner pre-fetching. In your job
setup set hbase.client.scanner.caching to something like 30.
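In code that is either the hbase.client.scanner.caching property on the job's
configuration or Scan.setCaching(); a minimal hedged sketch (the
HBaseConfiguration.create() call is how later client versions spell it):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScannerCachingSetup {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.client.scanner.caching", 30);  // rows fetched per next() RPC

        Scan scan = new Scan();
        scan.setCaching(30);   // per-scan equivalent of the property above
      }
    }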

J-D

On Tue, Dec 15, 2009 at 9:14 AM, Something Something
 wrote:
> Thanks J-D & Mtohiko for the tips.  Significant improvement in performance,
> but there's still room for improvement.  In my local pseudo distributed mode
> the 2 map reduce jobs now run in less than 4 minutes (from 32 mins) and in
> cluster of 10 nodes + 5 zk nodes they run in 11 minutes (down from 1 hour &
> 30 mins).  But still I would like to come to a point where they run faster
> on a cluster than on my local machine.
>
> Here's what I did:
>
> 1)  Fixed a bug in my code that was causing unnecessary writes to HBase.
> 2)  Added these two lines after creating 'new HTable':
>        table.setAutoFlush(false);
>        table.setWriteBufferSize(1024*1024*12);
> 3)  Added this line after Put:
>        put.setWriteToWAL(false);
> 4)  Added this line (only when running on cluster):
>    job.setNumReduceTasks(20);
>
> There are other 64-bit related improvements which I cannot try; mainly
> because Amazon charges (way) too much for 64-bit machines.  It costs me over
> $25 for 15 machines for less than 3 hours, so I switched to 'm1.small'
> 32-bit machines.  Of course, one of the promises of the distributed
> computing is that we will be able to use "cheap commodity hardware", right
> :)  So I would like to stick with 'm1.small' for now.  (But I am willing to
> use about 30 machines if that's going to help.)
>
> Anyway, I have noticed that one of my Mappers is taking too long.  If anyone
> would share ideas of how to improve Mapper speed, that would be greatly
> appreciated.  Basically, in this Mapper I read about 50,000 rows from a
> HBase table using TableMapReduceUtil.initTableMapperJob() and do some
> complex processing for "values" of each row.  I don't write anything back in
> HBase, but I do write quite a few lines (context.write()) to HDFS.  Any
> suggestions?
>
> Thanks once again for the help.
>
>
>
> 2009/12/13 
>
>> Hello,
>>
>> Something Something  wrote:
>> > PS:  One thing I have noticed is that it goes to 66% very fast and then
>> > slows down from there..
>>
>> It seems that only one reducer works. You should increase reduce tasks.
>> The default reduce task's number is written on
>> hadoop/docs/mapred-default.html.
>> The default parameter of mapred.reduce.tasks is 1. So only one reduce task
>> runs.
>>
>> There are two ways to increase reduce tasks:
>> 1. Use Job.setNumReduceTasks(int tasks) on your MapReduce job file.
>> 2. Denote more mapred.reduce.tasks on hadoop/conf/mapred-site.xml.
>>
>> You can get the best perfomance if you run 20 reduce tasks. The detail of
>> the number
>> of reduce tasks is written on
>> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
>> at "How many Reduces?" as J-D wrote. Notice that
>> JobConf.setNumReduceTasks(int) is
>> already deprecated, so you should use Job.setNumReduceTasks(int tasks)
>> rather than
>> JobConf.setNumReduceTasks(int).
>> --
>> Motohiko Mouri
>>
>


Re: running unit test based on HBaseClusterTestCase

2009-12-15 Thread Guohua Hao
Yes, I included all the necessary jar files I think.  I guess my problem is
probably related to my eclipse setup.

I can create a MiniDFSCluster object by running my application in command
line (e.g., bin/hadoop myApplicationClass) , and a MiniDFSCluster object is
created inside the main function of myApplicationClass. But I can NOT run
this program within Eclipse, probably because I did not do it in the right way. I
got a similar error message saying:

java.lang.NoSuchMethodError:
org.apache.hadoop.security.UserGroupInformation.setCurrentUser(Lorg/apache/hadoop/security/UserGroupInformation;)V
   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:236)
   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)

 Could you guys please give me more hint?

Thanks
Guohua



On Tue, Dec 15, 2009 at 11:59 AM, Stack  wrote:

> Do you have hadoop jars in your eclipse classpath?
> Stack
>
>
>
>
> On Dec 14, 2009, at 10:58 PM, Guohua Hao  wrote:
>
>  Hello All,
>>
>> In my own application, I have a unit test case which extends
>> HBaseClusterTestCase in order to test some of my operation over HBase
>> cluster. I override the setup function in my own test case, and this setup
>> function begins with super.setup() function call.
>>
>> When I try to run my unit test from within Eclipse, I got the following
>> error:
>>
>> java.lang.NoSuchMethodError:
>>
>> org.apache.hadoop.security.UserGroupInformation.setCurrentUser(Lorg/apache/hadoop/security/UserGroupInformation;)V
>>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:236)
>>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:119)
>>   at
>>
>> org.apache.hadoop.hbase.HBaseClusterTestCase.setUp(HBaseClusterTestCase.java:123)
>>
>> I included the hadoop-0.20.1-core.jar in my classpath, since this jar file
>> contains the org.apache.hadoop.security.UserGroupInformation class.
>>
>> Could anybody give me some hint on how to solve this problem?
>>
>> Thank you very much,
>> Guohua
>>
>


Re: Performance related question

2009-12-15 Thread Patrick Hunt
Btw, nothing says that ZK users (incl hbase) _must_ run a multi-node ZK 
ensemble. For coordination tasks a single ZK server (standalone mode) is 
often sufficient, you just need to realize you are sacrificing 
reliability/availability.


Going from 1 -> 3 -> 5 -> 7 ZK servers in an ensemble should primarily 
be driven by reliability requirements. See this page for details on 
performance studies I've made for standalone and 3 server ZK ensembles: 
http://bit.ly/4ekN8G


Patrick

Jean-Daniel Cryans wrote:

Given that m1.small has 1 CPU, 1.7GB of RAM and 1/8 (or less) the IO
of the host machine and counting in the fact that those machines are
networked as a whole I expect it to much much slower that your local
machine. Those machines are so under-powered that the overhead of
hadoop/hbase probably overwhelms any gain from the total number of
nodes. Instead do this:

- Replace all your m1.small with m1.large in a factor of 4:1.
- Don't give ZK their own machine, in such a small environment it
doesn't make much sense. (give them their own EBS maybe)
- Use an ensemble of only 3 peers.
- Give HBase plenty of RAM like 4GB.

WRT your mappers, make sure you use scanner pre-fetching. In your job
setup set hbase.client.scanner.caching to something like 30.

J-D

On Tue, Dec 15, 2009 at 9:14 AM, Something Something
 wrote:

Thanks J-D & Mtohiko for the tips.  Significant improvement in performance,
but there's still room for improvement.  In my local pseudo distributed mode
the 2 map reduce jobs now run in less than 4 minutes (from 32 mins) and in
cluster of 10 nodes + 5 zk nodes they run in 11 minutes (down from 1 hour &
30 mins).  But still I would like to come to a point where they run faster
on a cluster than on my local machine.

Here's what I did:

1)  Fixed a bug in my code that was causing unnecessary writes to HBase.
2)  Added these two lines after creating 'new HTable':
   table.setAutoFlush(false);
   table.setWriteBufferSize(1024*1024*12);
3)  Added this line after Put:
   put.setWriteToWAL(false);
4)  Added this line (only when running on cluster):
   job.setNumReduceTasks(20);

There are other 64-bit related improvements which I cannot try; mainly
because Amazon charges (way) too much for 64-bit machines.  It costs me over
$25 for 15 machines for less than 3 hours, so I switched to 'm1.small'
32-bit machines.  Of course, one of the promises of the distributed
computing is that we will be able to use "cheap commodity hardware", right
:)  So I would like to stick with 'm1.small' for now.  (But I am willing to
use about 30 machines if that's going to help.)

Anyway, I have noticed that one of my Mappers is taking too long.  If anyone
would share ideas of how to improve Mapper speed, that would be greatly
appreciated.  Basically, in this Mapper I read about 50,000 rows from a
HBase table using TableMapReduceUtil.initTableMapperJob() and do some
complex processing for "values" of each row.  I don't write anything back in
HBase, but I do write quite a few lines (context.write()) to HDFS.  Any
suggestions?

Thanks once again for the help.



2009/12/13 


Hello,

Something Something  wrote:

PS:  One thing I have noticed is that it goes to 66% very fast and then
slows down from there..

It seems that only one reducer works. You should increase reduce tasks.
The default reduce task's number is written on
hadoop/docs/mapred-default.html.
The default parameter of mapred.reduce.tasks is 1. So only one reduce task
runs.

There are two ways to increase reduce tasks:
1. Use Job.setNumReduceTasks(int tasks) on your MapReduce job file.
2. Denote more mapred.reduce.tasks on hadoop/conf/mapred-site.xml.

You can get the best perfomance if you run 20 reduce tasks. The detail of
the number
of reduce tasks is written on
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Reducer
at "How many Reduces?" as J-D wrote. Notice that
JobConf.setNumReduceTasks(int) is
already deprecated, so you should use Job.setNumReduceTasks(int tasks)
rather than
JobConf.setNumReduceTasks(int).
--
Motohiko Mouri



Re: running unit test based on HBaseClusterTestCase

2009-12-15 Thread stack
Order can be important.  Don't forget to include conf directories.  Below is
from an eclipse .classpath that seems to work for me:

[.classpath entries lost in the archive]
St.Ack


On Tue, Dec 15, 2009 at 11:21 AM, Guohua Hao  wrote:

> Yes, I included all the necessary jar files I think.  I guess my problem is
> probably related to my eclipse setup.
>
> I can create a MiniDFSCluster object by running my application in command
> line (e.g., bin/hadoop myApplicationClass) , and a MiniDFSCluster object is
> created inside the main function of myApplicationClass. But I can NOT run
> this program within eclipse, probably I did not do it in the right way. I
> got the similar error message saying
>
> java.lang.NoSuchMethodError:
> org.apache.hadoop.security.
> >
> >
> >
> UserGroupInformation.setCurrentUser(Lorg/apache/hadoop/security/UserGroupInformation;)V
> >   at
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:236)
> >   at
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:119)
> >
>
>  Could you guys please give me more hint?
>
> Thanks
> Guohua
>
>
>
> On Tue, Dec 15, 2009 at 11:59 AM, Stack  wrote:
>
> > Do you have hadoop jars in your eclipse classpath?
> > Stack
> >
> >
> >
> >
> > On Dec 14, 2009, at 10:58 PM, Guohua Hao  wrote:
> >
> >  Hello All,
> >>
> >> In my own application, I have a unit test case which extends
> >> HBaseClusterTestCase in order to test some of my operation over HBase
> >> cluster. I override the setup function in my own test case, and this
> setup
> >> function begins with super.setup() function call.
> >>
> >> When I try to run my unit test from within Eclipse, I got the following
> >> error:
> >>
> >> java.lang.NoSuchMethodError:
> >>
> >>
> org.apache.hadoop.security.UserGroupInformation.setCurrentUser(Lorg/apache/hadoop/security/UserGroupInformation;)V
> >>   at
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:236)
> >>   at
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:119)
> >>   at
> >>
> >>
> org.apache.hadoop.hbase.HBaseClusterTestCase.setUp(HBaseClusterTestCase.java:123)
> >>
> >> I included the hadoop-0.20.1-core.jar in my classpath, since this jar
> file
> >> contains the org.apache.hadoop.security.UserGroupInformation class.
> >>
> >> Could anybody give me some hint on how to solve this problem?
> >>
> >> Thank you very much,
> >> Guohua
> >>
> >
>


Re: hlogs do not get cleared

2009-12-15 Thread stack
I'd advise setting the upper limit for WALs back down to 32 rather than the
96 you have.  Let's figure out why old logs are not being cleared up even at only
32.  At 96, it means that on a crash, the log splitting process has more
logs to process (~96 rather than ~32).  It'll take longer for the split
process to run and therefore longer for the regions to come back on line.
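The WAL cap being discussed here is the hbase.regionserver.maxlogs setting. It
normally lives in hbase-site.xml; the snippet below is only a hedged
illustration of the property name via the client Configuration API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MaxLogsSetting {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // the same key you would set in hbase-site.xml
        conf.setInt("hbase.regionserver.maxlogs", 32);
        System.out.println(conf.getInt("hbase.regionserver.maxlogs", -1));
      }
    }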

Is this the state of things across all regionservers or just one or two?  As
J-D asks, your loading profile, how many regions per regionserver would be
of interest.  Next up would be your putting up a regionserver log that we
could pull and look at.  We'd check the edit sequence numbers to figure why
we're not letting logs go.

Thanks Kevin,
St.Ack

On Tue, Dec 15, 2009 at 10:34 AM, Kevin Peterson  wrote:

> We're running a 13 node HBase cluster. We had some problems a week ago with
> it being overloaded and errors related to not being able to find a block on
> HDFS, but adding four more nodes and increasing max heap from 3GB to 4.5GB
> on all nodes fixed any problems.
>
> Looking at the logs now, though, we see that HLogs are not getting removed:
>
> 2009-12-15 01:45:48,426 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Roll
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260867136036,
> entries=210524, calcsize=63757422, filesize=41073798. New hlog
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421
> 2009-12-15 01:45:48,427 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=130, maxlogs=96; forcing flush of region with oldest
> edits:
> articles-article-id,f15489ea-38a4-4127-9179-1b2dc5f3b5d4,1260083783909
> 2009-12-15 01:57:14,188 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
>
> articles,\x00\x00\x01\x25\x8C\x0F\xCB\x18\xB5U\xF7\xC6\x5DoH\xB8\x98\xEBH,E\x7C\x07\x14,1260830133341
> 2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog:
> Roll
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421,
> entries=92795, calcsize=63908073, filesize=54042783. New hlog
> /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260871037510
> 2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Too
> many hlogs: logs=131, maxlogs=96; forcing flush of region with oldest
> edits:
> articles-article-id,f1cd1b02-3d1b-453c-b44f-94ec5a1e3a46,1260007536878
>
> From reading the log message, I interpret this as saying that every time it
> rolls an hlog, if there are more than maxlogs logs, it will flush one
> region. I'm assuming that a log could have edits for multiple regions, so
> this seems to mean that if we have 100 regions and maxlogs set to 96, if it
> flushes one region each time it rolls a log, it will create 100 logs before
> it flushes all regions and is able to delete the log, so it will reach
> steady state at 196 hlogs. Is this correct?
>
> We're concerned because when we had problems last week, we saw lots of log
> messages related to "Too many hlogs" and had assumed they were related to
> the problems. Is this anything to worry about?
>


Re: Help on HBase shell alter command usage

2009-12-15 Thread Ted Yu
Hi,
I saw the following from scan 'crawltable' command in hbase shell:
...
 com.onsoft.www:http/column=stt:, timestamp=1260405530801,
value=\003
3 row(s) in 0.2490 seconds

How do I query the value for stt column ?

hbase(main):005:0> get 'crawltable', 'com.onsoft.www:http/', { column='stt:'
}
SyntaxError: (hbase):6: odd number list for Hash.
from (hbase):6

Can someone explain this 'odd number' error ?

Thanks

On Mon, Dec 14, 2009 at 10:16 PM, stack  wrote:

> Are you using hbase 0.20?  If so, there is no 'compress'.  Its NONE, LZO,
> or
> GZIP (You'll have to build lzo yourself.  See hbase wiki for how).
>
> See the shell help.  It has examples of how to change parameters on column
> families.
>
> St.Ack
>
> 2009/12/14 Xin Jing 
>
> > Hi all,
> >
> > I want to change the column family property for a existing hbase table.
> > Setting one comlumn family COMPRESSION from 'none' to comress, and chagne
> > one column family IN_MEMORY from 'false' to 'true'.
> >
> > I want to use hbase shell to achieve that, but I cannot find the detailed
> > description on 'alter' command. Could anyone point me to a reference on
> > that?
> >
> > Thanks
> > - Xin
> >
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread stack
On Tue, Dec 15, 2009 at 9:56 AM, Kevin Peterson  wrote:

> These kinds of cleaner APIs would be a good way to prevent the standard
> situation of one engineer on the team figuring out HBase, then others say
> "why is this so complicated" so they write an internal set of wrappers and
> utility methods.
>

This wouldn't solve the problems for people who want a full ORM, but I think
> there's an in-between sweet spot that abstracts away byte[] but still
> exposes column families and such.
>


What do fellas think of Lars' George's genercizing (sp? word?) of the client
API?  See his patch up in https://issues.apache.org/jira/browse/HBASE-1990.
Would this be enough?
St.Ack


Re: Help on HBase shell alter command usage

2009-12-15 Thread stack
Try:

hbase(main):005:0> get 'crawltable', 'com.onsoft.www:http/', { COLUMNS =>
'stt:'}

i.e. '=>' rather than '='.  Also, its COLUMNS (uppercase I believe) rather
than column.

Run 'help' in the shell for help and examples.

St.Ack
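For completeness, a hedged Java equivalent of that get. It uses the
HTable-based client API of later releases; "stt:" in the shell output denotes
the whole "stt" family:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class GetSttValue {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "crawltable");
        Get get = new Get(Bytes.toBytes("com.onsoft.www:http/"));
        get.addFamily(Bytes.toBytes("stt"));   // only the stt family
        Result result = table.get(get);
        System.out.println(Bytes.toStringBinary(result.value()));  // first cell's value
        table.close();
      }
    }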

On Tue, Dec 15, 2009 at 11:53 AM, Ted Yu  wrote:

> Hi,
> I saw the following from scan 'crawltable' command in hbase shell:
> ...
>  com.onsoft.www:http/column=stt:, timestamp=1260405530801,
> value=\003
> 3 row(s) in 0.2490 seconds
>
> How do I query the value for stt column ?
>
> hbase(main):005:0> get 'crawltable', 'com.onsoft.www:http/', {
> column='stt:'
> }
> SyntaxError: (hbase):6: odd number list for Hash.
>from (hbase):6
>
> Can someone explain this 'odd number' error ?
>
> Thanks
>
> On Mon, Dec 14, 2009 at 10:16 PM, stack  wrote:
>
> > Are you using hbase 0.20?  If so, there is no 'compress'.  Its NONE, LZO,
> > or
> > GZIP (You'll have to build lzo yourself.  See hbase wiki for how).
> >
> > See the shell help.  It has examples of how to change parameters on
> column
> > families.
> >
> > St.Ack
> >
> > 2009/12/14 Xin Jing 
> >
> > > Hi all,
> > >
> > > I want to change the column family property for a existing hbase table.
> > > Setting one comlumn family COMPRESSION from 'none' to comress, and
> chagne
> > > one column family IN_MEMORY from 'false' to 'true'.
> > >
> > > I want to use hbase shell to achieve that, but I cannot find the
> detailed
> > > description on 'alter' command. Could anyone point me to a reference on
> > > that?
> > >
> > > Thanks
> > > - Xin
> > >
> >
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Tim Robertson
Seems like an intuitive option to me.

Tim

On Tue, Dec 15, 2009 at 9:04 PM, stack  wrote:
> On Tue, Dec 15, 2009 at 9:56 AM, Kevin Peterson  wrote:
>
>> These kinds of cleaner APIs would be a good way to prevent the standard
>> situation of one engineer on the team figuring out HBase, then others say
>> "why is this so complicated" so they write an internal set of wrappers and
>> utility methods.
>>
>
> This wouldn't solve the problems for people who want a full ORM, but I think
>> there's an in-between sweet spot that abstracts away byte[] but still
>> exposes column families and such.
>>
>
>
> What do fellas think of Lars' George's genercizing (sp? word?) of the client
> API?  See his patch up in https://issues.apache.org/jira/browse/HBASE-1990.
> Would this be enough?
> St.Ack
>


Re: Help on HBase shell alter command usage

2009-12-15 Thread Ted Yu
That works.

scan command gives values for columns.
Is there a shell command which lists unique row values, such as
'com.onsoft.www:http/' ?

Thanks

On Tue, Dec 15, 2009 at 12:09 PM, stack  wrote:

> Try:
>
> hbase(main):005:0> get 'crawltable', 'com.onsoft.www:http/', { COLUMNS =>
> 'stt:'}
>
> i.e. '=>' rather than '='.  Also, its COLUMNS (uppercase I believe) rather
> than column.
>
> Run 'help' in the shell for help and examples.
>
> St.Ack
>
> On Tue, Dec 15, 2009 at 11:53 AM, Ted Yu  wrote:
>
> > Hi,
> > I saw the following from scan 'crawltable' command in hbase shell:
> > ...
> >  com.onsoft.www:http/column=stt:, timestamp=1260405530801,
> > value=\003
> > 3 row(s) in 0.2490 seconds
> >
> > How do I query the value for stt column ?
> >
> > hbase(main):005:0> get 'crawltable', 'com.onsoft.www:http/', {
> > column='stt:'
> > }
> > SyntaxError: (hbase):6: odd number list for Hash.
> >from (hbase):6
> >
> > Can someone explain this 'odd number' error ?
> >
> > Thanks
> >
> > On Mon, Dec 14, 2009 at 10:16 PM, stack  wrote:
> >
> > > Are you using hbase 0.20?  If so, there is no 'compress'.  Its NONE,
> LZO,
> > > or
> > > GZIP (You'll have to build lzo yourself.  See hbase wiki for how).
> > >
> > > See the shell help.  It has examples of how to change parameters on
> > column
> > > families.
> > >
> > > St.Ack
> > >
> > > 2009/12/14 Xin Jing 
> > >
> > > > Hi all,
> > > >
> > > > I want to change the column family property for a existing hbase
> table.
> > > > Setting one comlumn family COMPRESSION from 'none' to comress, and
> > chagne
> > > > one column family IN_MEMORY from 'false' to 'true'.
> > > >
> > > > I want to use hbase shell to achieve that, but I cannot find the
> > detailed
> > > > description on 'alter' command. Could anyone point me to a reference
> on
> > > > that?
> > > >
> > > > Thanks
> > > > - Xin
> > > >
> > >
> >
>


Re: hlogs do not get cleared

2009-12-15 Thread Kevin Peterson
On Tue, Dec 15, 2009 at 10:43 AM, Jean-Daniel Cryans wrote:

>
> Too many hlogs means that the inserts are hitting a lot of regions, and
> that those regions aren't filled enough to flush on their own, so we have to
> force flush them to free up some room. When you added region servers, it
> spread the region load so that hlogs were getting filled at a slower
> rate.
>
> Could you tell us more about the rate of insertion, size of data, and
> number of regions per region server?
>
>
This makes some sense now. I currently have 2200 regions across 3 tables. My
largest table accounts for about 1600 of those regions and is mostly active
at one end of the keyspace -- our key is based on date, but data arrives only
roughly in order. I also write to two secondary indexes, which have no pattern
to the key at all. One of these secondary tables has 488 regions and the other
has 96 regions.

We write about 10M items per day to the main table (articles). All of these
get written to one of the secondary indexes (article-ids). About a third get
written to the other secondary index. The total volume of data written is
about 10 GB per day.

I think the key is, as you say, that the regions aren't filled enough to
flush. The articles table gets mostly written to near one end, and I see
splits happening regularly. The index tables have no pattern, so the 10
million writes get scattered across the different regions. I've looked more
closely at a log file (linked below), and if I forget about my main table
(which would tend to get flushed) and look only at the indexes, this seems
to be what's happening:

1. Up to maxLogs HLogs, it doesn't do any flushes.
2. Once it gets above maxLogs, it will start flushing one region each time
it creates a new HLog.
3. If the first HLog had edits for, say, 50 regions, it will need to flush the
region with the oldest edits 50 times before that HLog can be removed.

If N is the number of regions getting written to, but not getting enough
writes to flush on their own, then I think this converges to maxLogs + N
logs on average. If I think of maxLogs as "the number of logs at which to
start flushing regions", this makes sense.

http://kdpeterson.net/paste/hbase-hadoop-regionserver-mi-prod-app35.ec2.biz360.com.log.2009-12-14


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Paul Smith

On 16/12/2009, at 7:04 AM, stack wrote:

> On Tue, Dec 15, 2009 at 9:56 AM, Kevin Peterson  wrote:
> 
>> These kinds of cleaner APIs would be a good way to prevent the standard
>> situation of one engineer on the team figuring out HBase, then others say
>> "why is this so complicated" so they write an internal set of wrappers and
>> utility methods.
>> 
> 
> This wouldn't solve the problems for people who want a full ORM, but I think
>> there's an in-between sweet spot that abstracts away byte[] but still
>> exposes column families and such.
>> 
> 
> 
> What do fellas think of Lars George's genericizing (sp? word?) of the client
> API?  See his patch up in https://issues.apache.org/jira/browse/HBASE-1990.
> Would this be enough?
> St.Ack

That's a pretty good start, but I think a good collection of useful builders
and utilities that handle the 80% case will help HBase gain much more traction.
As a person starting with HBase, there are a lot of concepts to get, and Bytes
definitely gets in the way of seeing the real underlying patterns.  I'm a total
believer in understanding the internals to get the best out of a product, but
that often comes after experimentation, and these high-level libraries grease
the wheels for faster 'grok'ing of the concepts.

Thinking out loud here, but something like this may be useful (more useful? I
dunno, I'm still getting used to this):

PutBuilder builder = new PutBuilder(hTable);
// first row
builder.withRowKey(firstRowKey).withColumnFamily("foo")
    .put("columnA", valueA).put("columnB", valueB);
// second row
builder.withRowKey(secondRowKey).withColumnFamily("eek")
    .put("columnC", valueC).put("columnD", valueD);
..
builder.putAll();

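A rough sketch of how such a builder could sit on top of the 0.20 client API is
below. PutBuilder and its method names are the hypothetical ones from the
snippet above, not an existing HBase class.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical convenience wrapper over HTable; purely illustrative.
public class PutBuilder {
  private final HTable table;
  private final List<Put> puts = new ArrayList<Put>();
  private Put currentPut;
  private byte[] currentFamily;

  public PutBuilder(HTable table) {
    this.table = table;
  }

  // Starts a new row; subsequent put() calls accumulate into it.
  public PutBuilder withRowKey(String rowKey) {
    currentPut = new Put(Bytes.toBytes(rowKey));
    puts.add(currentPut);
    return this;
  }

  public PutBuilder withColumnFamily(String family) {
    currentFamily = Bytes.toBytes(family);
    return this;
  }

  public PutBuilder put(String qualifier, String value) {
    currentPut.add(currentFamily, Bytes.toBytes(qualifier), Bytes.toBytes(value));
    return this;
  }

  // Ships every accumulated row in one batch.
  public void putAll() throws IOException {
    table.put(puts);
    table.flushCommits();
    puts.clear();
  }
}

The rows buffer client-side in the builder and go out through
HTable.put(List<Put>) only when putAll() is called.
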
I also feel a little silly, because I've only JUST discovered the Writables
class. My initial example of packing 4 ints is silly; a simple class that
implements Writable is a much more elegant solution (I wasn't sure why
Bytes.add(..) only took 2 or 3 args).

Paul
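
Since the Writables class came up, here is a minimal sketch of that "class that
implements Writable" idea for the four-ints case; FourInts is an invented name
and the fields are placeholders.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Illustrative value object: four ints serialized via the Writable contract
// instead of hand-packing byte[]s with Bytes.add(..).
public class FourInts implements Writable {
  private int a, b, c, d;

  public FourInts() {
    // no-arg constructor needed for deserialization
  }

  public FourInts(int a, int b, int c, int d) {
    this.a = a; this.b = b; this.c = c; this.d = d;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(a); out.writeInt(b); out.writeInt(c); out.writeInt(d);
  }

  public void readFields(DataInput in) throws IOException {
    a = in.readInt(); b = in.readInt(); c = in.readInt(); d = in.readInt();
  }
}

Writables.getBytes(new FourInts(1, 2, 3, 4)) then gives the cell value, and
Writables.getWritable(bytes, new FourInts()) rehydrates it on the way back out.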




FilterList and SingleColumnValueFilter

2009-12-15 Thread Paul Ambrose
I ran into some problems with FilterList and SingleColumnValueFilter.

I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters
(each testing equality on a different column) and queried some trivial data:

http://pastie.org/744890

The problems I encountered were two-fold:

SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE
if the column names do not match. If a FilterList is employed, then when the
first filter returns INCLUDE (because the column names do not match), no
more filters for that KeyValue are evaluated.  That is problematic because
when filterRow() is finally called for those filters, matchedColumn is never
found to be true, since they were never invoked (FilterList exits the filter
iteration as soon as the name-mismatched INCLUDE is returned).
The fix (at least for this scenario) is for
SingleColumnValueFilter.filterKeyValue() to
return ReturnCode.NEXT_ROW (rather than INCLUDE).

The second problem is at the bottom of FilterList.filterKeyValue(),
where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
rather than always returning ReturnCode.INCLUDE and leaving the
final filter decision to the call to filterRow().  I am sure there is a good
reason for returning SKIP in other scenarios, but it is problematic in mine.

Feedback would be much appreciated.

Paul
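
For readers who cannot reach the pastie, the setup described is roughly the
following sketch; the family, qualifier and value names here are made up.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class OrFilterScan {
  // Intended meaning: return rows where colA == valueA OR colB == valueB.
  public static Scan build() {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    filters.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("fam"), Bytes.toBytes("colA"),
        CompareOp.EQUAL, Bytes.toBytes("valueA")));
    filters.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("fam"), Bytes.toBytes("colB"),
        CompareOp.EQUAL, Bytes.toBytes("valueB")));
    Scan scan = new Scan();
    scan.setFilter(filters);
    return scan;
  }
}

With the 0.20.2 behaviour described above, the first filter's INCLUDE on a
non-matching column stops evaluation of the rest of the MUST_PASS_ONE list,
which is the behaviour HBASE-2037 reportedly addresses.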


Re: FilterList and SingleColumnValueFilter

2009-12-15 Thread Ram Kulbak
Hi Paul,

I've encountered the same problem. I think it's fixed as part of
https://issues.apache.org/jira/browse/HBASE-2037

Regards,
Yoram



On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose  wrote:

> I ran into some problems with FilterList and SingleColumnValueFilter.
>
> I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters
> (each testing equality on a different columns) and query some trivial data:
>
> http://pastie.org/744890
>
> The problem that I encountered were two-fold:
>
> SingleColumnValueFilter.filterKeyValues() returns ReturnCode.INCLUDE
> if the column names do not match. If FilterList is employed, then when the
> first Filter returns INCLUDE (because the column names do not match), no
> more filters for that KeyValue are evaluated.  That is problematic because
> when filterRow() is finally called for those filters, matchedColumn is
> never
> found to be true because they were not invoked (due to FilterList exiting
> from
> the filterList iteration when the name mismatched INCLUDE was returned).
> The fix (at least for this scenario) is for
> SingleColumnValueFilter.filterKeyValues() to
> return ReturnCode.NEXT_ROW (rather than INCLUDE).
>
> The second problem is at the bottom of FilterList.filterKeyValue()
> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
> rather than always returning ReturnCode.INCLUDE and then leaving the
> final filter decision to be made by the call to filterRow().   I am sure
> there is a good
> reason for returning SKIP in other scenarios, but it is problematic in
> mine.
>
> Feedback would be much appreciated.
>
> Paul
>
>
>
>
>
>
>
>


Re: HBase Utility functions (for Java 5+)

2009-12-15 Thread Andrew Purtell
Thanks for the feedback Paul.

I agree the Builder pattern is an interesting option. Please see
https://issues.apache.org/jira/browse/HBASE-2051

- Andy





From: Paul Smith 
To: hbase-user@hadoop.apache.org
Sent: Tue, December 15, 2009 3:21:44 PM
Subject: Re: HBase Utility functions (for Java 5+)


On 16/12/2009, at 7:04 AM, stack wrote:

> On Tue, Dec 15, 2009 at 9:56 AM, Kevin Peterson  wrote:
> 
>> These kinds of cleaner APIs would be a good way to prevent the standard
>> situation of one engineer on the team figuring out HBase, then others say
>> "why is this so complicated" so they write an internal set of wrappers and
>> utility methods.
>> 
> 
> This wouldn't solve the problems for people who want a full ORM, but I think
>> there's an in-between sweet spot that abstracts away byte[] but still
>> exposes column families and such.
>> 
> 
> 
> What do fellas think of Lars George's genericizing (sp? word?) of the client
> API?  See his patch up in https://issues.apache.org/jira/browse/HBASE-1990.
> Would this be enough?
> St.Ack

That's a pretty good start, but I think a good collection of useful builders
and utilities that handle the 80% case will help HBase gain much more traction.
As a person starting with HBase, there are a lot of concepts to get, and Bytes
definitely gets in the way of seeing the real underlying patterns.  I'm a total
believer in understanding the internals to get the best out of a product, but
that often comes after experimentation, and these high-level libraries grease
the wheels for faster 'grok'ing of the concepts.

Thinking out loud here, but something like this may be useful (more useful? I
dunno, I'm still getting used to this):

PutBuilder builder = new PutBuilder(hTable);
// first row
builder.withRowKey(firstRowKey).withColumnFamily("foo")
    .put("columnA", valueA).put("columnB", valueB);
// second row
builder.withRowKey(secondRowKey).withColumnFamily("eek")
    .put("columnC", valueC).put("columnD", valueD);
...
builder.putAll();

I also feel a little silly, because I've only JUST discovered the Writables
class. My initial example of packing 4 ints is silly; a simple class that
implements Writable is a much more elegant solution (I wasn't sure why
Bytes.add(..) only took 2 or 3 args).

Paul


  

Re: FilterList and SingleColumnValueFilter

2009-12-15 Thread stack
Paul:

I can apply the fix from hbase-2037... I can break it out of the posted
patch that's up there.  Just say the word.

St.Ack


On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak  wrote:

> Hi Paul,
>
> I've encountered the same problem. I think its fixed as part of
> https://issues.apache.org/jira/browse/HBASE-2037
>
> Regards,
> Yoram
>
>
>
> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose  wrote:
>
> > I ran into some problems with FilterList and SingleColumnValueFilter.
> >
> > I created a FilterList with MUST_PASS_ONE and two
> SingleColumnValueFilters
> > (each testing equality on a different columns) and query some trivial
> data:
> >
> > http://pastie.org/744890
> >
> > The problem that I encountered were two-fold:
> >
> > SingleColumnValueFilter.filterKeyValues() returns ReturnCode.INCLUDE
> > if the column names do not match. If FilterList is employed, then when
> the
> > first Filter returns INCLUDE (because the column names do not match), no
> > more filters for that KeyValue are evaluated.  That is problematic
> because
> > when filterRow() is finally called for those filters, matchedColumn is
> > never
> > found to be true because they were not invoked (due to FilterList exiting
> > from
> > the filterList iteration when the name mismatched INCLUDE was returned).
> > The fix (at least for this scenario) is for
> > SingleColumnValueFilter.filterKeyValues() to
> > return ReturnCode.NEXT_ROW (rather than INCLUDE).
> >
> > The second problem is at the bottom of FilterList.filterKeyValue()
> > where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
> > rather than always returning ReturnCode.INCLUDE and then leaving the
> > final filter decision to be made by the call to filterRow().   I am sure
> > there is a good
> > reason for returning SKIP in other scenarios, but it is problematic in
> > mine.
> >
> > Feedback would be much appreciated.
> >
> > Paul
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: FilterList and SingleColumnValueFilter

2009-12-15 Thread Paul Ambrose
Hey Michael,

If hbase-2037 will make it into 0.20.3, I am fine.
If not, I would greatly appreciate you breaking it out for 0.20.3.

Thanks,
Paul



On Dec 15, 2009, at 10:28 PM, stack wrote:

> Paul:
> 
> I can apply the fix from hbase-2037... I can break it out of the posted
> patch thats up there.  Just say the word.
> 
> St.Ack
> 
> 
> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak  wrote:
> 
>> Hi Paul,
>> 
>> I've encountered the same problem. I think its fixed as part of
>> https://issues.apache.org/jira/browse/HBASE-2037
>> 
>> Regards,
>> Yoram
>> 
>> 
>> 
>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose  wrote:
>> 
>>> I ran into some problems with FilterList and SingleColumnValueFilter.
>>> 
>>> I created a FilterList with MUST_PASS_ONE and two
>> SingleColumnValueFilters
>>> (each testing equality on a different columns) and query some trivial
>> data:
>>> 
>>> http://pastie.org/744890
>>> 
>>> The problem that I encountered were two-fold:
>>> 
>>> SingleColumnValueFilter.filterKeyValues() returns ReturnCode.INCLUDE
>>> if the column names do not match. If FilterList is employed, then when
>> the
>>> first Filter returns INCLUDE (because the column names do not match), no
>>> more filters for that KeyValue are evaluated.  That is problematic
>> because
>>> when filterRow() is finally called for those filters, matchedColumn is
>>> never
>>> found to be true because they were not invoked (due to FilterList exiting
>>> from
>>> the filterList iteration when the name mismatched INCLUDE was returned).
>>> The fix (at least for this scenario) is for
>>> SingleColumnValueFilter.filterKeyValues() to
>>> return ReturnCode.NEXT_ROW (rather than INCLUDE).
>>> 
>>> The second problem is at the bottom of FilterList.filterKeyValue()
>>> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
>>> rather than always returning ReturnCode.INCLUDE and then leaving the
>>> final filter decision to be made by the call to filterRow().   I am sure
>>> there is a good
>>> reason for returning SKIP in other scenarios, but it is problematic in
>>> mine.
>>> 
>>> Feedback would be much appreciated.
>>> 
>>> Paul
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>