Doubt in RegExpRowFilter and RowFilters in general

2008-02-11 Thread David Alves
Hi Guys
In my previous email I might have misunderstood the roles of the
RowFilterInterfaces so I'll pose my question more clearly (since the
last one wasn't in question form :)).
I have a setup where a table has two columns belonging to different
column families (Table A: cf1:a, cf2:b).

I'm trying to build a filter so that a scanner only returns the rows
where cf1:a = myvalue1 and cf2:b = myvalue2.

I've built a RegExpRowFilter like this:

Map<Text, byte[]> conditionalsMap = new HashMap<Text, byte[]>();
conditionalsMap.put(new Text("cf1:a"), new myvalue1.getBytes());
conditionalsMap.put(new Text("cf2:b"), myvalue2.getBytes());
return new RegExpRowFilter(".*", conditionalsMap);

My problem is this filter always fails when I know for sure that there
are rows whose columns match my values.
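
To pin down what I expect the filter to do, here is a minimal plain-Java sketch of the intended semantics, independent of HBase. The class name RowMatchSketch, the matchesAll and bytes helpers, and the use of String column names instead of Text are assumptions made for illustration; this is not how RegExpRowFilter is implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: models a row as column name -> cell value.
class RowMatchSketch {

    // Returns true only if every column in `conditions` is present in `row`
    // with exactly the same bytes (an AND over all conditions).
    static boolean matchesAll(Map<String, byte[]> row, Map<String, byte[]> conditions) {
        for (Map.Entry<String, byte[]> e : conditions.entrySet()) {
            byte[] actual = row.get(e.getKey());
            // byte[] uses identity-based equals(), so compare contents explicitly.
            if (actual == null || !Arrays.equals(actual, e.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Small helper (an assumption of this sketch) to build cell values.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> conditions = new HashMap<>();
        conditions.put("cf1:a", bytes("myvalue1"));
        conditions.put("cf2:b", bytes("myvalue2"));

        Map<String, byte[]> row = new HashMap<>();
        row.put("cf1:a", bytes("myvalue1"));
        row.put("cf2:b", bytes("myvalue2"));
        System.out.println(matchesAll(row, conditions)); // both columns match

        row.put("cf2:b", bytes("other"));
        System.out.println(matchesAll(row, conditions)); // second column differs
    }
}
```

In other words, a row should come back from the scanner only when both columns are present and both values are byte-for-byte equal.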

I'm building the scanner like this (the purpose in this case is to
find out whether there are more values that match my filter):

final Text startKey = this.htable.getStartKeys()[0];
HScannerInterface scanner = htable.obtainScanner(
    new Text[] {new Text("cf1:a"), new Text("cf2:b")},
    startKey, rowFilterInterface);
return scanner.iterator().hasNext();

Can anyone give me a hand, please?

Thanks in advance
David Alves





Re: Doubt in RegExpRowFilter and RowFilters in general

2008-02-11 Thread stack
Have you tried enabling DEBUG-level logging?  Filters have lots of 
logging around state changes.  It might help figure out this issue.  You 
might need to add extra logging around line #2401 in HStore.


(I just spent some time trying to bend my head around what's going on.  
Filters are run at the Store level.  It looks like, in 
RegExpRowFilter, a map of column to value is made on construction.  If 
the value matches, the filter returns false, so the cell should be added 
in each family.  I don't see anything obviously wrong in here.)
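
One thing that might be worth checking: if filters really are evaluated once per Store, that is, once per column family, then a filter holding conditions on columns from two families might never see both columns in the same call. Below is a toy model of that hypothesis in plain Java. It is not HBase code: StoreLevelSketch, splitByFamily and seesAllColumns are made-up names, and whether HStore actually behaves this way would need checking against the source.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only, NOT HBase code. Shows what a filter would see if it were
// applied separately to the cells of each column family (one "store" each).
class StoreLevelSketch {

    // Groups a row's cells by the family prefix of "family:qualifier".
    // Assumes every column name contains a ':' separator.
    static Map<String, Map<String, String>> splitByFamily(Map<String, String> row) {
        Map<String, Map<String, String>> stores = new TreeMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String column = cell.getKey();
            String family = column.substring(0, column.indexOf(':'));
            stores.computeIfAbsent(family, f -> new TreeMap<>())
                  .put(column, cell.getValue());
        }
        return stores;
    }

    // True if this view of the row contains every column the filter tests.
    static boolean seesAllColumns(Map<String, String> view, Map<String, String> conditions) {
        return view.keySet().containsAll(conditions.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("cf1:a", "myvalue1");
        row.put("cf2:b", "myvalue2");
        Map<String, String> conditions = new TreeMap<>(row);

        // The whole row carries both tested columns...
        System.out.println("whole row: " + seesAllColumns(row, conditions));
        // ...but each per-family view carries only one of them.
        for (Map.Entry<String, Map<String, String>> store : splitByFamily(row).entrySet()) {
            System.out.println(store.getKey() + ": "
                    + seesAllColumns(store.getValue(), conditions));
        }
    }
}
```

If something like this is going on, DEBUG logging in the filter should show each condition being tested against only one family's cells at a time.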


St.Ack


David Alves wrote:

St.Ack

Thanks for your reply.

When I use RegExpRowFilter with only one (either one) of the conditions
it works (the rows are passed on to the Map/Reduce task), but there is
still a problem: only one of the columns is present in the resulting
MapWritable (I'm using my own tableinputformat) from the scanner.
So I still use the filter to check for more rows (I build a scanner with
one of the conditions, the rarest one, and iterate through it to try to
find the other), but not in the tableinputformat itself (I just discard
the unwanted values in the Mapper). That is a performance hit (if the
filtering were done in the scanner, the row simply wouldn't be sent to
the master, right, so there would be less traffic in distributed
mode?), but no big deal.
It seems to me that when the filter is applied, only the column that
matches (or the one that doesn't match, I'm not sure at the moment) is
passed to the scanner result.

As to the second point, I'm running HBase in local mode for development
and the DEBUG log for the HMaster shows nothing; my process simply hangs
indefinitely.

When I have some free time I'll try to look into the sources and
pinpoint the problem more accurately.

David

On Mon, 2008-02-11 at 10:36 -0800, stack wrote:
  

David:

<disclaimer>IMO, filters are a bit of sweet functionality but they are 
not easy to use.  They also have seen little exercise so you are 
probably tripping over bugs.  That said, I know they basically 
work.</disclaimer>


I'd suggest you progress from basic filtering toward the filter you'd 
like to implement.   Does the RegExpRowFilter do the right thing when 
filtering one column only?


On the ClassNotFoundException, yeah, it should be coming out on the 
client.  Can you see it in the server logs?  Do you get any exceptions 
client-side?


St.Ack




Re: Doubt in RegExpRowFilter and RowFilters in general

2008-02-11 Thread David Alves
Hi Again

In my previous example I seem to have misplaced a new keyword (new
myvalue1.getBytes() where it should have been myvalue1.getBytes()).

On another note my program hangs when I supply my own filter to the
scanner (I suppose it's clear that the nodes don't know my class so
there should be a ClassNotFoundException right?).
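
To illustrate what I would expect to happen: assuming the server re-creates the filter from its class name (an assumption on my part, I haven't checked the RPC code), loading a class that isn't on the classpath fails as in the sketch below. com.example.MyRowFilter is a made-up name and ClassLoadSketch exists only for this illustration.

```java
// Illustration only: what loading an unknown class by name looks like in
// plain Java. Whether HBase loads filters exactly this way is an assumption.
class ClassLoadSketch {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // This is the exception I would expect to surface somewhere.
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class, always present.
        System.out.println(isLoadable("java.lang.String"));
        // Hypothetical filter class that is not on the classpath.
        System.out.println(isLoadable("com.example.MyRowFilter"));
    }
}
```

What I actually see is a hang rather than an exception like this one being reported back.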

Regards
David Alves 

