On Tue, Dec 8, 2009 at 6:00 PM, Andrew Purtell <[email protected]> wrote:
> I added an entry to the troubleshooting page up on the wiki:
>
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16
>
> - Andy
>
>
>
>
>
> ________________________________
> From: Ryan Rawson <[email protected]>
> To: [email protected]
> Sent: Tue, December 8, 2009 5:21:25 PM
> Subject: Re: PrefixFilter performance question.
>
> You want:
>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
>
> The default is low because if a job takes too long processing, a
> scanner can time out, which causes unhappy jobs/people/emails.
>
> BTW I can read small rows out of a 19 node cluster at 7 million
> rows/sec using a map-reduce program. Any individual process is doing
> 40k+ rows/sec or so
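[Editor's note: a quick back-of-the-envelope sketch, with hypothetical latency numbers not taken from this thread, of why the caching value dominates scan time — with caching=1 every row costs a network round trip:]

```java
// Sketch: estimate next() RPC round trips for a scan, given a
// hbase.client.scanner.caching value. The 1 ms round-trip latency
// is an assumed figure for illustration only.
public class ScanRpcEstimate {
    // RPCs needed to pull `rows` rows when each RPC returns `caching` rows.
    static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        double rttMs = 1.0; // assumed round-trip latency per RPC
        for (int caching : new int[] {1, 100, 1000}) {
            long rpcs = rpcCount(rows, caching);
            System.out.printf("caching=%d -> %d RPCs, ~%.0f s round-trip overhead%n",
                    caching, rpcs, rpcs * rttMs / 1000.0);
        }
    }
}
```

At caching=1, a million-row scan spends ~1000 seconds on round trips alone under this assumption; at 1000 it is ~1 second, at the price of holding 1000 rows in client memory per fetch.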
>
> -ryan
>
> On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]>
> wrote:
>> Hey all,
>>
>> I have been doing some performance evaluation with mysql vs hbase.
>>
>> I have a table webtable
>> {NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION =>
>> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image',
>> COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
>> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
>> 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
>> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
>> => 'true'}]}
>>
>> I have a normalized version in MySQL. This is what I currently have loaded:
>>
>> nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99,
>> maxHeap=997
>> nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181,
>> maxHeap=997
>> nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395,
>> maxHeap=997
>>
>> Here is a snippet:
>>
>> if (mysql) {
>>   try {
>>     PreparedStatement ps = conn.prepareStatement(
>>         "SELECT * FROM page WHERE page LIKE (?)");
>>     ps.setString(1, "http://www.s%");
>>     ResultSet rs = ps.executeQuery();
>>     while (rs.next()) {
>>       sPageCount++;
>>     }
>>     rs.close();
>>     ps.close();
>>   } catch (SQLException ex) { System.out.println(ex); System.exit(1); }
>> }
>>
>> if (hbase) {
>>   Scan s = new Scan();
>>   //s.setCacheBlocks(true);
>>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
>>   ResultScanner scanner = table.getScanner(s);
>>   try {
>>     for (Result rr : scanner) {
>>       sPageCount++;
>>     }
>>   } finally {
>>     scanner.close();
>>   }
>> }
>>
>> I am seeing about 0.3 ms from MySQL and 20-second performance from
>> HBase. I have read some tuning docs, but most seem geared toward
>> insertion speed, not search speed. I would think this would be a
>> bread-and-butter search for HBase since the row keys are naturally
>> sorted lexicographically. I am not running a giant setup here, 3
>> nodes, 2x replication, but I would think that is almost a non-factor
>> since this data is fairly small. Hints?
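[Editor's note: since the row keys are sorted, a prefix scan can also be bounded with start and stop rows instead of filtering from the start of the table — a plain PrefixFilter by itself still walks rows until the filter gives up. A sketch of computing the exclusive stop row by incrementing the last byte of the prefix (assumes the prefix does not end in a 0xFF byte, which holds for ASCII URLs like these):]

```java
import java.util.Arrays;

public class PrefixRange {
    // Exclusive stop row for a prefix scan: copy the prefix and increment
    // its last byte, so [prefix, stop) covers exactly the keys that start
    // with the prefix. Assumes the prefix does not end in 0xFF.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("http://www.s".getBytes());
        System.out.println(new String(stop)); // "http://www.t"
    }
}
```

With the start row set to `http://www.s` and this stop row set on the Scan, region servers outside the prefix range are never contacted at all.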
>>
>
>
>
>
I raised the scanner caching from 1 to 30 -> 18 sec
I raised it to 100 -> 17 sec
I raised it to 1000 -> OOM
The OOM pointed me in the direction that this comparison is not apples
to apples. In MySQL the page table is normalized, but in HBase it is
not, and I see lots of data moving across the wire.
I tried a filter to move just the row key across the wire, but I do
not think I have it right:
List<Filter> filters = new ArrayList<Filter>();
filters.add(new PrefixFilter(Bytes.toBytes("http://www.s")));
filters.add(new QualifierFilter(CompareOp.EQUAL,
    new BinaryComparator(Bytes.toBytes("ROW"))));
Filter f = new FilterList(Operator.MUST_PASS_ALL, filters);
s.setFilter(f);
ResultScanner scanner = table.getScanner(s);