Try using this filter instead:
scan.setFilter(new FirstKeyOnlyFilter());

It will return only the first KeyValue of each row, if that's the effect
you are looking for.
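
In Java that might look something like this (an untested sketch, reusing
the table and counter from your snippet, and adding the scanner caching
Ryan mentioned):

  Scan s = new Scan();
  s.setCaching(100); // ship 100 rows per RPC instead of the default 1
  FilterList fl = new FilterList(FilterList.Operator.MUST_PASS_ALL);
  fl.addFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
  fl.addFilter(new FirstKeyOnlyFilter()); // first KeyValue of each row only
  s.setFilter(fl);
  ResultScanner scanner = table.getScanner(s);
  try {
    for (Result rr : scanner) {
      sPageCount++;
    }
  } finally {
    scanner.close();
  }

That should shrink what moves over the wire to roughly just the keys.
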
St.Ack
On Tue, Dec 8, 2009 at 3:30 PM, Edward Capriolo <[email protected]> wrote:
> On Tue, Dec 8, 2009 at 6:00 PM, Andrew Purtell <[email protected]> wrote:
> > I added an entry to the troubleshooting page up on the wiki:
> >
> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16
> >
> > - Andy
> >
> > ________________________________
> > From: Ryan Rawson <[email protected]>
> > To: [email protected]
> > Sent: Tue, December 8, 2009 5:21:25 PM
> > Subject: Re: PrefixFilter performance question.
> >
> > You want:
> >
> >
> > http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
> >
> > The default is low because if a job takes too long processing, a
> > scanner can time out, which causes unhappy jobs/people/emails.
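> >
> > For example (an untested sketch; either knob should work in 0.20):
> >
> >   table.setScannerCaching(100); // per-table; the default is 1
> >   // or per-scan:
> >   Scan s = new Scan();
> >   s.setCaching(100);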
> >
> > BTW, I can read small rows out of a 19-node cluster at 7 million
> > rows/sec using a MapReduce program. Any individual process is doing
> > 40k+ rows/sec or so.
> >
> > -ryan
> >
> > On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]> wrote:
> >> Hey all,
> >>
> >> I have been doing some performance evaluation of MySQL vs. HBase.
> >>
> >> I have a webtable-style table, 'webdata':
> >>
> >> {NAME => 'webdata', FAMILIES => [
> >>   {NAME => 'anchor', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'},
> >>   {NAME => 'image', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'},
> >>   {NAME => 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'}]}
> >>
> >> I have a normalized version in MySQL. With the data loaded, the
> >> region servers currently look like this:
> >>
> >> nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99, maxHeap=997
> >> nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181, maxHeap=997
> >> nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395, maxHeap=997
> >>
> >> Here is a snippet of the benchmark code:
> >>
> >> if (mysql) {
> >>   try {
> >>     PreparedStatement ps = conn.prepareStatement(
> >>         "SELECT * FROM page WHERE page LIKE (?)");
> >>     ps.setString(1, "http://www.s%");
> >>     ResultSet rs = ps.executeQuery();
> >>     while (rs.next()) {
> >>       sPageCount++;
> >>     }
> >>     rs.close();
> >>     ps.close();
> >>   } catch (SQLException ex) {
> >>     System.out.println(ex);
> >>     System.exit(1);
> >>   }
> >> }
> >>
> >> if (hbase) {
> >>   Scan s = new Scan();
> >>   //s.setCacheBlocks(true);
> >>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
> >>   ResultScanner scanner = table.getScanner(s);
> >>   try {
> >>     for (Result rr : scanner) {
> >>       sPageCount++;
> >>     }
> >>   } finally {
> >>     scanner.close();
> >>   }
> >> }
> >>
> >> I am seeing about 0.3 ms from MySQL and about 20 seconds from HBase.
> >> I have read some tuning docs, but most seem geared toward insertion
> >> speed, not search speed. I would think this would be a
> >> bread-and-butter search for HBase, since the row keys are naturally
> >> sorted lexicographically. I am not running a giant setup here, 3
> >> nodes, 2x replication, but I would think that is almost a non-factor
> >> since this data is fairly small. Hints?
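> >>
> >> Since the keys are sorted, would it help to give the scan a start row
> >> so it does not have to filter from the beginning of the table?
> >> Something like (untested):
> >>
> >>   Scan s = new Scan(Bytes.toBytes("http://www.s"));
> >>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));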
> >>
>
> I raised scanner caching from 1 to 30 -> 18 sec.
> I raised it to 100 -> 17 sec.
> I raised it to 1000 -> OOM.
>
> The OOM pointed me in the direction that this comparison is not apples
> to apples: at a caching of 1000 the client has to hold 1000 of these
> wide rows in memory at once. In MySQL the page table is normalized, but
> in HBase it is not. I see lots of data moving across the wire.
>
> I tried a filter to move just the row key across the wire, but I do
> not think I have it right...
>
> List<Filter> filters = new ArrayList<Filter>();
> filters.add(new PrefixFilter(Bytes.toBytes("http://www.s")));
> filters.add(new QualifierFilter(CompareOp.EQUAL,
>     new BinaryComparator(Bytes.toBytes("ROW"))));
> Filter f = new FilterList(Operator.MUST_PASS_ALL, filters);
> s.setFilter(f);
> ResultScanner scanner = table.getScanner(s);
>