Try using this filter instead:
scan.setFilter(new FirstKeyOnlyFilter());

It will return only the first KeyValue of each row, if that's the effect
you are looking for.
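
In Java that might look something like this (an untested sketch, reusing
the table and counter from your snippet, and adding the scanner caching
Ryan mentioned):

  Scan s = new Scan();
  s.setCaching(100); // ship 100 rows per RPC instead of the default 1
  FilterList fl = new FilterList(FilterList.Operator.MUST_PASS_ALL);
  fl.addFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
  fl.addFilter(new FirstKeyOnlyFilter()); // first KeyValue of each row only
  s.setFilter(fl);
  ResultScanner scanner = table.getScanner(s);
  try {
    for (Result rr : scanner) {
      sPageCount++;
    }
  } finally {
    scanner.close();
  }

That should shrink what moves over the wire to roughly just the keys.
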
St.Ack
On Tue, Dec 8, 2009 at 3:30 PM, Edward Capriolo <[email protected]> wrote:
> On Tue, Dec 8, 2009 at 6:00 PM, Andrew Purtell <[email protected]> wrote:
> > I added an entry to the troubleshooting page up on the wiki:
> >
> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16
> >
> > - Andy
> >
> > ________________________________
> > From: Ryan Rawson <[email protected]>
> > To: [email protected]
> > Sent: Tue, December 8, 2009 5:21:25 PM
> > Subject: Re: PrefixFilter performance question.
> >
> > You want:
> >
> >
> > http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
> >
> > The default is low because if a job takes too long processing, a
> > scanner can time out, which causes unhappy jobs/people/emails.
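> >
> > For example (an untested sketch; either knob should work in 0.20):
> >
> >   table.setScannerCaching(100); // per-table; the default is 1
> >   // or per-scan:
> >   Scan s = new Scan();
> >   s.setCaching(100);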
> >
> > BTW, I can read small rows out of a 19-node cluster at 7 million
> > rows/sec using a MapReduce program. Any individual process is doing
> > 40k+ rows/sec or so.
> >
> > -ryan
> >
> > On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]> wrote:
> >> Hey all,
> >>
> >> I have been doing some performance evaluation of MySQL vs. HBase.
> >>
> >> I have a webtable-style table, 'webdata':
> >>
> >> {NAME => 'webdata', FAMILIES => [
> >>   {NAME => 'anchor', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'},
> >>   {NAME => 'image', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'},
> >>   {NAME => 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3',
> >>    TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
> >>    BLOCKCACHE => 'true'}]}
> >>
> >> I have a normalized version in MySQL. With the data loaded, the
> >> region servers currently look like this:
> >>
> >> nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99, maxHeap=997
> >> nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181, maxHeap=997
> >> nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395, maxHeap=997
> >>
> >> Here is a snippet of the benchmark code:
> >>
> >> if (mysql) {
> >>   try {
> >>     PreparedStatement ps = conn.prepareStatement(
> >>         "SELECT * FROM page WHERE page LIKE (?)");
> >>     ps.setString(1, "http://www.s%");
> >>     ResultSet rs = ps.executeQuery();
> >>     while (rs.next()) {
> >>       sPageCount++;
> >>     }
> >>     rs.close();
> >>     ps.close();
> >>   } catch (SQLException ex) {
> >>     System.out.println(ex);
> >>     System.exit(1);
> >>   }
> >> }
> >>
> >> if (hbase) {
> >>   Scan s = new Scan();
> >>   //s.setCacheBlocks(true);
> >>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
> >>   ResultScanner scanner = table.getScanner(s);
> >>   try {
> >>     for (Result rr : scanner) {
> >>       sPageCount++;
> >>     }
> >>   } finally {
> >>     scanner.close();
> >>   }
> >> }
> >>
> >> I am seeing about 0.3 ms from MySQL and about 20 seconds from HBase.
> >> I have read some tuning docs, but most seem geared toward insertion
> >> speed, not search speed. I would think this would be a
> >> bread-and-butter search for HBase, since the row keys are naturally
> >> sorted lexicographically. I am not running a giant setup here, 3
> >> nodes, 2x replication, but I would think that is almost a non-factor
> >> since this data is fairly small. Hints?
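> >>
> >> Since the keys are sorted, would it help to give the scan a start row
> >> so it does not have to filter from the beginning of the table?
> >> Something like (untested):
> >>
> >>   Scan s = new Scan(Bytes.toBytes("http://www.s"));
> >>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));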
> >>
>
> I raised scanner caching from 1 to 30 -> 18 sec.
> I raised it to 100 -> 17 sec.
> I raised it to 1000 -> OOM.
>
> The OOM pointed me in the direction that this comparison is not apples
> to apples: at a caching of 1000 the client has to hold 1000 of these
> wide rows in memory at once. In MySQL the page table is normalized, but
> in HBase it is not. I see lots of data moving across the wire.
>
> I tried a filter to move just the row key across the wire, but I do
> not think I have it right...
>
> List<Filter> filters = new ArrayList<Filter>();
> filters.add(new PrefixFilter(Bytes.toBytes("http://www.s")));
> filters.add(new QualifierFilter(CompareOp.EQUAL,
>     new BinaryComparator(Bytes.toBytes("ROW"))));
> Filter f = new FilterList(Operator.MUST_PASS_ALL, filters);
> s.setFilter(f);
> ResultScanner scanner = table.getScanner(s);
>