On Tue, Dec 8, 2009 at 6:00 PM, Andrew Purtell <[email protected]> wrote:
> I added an entry to the troubleshooting page up on the wiki:
>
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A16
>
> - Andy
>
>
>
>
>
> ________________________________
> From: Ryan Rawson <[email protected]>
> To: [email protected]
> Sent: Tue, December 8, 2009 5:21:25 PM
> Subject: Re: PrefixFilter performance question.
>
> You want:
>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/HTable.html#scannerCaching
>
> The default is low because if a job takes too long processing, a
> scanner can time out, which causes unhappy jobs/people/emails.
>
> BTW I can read small rows out of a 19 node cluster at 7 million
> rows/sec using a map-reduce program. Any individual process is doing
> 40k+ rows/sec or so
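[Editor's note: a quick back-of-the-envelope sketch, with hypothetical latency numbers not taken from this thread, of why the caching value dominates scan time — with caching=1 every row costs a network round trip:]

```java
// Sketch: estimate next() RPC round trips for a scan, given a
// hbase.client.scanner.caching value. The 1 ms round-trip latency
// is an assumed figure for illustration only.
public class ScanRpcEstimate {
    // RPCs needed to pull `rows` rows when each RPC returns `caching` rows.
    static long rpcCount(long rows, int caching) {
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        double rttMs = 1.0; // assumed round-trip latency per RPC
        for (int caching : new int[] {1, 100, 1000}) {
            long rpcs = rpcCount(rows, caching);
            System.out.printf("caching=%d -> %d RPCs, ~%.0f s round-trip overhead%n",
                    caching, rpcs, rpcs * rttMs / 1000.0);
        }
    }
}
```

At caching=1, a million-row scan spends ~1000 seconds on round trips alone under this assumption; at 1000 it is ~1 second, at the price of holding 1000 rows in client memory per fetch.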
>
> -ryan
>
> On Tue, Dec 8, 2009 at 12:25 PM, Edward Capriolo <[email protected]>
> wrote:
>> Hey all,
>>
>> I have been doing some performance evaluation with mysql vs hbase.
>>
>> I have a table webtable
>> {NAME => 'webdata', FAMILIES => [{NAME => 'anchor', COMPRESSION =>
>> 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'image',
>> COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
>> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME =>
>> 'raw_data', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
>> '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
>> => 'true'}]}
>>
>> I have a normalized version in MySQL. This is what I currently have loaded:
>>
>> nyhadoopdev6:60030 1260289750689 requests=4, regions=3, usedHeap=99,
>> maxHeap=997
>> nyhadoopdev7:60030 1260289862481 requests=0, regions=2, usedHeap=181,
>> maxHeap=997
>> nyhadoopdev8:60030 1260289909059 requests=0, regions=2, usedHeap=395,
>> maxHeap=997
>>
>> Here is a snippet:
>>
>> if (mysql) {
>>   try {
>>     PreparedStatement ps = conn.prepareStatement(
>>         "SELECT * FROM page WHERE page LIKE (?)");
>>     ps.setString(1, "http://www.s%");
>>     ResultSet rs = ps.executeQuery();
>>     while (rs.next()) {
>>       sPageCount++;
>>     }
>>     rs.close();
>>     ps.close();
>>   } catch (SQLException ex) { System.out.println(ex); System.exit(1); }
>> }
>>
>> if (hbase) {
>>   Scan s = new Scan();
>>   //s.setCacheBlocks(true);
>>   s.setFilter(new PrefixFilter(Bytes.toBytes("http://www.s")));
>>   ResultScanner scanner = table.getScanner(s);
>>   try {
>>     for (Result rr : scanner) {
>>       sPageCount++;
>>     }
>>   } finally {
>>     scanner.close();
>>   }
>> }
>>
>> I am seeing about 0.3 ms from MySQL and 20-second performance from
>> HBase. I have read some tuning docs, but most seem geared toward
>> insertion speed, not search speed. I would think this would be a
>> bread-and-butter search for HBase since the row keys are naturally
>> sorted lexicographically. I am not running a giant setup here, 3
>> nodes, 2x replication, but I would think that is almost a non-factor
>> since this data is fairly small. Hints?
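[Editor's note: since the row keys are sorted, a prefix scan can also be bounded with start and stop rows instead of filtering from the start of the table — a plain PrefixFilter by itself still walks rows until the filter gives up. A sketch of computing the exclusive stop row by incrementing the last byte of the prefix (assumes the prefix does not end in a 0xFF byte, which holds for ASCII URLs like these):]

```java
import java.util.Arrays;

public class PrefixRange {
    // Exclusive stop row for a prefix scan: copy the prefix and increment
    // its last byte, so [prefix, stop) covers exactly the keys that start
    // with the prefix. Assumes the prefix does not end in 0xFF.
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        stop[stop.length - 1]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("http://www.s".getBytes());
        System.out.println(new String(stop)); // "http://www.t"
    }
}
```

With the start row set to `http://www.s` and this stop row set on the Scan, region servers outside the prefix range are never contacted at all.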
>>
>
>
>
>
I raised the scanner caching from 1 to 30 -> 18 sec
I raised it to 100 -> 17 sec
I raised it to 1000 -> OOM
The OOM pointed me in the direction that this comparison is not apples
to apples. In MySQL the page table is normalized, but in HBase it is
not, and I see lots of data moving across the wire.
I tried a filter to move just the row key across the wire, but I do
not think I have it right:
List<Filter> filters = new ArrayList<Filter>();
filters.add(new PrefixFilter(Bytes.toBytes("http://www.s")));
filters.add(new QualifierFilter(CompareOp.EQUAL,
    new BinaryComparator(Bytes.toBytes("ROW"))));
Filter f = new FilterList(Operator.MUST_PASS_ALL, filters);
s.setFilter(f);
ResultScanner scanner = table.getScanner(s);