Hi Ryan,
Thanks for your response - I am also working on this project.
I was hoping that HBase perhaps treated the time range differently,
in a way that would prevent a full table scan. I suppose our only
remaining option is to implement indexing?
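For illustration, a minimal sketch of what such an index could look like, assuming a second table whose keys start with the write timestamp and whose values point back at the GUID row. In-memory maps stand in for the two tables. The class name, key layout and method names are hypothetical and are not HBase API. Timestamps are assumed to be non-negative milliseconds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: in-memory maps stand in for a data table and an index table.
final class TimeIndexSketch {
    // Data table: GUID row key -> value (key order carries no time information).
    final Map<String, String> data = new HashMap<>();
    // Index table: "<zero-padded timestamp>-<guid>" -> GUID, kept sorted by key.
    final TreeMap<String, String> index = new TreeMap<>();

    // Hypothetical key layout: the padded timestamp leads, so keys sort by time.
    static String indexKey(long ts, String guid) {
        return String.format(Locale.ROOT, "%013d-%s", ts, guid);
    }

    // Every write goes to both tables.
    void put(String guid, long ts, String value) {
        data.put(guid, value);
        index.put(indexKey(ts, guid), guid);
    }

    // Export: range-scan the index for [from, to), then fetch each row by GUID.
    List<String> export(long from, long to) {
        List<String> out = new ArrayList<>();
        for (String guid : index.subMap(indexKey(from, ""), indexKey(to, "")).values()) {
            out.add(data.get(guid));
        }
        return out;
    }
}
```

The data table keeps its GUID keys, so only the index is read in time order. Keeping the two tables in step when a write fails halfway is left out of the sketch.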
Regards,
Seraph
On 05 May 2010, at 9:27 AM, Ryan Rawson wrote:
You have to examine nearly every single value in the table - the
mechanism by which HBase can restrict how much data it has to scan is
via the row key only. All the filters and filter-like calls (eg:
setTimeRange) just restrict what data is passed back to the client.
So yes you are scanning the entire table. Could get expensive once
you have a few TB.
The thing to remember is access to data is all about the primary key.
It's very similar to an RDBMS with only a primary index. If you can't
restrict your query via the primary key, then you have to do a full
table scan.
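A rough sketch of the difference, using a sorted in-memory map as a stand-in for a table ordered by row key. This is a toy model and not HBase code: the class, the methods and the time-prefixed key layout are all made up for illustration. It counts how many rows each approach has to read to return the same ten rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Toy model only: a TreeMap (row key -> write timestamp) stands in for a table.
final class RowKeyScanSketch {

    /** Rows returned, plus how many rows had to be read to find them. */
    static final class ScanResult {
        final List<String> rows = new ArrayList<>();
        int examined = 0;
    }

    /** Timestamp filter over GUID-style keys: every row is read, then filtered. */
    static ScanResult filterByTimestamp(TreeMap<String, Long> table, long from, long to) {
        ScanResult r = new ScanResult();
        for (Map.Entry<String, Long> e : table.entrySet()) {
            r.examined++;
            long ts = e.getValue();
            if (ts >= from && ts < to) {
                r.rows.add(e.getKey());
            }
        }
        return r;
    }

    /** Key-range scan: only rows with a key in [startRow, stopRow) are read. */
    static ScanResult scanByKeyRange(TreeMap<String, Long> table, String startRow, String stopRow) {
        ScanResult r = new ScanResult();
        for (Map.Entry<String, Long> e : table.subMap(startRow, true, stopRow, false).entrySet()) {
            r.examined++;
            r.rows.add(e.getKey());
        }
        return r;
    }

    /** Hypothetical key layout: zero-padded timestamp first, so keys sort by time. */
    static String timeKey(long ts, String guid) {
        return String.format(Locale.ROOT, "%013d-%s", ts, guid);
    }

    public static void main(String[] args) {
        TreeMap<String, Long> guidKeyed = new TreeMap<>();
        TreeMap<String, Long> timeKeyed = new TreeMap<>();
        for (long ts = 0; ts < 1000; ts++) {
            byte[] seed = Long.toString(ts).getBytes(StandardCharsets.UTF_8);
            String guid = UUID.nameUUIDFromBytes(seed).toString();
            guidKeyed.put(guid, ts);
            timeKeyed.put(timeKey(ts, guid), ts);
        }
        ScanResult a = filterByTimestamp(guidKeyed, 100, 110);
        ScanResult b = scanByKeyRange(timeKeyed, timeKey(100, ""), timeKey(110, ""));
        System.out.println("filter:    examined=" + a.examined + " returned=" + a.rows.size());
        System.out.println("key range: examined=" + b.examined + " returned=" + b.rows.size());
    }
}
```

With 1000 rows, the filter reads all 1000 to return 10, while the key-range scan reads only the 10 it returns. The catch is that keys leading with a timestamp send consecutive writes to the same part of the key space, which is the opposite of what the GUID keys were chosen for.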
-ryan
On Wed, May 5, 2010 at 12:22 AM, Michelan Arendse <miche...@addynamo.com> wrote:
I don't know what the row start and end keys are - they are GUID keys
(which improves writes across the cluster - I had help with this from
this user group before).
I need to export data written between "startDate" and "endDate"
into a relational database so I can interrogate the data (SUM/AVG,
etc).
That is why I am using: scan.setTimeRange(fromDate.getTime(),
toDate.getTime());
In my test with live data, I only took the range between 2010-03-26
00:00:00 and 2010-03-26 01:00:00 - there should only be a few thousand
rows between those dates.
Will HBase still take forever to find the data I am looking for unless
I use startRow/endRow?
-----Original Message-----
From: TuX RaceR [mailto:tuxrace...@gmail.com]
Sent: 04 May 2010 05:52 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Improving HBase scanner
Michelan Arendse wrote:
Is there a way to speed up the fetching of data from HBase?
Divide your key space into smaller chunks,
using a closer startRow and stopRow?
cf: Scan(byte[] startRow, byte[] stopRow)
<http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/Scan.html#Scan%28byte%5B%5D,%20byte%5B%5D%29>
TuX