Hi all!
I have an MR job that imports content into HBase. Before importing, I have
to determine which content is new (the row key in HBase is the URI).
After that, I import the new content into HBase.
Assume I have a large amount of content in HBase (> 1,000,000,000 URIs) and
1,000,000 URIs to import (a mix of new ones and ones that already exist in
HBase). How can I find the new content (URIs) to import?
My current solution: I check whether each URI already exists in HBase to find
the new URIs. Something like:
RowResult row = hTable.getRow(uri);
if (row.isEmpty()) {
    // collect the new content (URI)
}
With this solution, every URI costs one lookup against HBase, so if the number
of URIs is large, the total time spent on calls to HBase is large too :(
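To make the cost concrete, here is a minimal, self-contained sketch of the same check. It does not use the HBase client at all: `RowStore`, `CountingStore`, `findNewUris` and the example URIs are hypothetical names invented for illustration, and the in-memory set only stands in for the table so the number of lookups can be counted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NewUriFilter {

    /** Hypothetical stand-in for the HBase table, keyed by URI. */
    interface RowStore {
        boolean exists(String uri);
    }

    /** In-memory store that counts lookups; each one would be a call to HBase. */
    static class CountingStore implements RowStore {
        private final Set<String> rows;
        int lookups = 0;

        CountingStore(Set<String> rows) {
            this.rows = rows;
        }

        @Override
        public boolean exists(String uri) {
            lookups++;
            return rows.contains(uri);
        }
    }

    /** Returns the candidates that are not yet in the store, one lookup per candidate. */
    static List<String> findNewUris(RowStore store, List<String> candidates) {
        List<String> fresh = new ArrayList<>();
        for (String uri : candidates) {
            if (!store.exists(uri)) {
                fresh.add(uri); // collect the new content (URI)
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore(
                new HashSet<>(Arrays.asList("http://a.example/1", "http://a.example/2")));
        List<String> candidates = Arrays.asList(
                "http://a.example/1", "http://a.example/3", "http://a.example/4");

        List<String> fresh = findNewUris(store, candidates);
        // prints "[http://a.example/3, http://a.example/4] found with 3 lookups"
        System.out.println(fresh + " found with " + store.lookups + " lookups");
    }
}
```

The number of lookups always equals the number of candidate URIs, so 1,000,000 URIs to import means 1,000,000 calls to HBase.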
Could you please suggest a better solution? :)
Thanks!
Best regards,
Nguyen.