Hi all!
I have an MR job that imports content into HBase. Before importing, I have to determine which content is new (the row key in HBase is the URI). After that, I import only the new content into HBase.

Assume HBase already holds a large amount of content (> 1,000,000,000 URIs) and I have 1,000,000 URIs to import, some new and some already existing in HBase. How can I find the new URIs to import?

My current solution: I check whether each URI exists in HBase, and collect the ones that do not. Something like:

    RowResult row = hTable.getRow(uri);
    if (row.isEmpty()) {
        // no row stored under this key: collect the new content (URI)
    }

With this solution, if the number of URIs is large, the time spent connecting to HBase becomes large :(

Please suggest a better solution. :)

Thanks!

Best regards,
Nguyen.
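
P.S. To make the current approach concrete, here is a minimal sketch in plain Java that runs without a cluster. The names NewUriFilter, findNewUris and existsInStore are made up for illustration only; existsInStore is a stand-in for the hTable.getRow(uri) check above, and the in-memory set in main is an assumption that replaces the real table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class NewUriFilter {

    // Returns the candidates that are NOT already stored, in input order.
    // existsInStore is a hypothetical stand-in for the HBase lookup;
    // against a real table each call would be one round trip.
    static List<String> findNewUris(List<String> candidates,
                                    Predicate<String> existsInStore) {
        List<String> fresh = new ArrayList<>();
        for (String uri : candidates) {
            if (!existsInStore.test(uri)) {
                fresh.add(uri); // collect the new content (URI)
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Assumption: a small in-memory set plays the role of the HBase table.
        Set<String> stored = new HashSet<>(Arrays.asList(
                "http://a.example/1", "http://a.example/2"));
        List<String> candidates = Arrays.asList(
                "http://a.example/2", "http://a.example/3");

        System.out.println(findNewUris(candidates, stored::contains));
    }
}
```

The loop does one lookup per candidate URI, so 1,000,000 candidates mean 1,000,000 lookups. That is the cost I would like to avoid.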