I want to use HBase to maintain a very large dataset which needs to be updated pretty much continuously. I'm creating a record for each entity and including a creation-timestamp column as well as between 10 and 1000 additional columns named for distinct events related to the entity. Being new to HBase, the approach I've taken is to create a map/reduce app that, for each input record:
Does a lookup in the table using HTable get(row, column) on the timestamp column to determine whether there is an existing row for the entity. If there is no existing record for the entity, the event history for the entity is added to the table, with one column added per unique event id. If there is an existing record, it just adds the most recent event to the table (rough sketch of the per-record logic below).

I'd like feedback on whether this is a reasonable approach in terms of general performance and reliability, whether there is a different pattern better suited to HBase with map/reduce, or whether I should even be using map/reduce for this at all. Thanks in advance.
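To make that concrete, here is a stripped-down sketch of the per-record check-then-put logic. I've written it against the Get/Put style of the Java client rather than the raw get(row, column) call, and the table name, the single "e" column family, and the "created_ts" qualifier are just placeholders for illustration:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventUpsert {

        // Placeholder family/qualifier names -- not the real schema.
        private static final byte[] FAMILY = Bytes.toBytes("e");
        private static final byte[] CREATED = Bytes.toBytes("created_ts");

        /**
         * If the entity row has no creation timestamp yet, write the full
         * event history (one column per event id); otherwise add only the
         * most recent event column.
         */
        public static void upsert(HTable table, String entityId,
                                  String[] allEventIds, String latestEventId,
                                  long now) throws IOException {
            byte[] row = Bytes.toBytes(entityId);

            // Look up only the creation-timestamp column to test for an existing row.
            Get get = new Get(row);
            get.addColumn(FAMILY, CREATED);
            Result result = table.get(get);

            Put put = new Put(row);
            if (result.isEmpty()) {
                // New entity: creation timestamp plus one column per unique event id.
                put.add(FAMILY, CREATED, Bytes.toBytes(now));
                for (String eventId : allEventIds) {
                    put.add(FAMILY, Bytes.toBytes(eventId), Bytes.toBytes(now));
                }
            } else {
                // Existing entity: append just the latest event as a new column.
                put.add(FAMILY, Bytes.toBytes(latestEventId), Bytes.toBytes(now));
            }
            table.put(put);
        }

        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "entities");   // placeholder table name
            upsert(table, "entity-42",
                   new String[] {"evt-1", "evt-2"}, "evt-2",
                   System.currentTimeMillis());
            table.close();
        }
    }

In the real job this runs inside the map task for each input record; the sketch just shows the lookup-then-write shape I'm asking about.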
