Thank you, that supports what I was thinking.

Jean-Daniel Cryans wrote:
> 
> OK, now I have a good picture of your situation (took me a moment).
> 
> I guess that even if it's concurrent it will not be that much of a
> problem. Keeping the max versions at 1 will ensure that even if 3 mappers
> insert the history of one entity, the data that overlaps will still be
> inserted in your "event:" family and the rest will be discarded. Your
> biggest concern will be the efficiency of reading data from HBase, so your
> mappers should have a local cache.
> 
> Hope this helps,
> 
> J-D
> 
> On Sat, Jul 19, 2008 at 5:22 PM, imbmay <[EMAIL PROTECTED]> wrote:
> 
>>
>> The table was created with two column families: createdAt and event. The
>> former is the timestamp, so 1 entry per entity, and the latter is a
>> collection of events. In the latter, entries take the form event:1524,
>> event:1207, etc., and for the time being I'm storing only the event time.
>> The input is a set of text files generated at a rate of about 600 an hour
>> with up to 50,000 entries per file. Each line in the text file contains a
>> unique entity ID, a timestamp of the first time it was seen, an event code
>> and a history of the last 100 event codes. In cases where I haven't seen an
>> entity before I want to add everything in the history; when the entity has
>> been seen previously I just want to add the last event. I'm keeping the
>> table design simple to start with while I'm getting familiar with HBase.
>>
>> The principal area of concern I have is regarding the reading of the data
>> from the HBase table during the map/reduce process to determine if an
>> entity already exists. If I'm running the map/reduce on a single machine
>> then it's pretty easy to keep track of previously unknown entities; but if
>> I'm running in a cluster a new entity may show up in the inputs to several
>> concurrent [EMAIL PROTECTED]
>>
>>
>> Jean-Daniel Cryans wrote:
>> >
>> > Brian (guessing it's your name from your email address),
>> >
>> > Please be more specific about your table design. For example, a "column"
>> > in HBase is a very vague word since it may refer to a column family or a
>> > column key inside a column family. Also, what kind of load do you expect
>> > to have?
>> >
>> > Answering this may also help you understand HBase.
>> >
>> > Thx,
>> >
>> > J-D
>> >
>> > On Fri, Jul 18, 2008 at 4:41 PM, imbmay <[EMAIL PROTECTED]> wrote:
>> >
>> >>
>> >> I want to use HBase to maintain a very large dataset which needs to be
>> >> updated pretty much continuously. I'm creating a record for each entity
>> >> and including a creation timestamp column as well as between 10 and 1000
>> >> additional columns named for distinct events related to the record
>> >> entity. Being new to HBase, the approach I've taken is to create a
>> >> map/reduce app that, for each input record:
>> >>
>> >> Does a lookup in the table using HTable get(row, column) on the timestamp
>> >> column to determine if there is an existing row for the entity.
>> >> If there is no existing record for the entity, the event history for the
>> >> entity is added to the table with one column added per unique event id.
>> >> If there is an existing record for the entity, it just adds the most
>> >> recent event to the table.
>> >>
>> >> I'd like feedback as to whether this is a reasonable approach in terms of
>> >> general performance and reliability, or if there is a different pattern
>> >> better suited to HBase with map/reduce, or if I should even be using
>> >> map/reduce for this.
>> >>
>> >> Thanks in advance.
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Table-Updates-with-Map-Reduce-tp18537368p18537368.html
>> >> Sent from the HBase User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>>
>>
> 
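To illustrate why concurrent mappers writing the same history are harmless when only one version per cell is kept, here is a minimal sketch in Python. It does not use the HBase client at all: SingleVersionTable is a hypothetical in-memory stand-in, and the "createdAt:" column name, the entity ID and the event times are made-up values for illustration only.

```python
from collections import defaultdict


class SingleVersionTable:
    """Hypothetical in-memory stand-in for a table keeping 1 version per cell."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        # With max versions = 1, a later write to the same cell simply
        # replaces the earlier one, so repeating a write changes nothing.
        self.rows[row][column] = value


def write_full_history(table, entity_id, created_at, history):
    """Write every (event_code, event_time) pair of a previously unseen entity."""
    table.put(entity_id, "createdAt:", created_at)
    for code, event_time in history:
        table.put(entity_id, f"event:{code}", event_time)


# Illustrative input: two events in the history of one entity.
history = [(1207, 1216400000), (1524, 1216400300)]

table = SingleVersionTable()
# Three mappers that each believed the entity was new and wrote everything.
for _ in range(3):
    write_full_history(table, "entity-42", 1216399000, history)

# Reference result: the same history written exactly once.
single = SingleVersionTable()
write_full_history(single, "entity-42", 1216399000, history)
```

Under these assumptions the table ends up identical whether the history is written once or three times, so the race only costs extra writes, not correctness.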
> 
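The local cache suggested above could look roughly like the following sketch. SeenEntityCache, its capacity and the lookup callable are all hypothetical names; in a real job the lookup would be whatever call checks the createdAt column for the row. Only positive answers are cached here, on the assumption that a "not found" answer can go stale as soon as another mapper inserts the entity.

```python
from collections import OrderedDict


class SeenEntityCache:
    """Small LRU cache of entity IDs known to exist in the table (a sketch)."""

    def __init__(self, lookup, capacity=100_000):
        self.lookup = lookup          # callable: entity_id -> bool (remote check)
        self.capacity = capacity
        self.seen = OrderedDict()     # oldest entry first
        self.remote_lookups = 0       # counts calls that went to the table

    def exists(self, entity_id):
        if entity_id in self.seen:
            self.seen.move_to_end(entity_id)   # refresh recency on a hit
            return True
        self.remote_lookups += 1
        found = self.lookup(entity_id)
        if found:
            self.mark_seen(entity_id)
        return found

    def mark_seen(self, entity_id):
        """Record an entity as existing, e.g. right after this mapper wrote it."""
        self.seen[entity_id] = True
        self.seen.move_to_end(entity_id)
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)      # evict the least recently used


# Illustrative use: a set stands in for the rows already in the table.
stored = {"a", "b"}
cache = SeenEntityCache(lookup=lambda e: e in stored, capacity=2)
results = [cache.exists(e) for e in ["a", "a", "c", "b", "a"]]
```

In this example only three of the five checks reach the backing lookup. A mapper would call mark_seen after writing a new entity so that later lines for the same entity skip the read.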

