For a few million records like this, HBase probably isn't the optimal solution. It might work better to keep them in memory instead. 4 GB of RAM should be enough for this kind of graph, and for persistent storage you can simply dump it to a log file or an XML file, which can be read back if the system crashes. Alternatively, you can serialize the data and store it in Berkeley DB or some column-oriented data store. HBase would be useful only for much bigger data sets. My data set, which contains about 20 million records, is still small and probably just on the threshold of where HBase might become optimal (debatable; I don't have exact numbers).
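
As a rough sketch of the in-memory approach described above (not code from this thread), the following keeps the graph as adjacency sets in a Python dict and appends every new edge to a log file, so the graph can be rebuilt by replaying the log after a crash. The class name `LoggedGraph`, the JSON-lines log format and the log file name are hypothetical choices for illustration only.

```python
import json
import os
from collections import defaultdict


class LoggedGraph:
    """Hypothetical sketch: the graph lives in memory as adjacency sets,
    and every added edge is also appended to a log file so the in-memory
    state can be rebuilt by replaying the log after a crash."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.adj = defaultdict(set)
        self._replay()
        # Append mode: new edges go after whatever the log already holds.
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild the adjacency sets from an existing log, if there is one.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                src, dst = json.loads(line)
                self.adj[src].add(dst)

    def add_edge(self, src, dst):
        # Write to the log before updating memory, so an edge only becomes
        # visible in memory once it has been recorded. flush() hands the
        # record to the OS; add os.fsync(self._log.fileno()) for stronger
        # durability at some cost in write throughput.
        self._log.write(json.dumps([src, dst]) + "\n")
        self._log.flush()
        self.adj[src].add(dst)

    def neighbors(self, node):
        # .get() avoids creating an empty entry for unknown nodes.
        return sorted(self.adj.get(node, ()))

    def close(self):
        self._log.close()
```

Reopening `LoggedGraph` on the same path replays the log, so the graph comes back as it was when last flushed. JSON lines were picked only for readability; the XML or Berkeley DB options mentioned above would take the place of the log file. This sketch does not handle a final record torn by a crash in the middle of a write.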
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

On Wed, May 13, 2009 at 1:43 PM, Amandeep Khurana <[email protected]> wrote:

> I store an entire graph in a single HBase table. This has essentially
> many-to-many relationships. The model is similar to the one you described in
> your mail. There are multiple column families, and for every "type" of data
> point (row id), a different set of families contains data. In your design it
> seems that a single column family will be filled only for a particular type
> of data point, which makes it unnecessary to store in a single table.
> In fact, you might be better off not using HBase for it. In my setup, a
> column family can contain data for different data points... So that makes it
> like a giant sparse matrix of some sort (not exactly though).
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> On Wed, May 13, 2009 at 12:24 PM, llpind <[email protected]> wrote:
>
>> Thanks. That's cool, I'm interested in indexes. Here is a classic
>> student/course example:
>>
>> RDBMS
>>
>> TBL_STUDENT: student_id, student_name, student_address
>> TBL_COURSES: course_id, student_id, course_type
>> TBL_COURSE_TYPES: course_type, course_desc
>>
>> 1st shot at HBase (1 HBase table):
>>
>> Key: ST:<student_id> example: ST:4423
>> Column Family: course
>> Column Entries: course:<Type>:<Value> example: course:Math:Jon
>> Notice that each entity will be a new column and there is not really any
>> ‘value’ in the column. The column itself is holding the valuable
>> information.
>>
>> Key: CS:<Type>:<Value> example: CS:Math:Phil
>> Column Family: student
>> Column Entries: student:<student_id> example: student:4423 or student:5656
>>
>> Key: VL:<Value> example: VL:John
>> Column Family: type
>> Column Entries: type:<course> example: type:Math or type:Science
>>
>> Key: TP:<Type> example: TP:Math
>> Column Family: none
>> Column Entries: none
>>
>> Assume potentially millions+ students & courses (~100s of types). Common
>> queries:
>>
>> Given a student name -> list all courses
>> Given a course -> list all students
>> Given a course -> list all types
>> List all available course types
>>
>> Open to ideas on alternative designs. Thanks.
>> --
>> View this message in context:
>> http://www.nabble.com/HBase-Data-Model-tp23511426p23528345.html
>> Sent from the HBase User mailing list archive at Nabble.com.
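
To make llpind's single-table layout concrete, here is a minimal in-memory sketch; it is not the HBase client API. `SortedTable` is a hypothetical stand-in that keeps row keys in lexicographic order, as HBase does, so "list all available course types" becomes a prefix scan over `TP:` rows while the other queries are single-row reads. Two assumptions: the first query takes a student id rather than a name, because the proposed `ST:` key holds the id and no name-to-id index row is shown; and each `TP:` row gets a hypothetical placeholder column `info:exists`, because HBase stores only cells, so a row with no cells would not exist. The helper names (`enroll`, `courses_of_student`, etc.) are illustrative.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for one HBase table (not the HBase API): maps a row key
    to {"family:qualifier": value} and keeps row keys sorted, so a key
    prefix can be read with a range scan."""

    def __init__(self):
        self.rows = {}
        self.keys = []  # row keys in lexicographic order

    def put(self, row, column, value=b""):
        if row not in self.rows:
            self.rows[row] = {}
            insort(self.keys, row)
        self.rows[row][column] = value

    def get(self, row):
        return self.rows.get(row, {})

    def scan_prefix(self, prefix):
        # Start at the first key >= prefix and stop once keys stop matching.
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i]
            i += 1


def enroll(table, student_id, ctype, value):
    # Hypothetical helper: write every index row the design needs for one
    # enrollment. Cell values stay empty; the column name carries the data.
    table.put(f"ST:{student_id}", f"course:{ctype}:{value}")
    table.put(f"CS:{ctype}:{value}", f"student:{student_id}")
    table.put(f"VL:{value}", f"type:{ctype}")
    # Hypothetical placeholder cell: a row with no cells would not exist.
    table.put(f"TP:{ctype}", "info:exists")


def strip_family(columns, family):
    # Drop the "family:" prefix from each column name.
    return sorted(c[len(family) + 1:] for c in columns)


def courses_of_student(table, student_id):
    return strip_family(table.get(f"ST:{student_id}"), "course")


def students_in_course(table, ctype, value):
    return strip_family(table.get(f"CS:{ctype}:{value}"), "student")


def types_for_value(table, value):
    return strip_family(table.get(f"VL:{value}"), "type")


def all_course_types(table):
    return [key[len("TP:"):] for key in table.scan_prefix("TP:")]
```

Each enrollment costs four writes, one per index row, which is the denormalization trade-off this design makes in exchange for single-row reads. Because HBase only guarantees atomicity within a single row, those four writes would not be applied atomically together in a real table.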
