Yep, I _really_ understand denormalization... But! Sometimes you still want the choice of whether to denormalize or not. I prefer to normalize first, measure the bottlenecks, and then get pragmatic :)

I'm not so worried about disk space as I am about shuffling unnecessary data around, making request round-trip times longer than necessary.

Consider this:

Papa
    Lots and lots of columns and data
Daughter
    Few columns

If 90%+ of the time I'm only interested in the Daughters of the Papa, I want to have the choice of not seeing Papa's data.

Typically I want to store normalized data in a db and denormalize like hell with Lucene indexes for searching, since Lucene beats the crap out of db indexing. Get me?

By the time I'm writing this I have already written a simple ORM for HBase with lazy fetching, one-to-many, many-to-one etc :) More about that later if you or the group are interested.

Kindly

//Marcus

On Tue, Jul 22, 2008 at 4:54 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:

> Marcus,
>
> Denormalization implies duplication. See this excellent article on the
> subject:
>
> http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale
>
> In your case, you could keep the "role:" family that contains the row keys
> to all roles (a user has) as a column key and value (or the value could be
> the description) and if you have to know who has a particular role, have a
> new family in Role named "user:" that would map the other way.
>
> Same thing with category.
>
> J-D
>
> On Tue, Jul 22, 2008 at 9:33 AM, Marcus Herou <[EMAIL PROTECTED]>
> wrote:
>
> > Hi.
> >
> > What is the best practice in hbase when it comes to creating "mapping"
> > tables between objects?
> >
> > Let's say you want to create two tables named "User" and "Role" where the
> > user can be in many roles.
> >
> > User->Role
> >
> > I guess you could create some special, proprietary cells like
> > role:someuid
> > which contain the ref to the Role table, but this seems a little strange.
> >
> > Another quite normal example (for me at least) is to tag various content.
> >
> > Eg:
> > BlogEntry<-BlogEntryCategory->Category
> >
> > where in a rdbms the BlogEntryCategory would just contain two cols
> > blogEntryId and categoryId.
> >
> > How to model that with column families?
> >
> > Right now I'm creating Serializers which can serialize arrays back and
> > forth
> >
> > Eg StringArraySerializer
> >     public byte[] serialize(Object object) throws IOException
> >     {
> >         String[] a = (String[])object;
> >         StringBuilder sb = new StringBuilder();
> >         for (int i = 0; i < a.length; i++)
> >         {
> >             sb.append(a[i]);
> >             if(i < (a.length - 1))
> >             {
> >                 sb.append(this.delimiter);
> >             }
> >         }
> >         return sb.toString().getBytes("UTF-8");
> >     }
> >
> >     public Object deserialize(byte[] bytes) throws IOException
> >     {
> >         String str = new String(bytes, "UTF-8");
> >         StringTokenizer st = new StringTokenizer(str, delimiter);
> >
> >         List<String> list = new ArrayList<String>();
> >         while(st.hasMoreTokens())
> >         {
> >             String token = st.nextToken();
> >             list.add(token);
> >         }
> >         return list.toArray(new String[list.size()]);
> >     }
> >
> > and then store the byte[] in hbase. Ugly....
> >
> > Please guide my sorry ass.
> >
> > Kindly
> >
> > //Marcus
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > [EMAIL PROTECTED]
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
>

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
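
For reference, here is a minimal sketch of the two-way mapping J-D describes in the quoted reply: a "role:" family on User whose column keys are role row keys, and a "user:" family on Role that maps the other way. It deliberately uses in-memory maps instead of the HBase client API, so it only illustrates the layout. The names SketchTable, UserRoleMapping, grant, rolesOf and usersIn are hypothetical and appear in neither HBase nor this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for one table: row key -> ("family:qualifier" -> value). */
final class SketchTable {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the qualifiers stored under one family (e.g. "role:") of one row, sorted. */
    List<String> qualifiers(String rowKey, String family) {
        List<String> result = new ArrayList<>();
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return result;
        }
        // Columns of one family share the "family:" prefix, so start at that prefix
        // and stop at the first column that no longer carries it.
        for (String column : row.tailMap(family, true).keySet()) {
            if (!column.startsWith(family)) {
                break;
            }
            result.add(column.substring(family.length()));
        }
        return result;
    }
}

/** Hypothetical helper that keeps the User and Role tables in sync. */
final class UserRoleMapping {
    final SketchTable users = new SketchTable();
    final SketchTable roles = new SketchTable();

    /** Writes the link in both directions; this duplication is the denormalization. */
    void grant(String userId, String roleId, String roleDescription) {
        users.put(userId, "role:" + roleId, roleDescription); // User row -> its roles
        roles.put(roleId, "user:" + userId, "");              // Role row -> its users
    }

    List<String> rolesOf(String userId) {
        return users.qualifiers(userId, "role:");
    }

    List<String> usersIn(String roleId) {
        return roles.qualifiers(roleId, "user:");
    }
}
```

With this layout, a call such as grant("marcus", "admin", "Administrator") makes both rolesOf("marcus") and usersIn("admin") a single-row read, at the cost of writing every link twice. The same shape would apply to BlogEntry and Category.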
