On Fri, Nov 13, 2009 at 8:02 AM, Tatsuya Kawano <[email protected]> wrote:
> Hi Imran,
>
>> * Where can I read up on or checkout samples RDBMS schemas converted
>> to HBase schema? Basically, I want to read up efficient schema design
>> for different cardinal relationships between objects.
>
> I would recommend the following presentations and paper:
>
> Practical HBase [ Page 27 -- 33 ]
> by Jon Gray and Michael Stack, Apachecon2009 in Oakland
> http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=view&target=ApacheCon2009_Practical_HBase-1.pdf
>
>
> Paper: No Relation: The Mixed Blessings of Non-Relational Databases
> by Ian Thomas Varley
> http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf
>
>
> HBase Schema Design -- Case Studies
> by Evan(Qingyan) Liu
> http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies
>
>
> There are some fundamental differences to RDBMS schema design:
>
> -- De-normalization is the key to design HBase schemas.
> -- Carefully pick the primary key, try to fulfill all queries without
> having secondary indices. Use composite primary key if you need.
> -- Be aware you can store a value not only in the cell but also column
> qualifier.
> -- Each cell is byte array and typeless, and you can store
> multi-values in a cell by serializing them with Google protobuf,
> Apache Avro or JSON.
>
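
To make the design points in the list above a bit more concrete, here is a minimal Python sketch. It does not use any HBase client API: plain dicts stand in for a table, cell versions (timestamps) are left out, and every name in it (make_row_key, encode_cell, put, scan_prefix, the "fields" and "children" families, the NUL separator) is a hypothetical choice made for illustration only. It shows a composite row key, a one-to-many relation kept in column qualifiers, and a multi-value cell serialized as JSON.

```python
import json

# Assumption: content type names never contain a NUL byte, so NUL can
# separate the two parts of the composite row key.
SEP = b"\x00"


def make_row_key(content_type: str, content_id: str) -> bytes:
    """Composite row key: rows of the same content type sort next to each other."""
    return content_type.encode("utf-8") + SEP + content_id.encode("utf-8")


def encode_cell(value) -> bytes:
    """Cells are untyped byte arrays, so structured values are serialized (JSON here)."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def decode_cell(raw: bytes):
    return json.loads(raw.decode("utf-8"))


# Stand-in for a table: row key -> {column family -> {qualifier -> value bytes}}
table = {}


def put(row_key: bytes, family: str, qualifier: bytes, value: bytes = b"") -> None:
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def scan_prefix(prefix: bytes):
    """Return row keys starting with prefix, in sorted (byte) order."""
    return [key for key in sorted(table) if key.startswith(prefix)]


row = make_row_key("article", "42")

# Several values packed into one cell.
put(row, "fields", b"meta", encode_cell({"title": "Hello", "tags": ["a", "b"]}))

# One-to-many relation: the related ids live in the qualifiers themselves,
# and the cell values stay empty.
put(row, "children", b"comment:1001")
put(row, "children", b"comment:1002")
```

With this layout, scan_prefix(b"article" + SEP) lists every article row without a secondary index, and the qualifiers under "children" of a single row give the related comments in one read.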
Thanks a lot for the pointers. I read them, and now I have some new
questions that I would like to clarify :). The questions are:
* What are the definitions of column family, column and cell, and what
are their use cases?

Thanks a lot again,

Imran

>
> Hope this helps,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
>
> On Thu, Nov 12, 2009 at 11:13 PM, Imran M Yousuf <[email protected]> wrote:
>> Hi!
>>
>> I am absolutely new to HBase. All I have done is to read up
>> documentation, presentation and getting a single instance up and
>> running. I am starting on a Content Management System which will be
>> used as a backend for multiple web applications of different natures.
>> In the CMS:
>> * User can define their content known as content type.
>> * Content can have one-2-many one-2-one and many-2-many relationship
>> with other contents.
>> * Content fields should be versioned
>> * Content type can change in runtime, i.e. fields (a.k.a. columns in
>> HBase) added and removal will not be allowed just yet.
>> * Every content type will have a corresponding grammar to validate
>> content of its type.
>> * It will have authentication and authorization
>> * It will have full text search based on Lucene/Katta.
>>
>> Based on these requirements I have the following questions that I
>> would like feedback on:
>> * Reading articles and presentations it looks to be HBase is a perfect
>> match as it supports multi-dimensional rows, versioned cells, dynamic
>> schema modification. But I could not understand what is the definition
>> of "Big Data" - that is if a content size is roughly 1~100kB
>> (field/cell size 0~100kB), is HBase meant for such uses?
>> * Since I am not sure how much load the site will have, I am planning
>> to setup DN+RS on Rackspace cloud instances with 2GB/80GB HDD with a
>> view of with revenue and pageviews increasing, more moderate
>> "commodity" hardware can be added progressively. Any
>> comments/suggestions on this strategy?
>> * Where can I read up on or checkout samples RDBMS schemas converted
>> to HBase schema? Basically, I want to read up efficient schema design
>> for different cardinal relationships between objects.
>>
>> Thank you,
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: [email protected]
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: [email protected]
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
