Hi Imran,

> * Where can I read up on or checkout samples RDBMS schemas converted
> to HBase schema? Basically, I want to read up efficient schema design
> for different cardinal relationships between objects.

I would recommend the following presentations and paper:

Practical HBase [pages 27-33]
by Jon Gray and Michael Stack, ApacheCon 2009 in Oakland
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=view&target=ApacheCon2009_Practical_HBase-1.pdf

Paper: No Relation: The Mixed Blessings of Non-Relational Databases
by Ian Thomas Varley
http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf

HBase Schema Design -- Case Studies
by Evan (Qingyan) Liu
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies

There are some fundamental differences from RDBMS schema design:

-- Denormalization is the key to designing HBase schemas.
-- Pick the primary key (the row key) carefully, and try to satisfy all
   queries without secondary indices. Use a composite primary key if
   you need one.
-- Be aware that you can store a value not only in the cell but also in
   the column qualifier.
-- Each cell is a typeless byte array, so you can store multiple values
   in one cell by serializing them with Google Protocol Buffers, Apache
   Avro, or JSON.

Hope this helps,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

On Thu, Nov 12, 2009 at 11:13 PM, Imran M Yousuf <[email protected]> wrote:
> Hi!
>
> I am absolutely new to HBase. All I have done is to read up
> documentation, presentation and getting a single instance up and
> running. I am starting on a Content Management System which will be
> used as a backend for multiple web applications of different natures.
> In the CMS:
> * User can define their content known as content type.
> * Content can have one-2-many one-2-one and many-2-many relationship
> with other contents.
> * Content fields should be versioned
> * Content type can change in runtime, i.e. fields (a.k.a. columns in
> HBase) added and removal will not be allowed just yet.
> * Every content type will have a corresponding grammer to validate
> content of its type.
> * It will have authentication and authorization
> * It will have full text search based on Lucene/Katta.
>
> Based on these requirements I have the following questions that I
> would like feedback on:
> * Reading articles and presentations it looks to be HBase is a perfect
> match as it supports multi-dimensional rows, versioned cells, dynamic
> schema modification. But I could not understand what is the definition
> of "Big Data" - that is if a content size is roughly 1~100kB
> (field/cell size 0~100kB), is HBase meant for such uses?
> * Since I am not sure how much load the site will have, I am planning
> to setup DN+RS on Rackspace cloud instances with 2GB/80GB HDD with a
> view of with revenue and pageviews increasing, more moderate
> "commodity" hardware can be added progressively. Any
> comments/suggestions on this strategy?
> * Where can I read up on or checkout samples RDBMS schemas converted
> to HBase schema? Basically, I want to read up efficient schema design
> for different cardinal relationships between objects.
>
> Thank you,
>
> --
> Imran M Yousuf
> Entrepreneur & Software Engineer
> Smart IT Engineering
> Dhaka, Bangladesh
> Email: [email protected]
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
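
As an illustration of the schema design points in the reply above, here is a minimal Python sketch. It models a table as a plain dictionary and does not use the HBase client. The key layout, the column names (meta:author, tag:...), and all helper functions are hypothetical, chosen only to show a composite key, a value stored in the column qualifier, and several values serialized into one cell as JSON.

```python
import json

# Hypothetical in-memory model of one table:
#   row key -> {b"family:qualifier": value}
# Keys, qualifiers and values are all bytes. This is NOT the HBase client API.
table = {}


def put(row, column, value=b""):
    """Store one cell; the value may be empty when the qualifier is the data."""
    table.setdefault(row, {})[column] = value


def scan_prefix(prefix):
    """Return (row, columns) pairs whose key starts with prefix, in key order."""
    return [(row, table[row]) for row in sorted(table) if row.startswith(prefix)]


def content_key(content_type, created, content_id):
    # Composite key: the leading parts are the ones queries filter on.
    return "|".join((content_type, created, content_id)).encode("utf-8")


def tags_of(columns):
    # The tag name is read back from the qualifier; the cell value is unused.
    prefix = b"tag:"
    return sorted(c[len(prefix):].decode("utf-8")
                  for c in columns if c.startswith(prefix))


first = content_key("article", "2009-11-12", "a1")
second = content_key("article", "2009-12-01", "a2")
page = content_key("page", "2009-11-12", "p1")

# Denormalized: author details are copied into the row instead of joined in.
# Several values share one cell by serializing them as JSON.
author = {"name": "alice", "role": "editor"}
put(first, b"meta:author", json.dumps(author).encode("utf-8"))

# Many-to-many: one column per tag, with the tag stored in the qualifier.
put(first, b"tag:hbase")
put(first, b"tag:schema")
put(second, b"tag:hbase")
put(page, b"meta:title", b"About")

# "Articles created in November 2009" is answered by the key alone,
# with no secondary index.
november_articles = [row for row, _ in scan_prefix(b"article|2009-11")]
first_tags = tags_of(table[first])
first_author = json.loads(table[first][b"meta:author"].decode("utf-8"))

print(november_articles)
print(first_tags)
print(first_author["name"])
```

HBase keeps rows sorted by row key, so a prefix lookup like the one above corresponds to a range scan there. That is the reason for putting the parts of the key that queries filter on first.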
