Hi!

I am completely new to HBase. So far I have only read the documentation and presentations and got a single instance up and running. I am starting work on a Content Management System which will be used as a backend for multiple web applications of different natures. In the CMS:

* Users can define their own content structures, known as content types.
* Content can have one-to-many, one-to-one and many-to-many relationships with other content.
* Content fields should be versioned.
* Content types can change at runtime, i.e. fields (a.k.a. columns in HBase) can be added; removal will not be allowed just yet.
* Every content type will have a corresponding grammar to validate content of its type.
* It will have authentication and authorization.
* It will have full-text search based on Lucene/Katta.
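
To make the versioning and runtime-field requirements a bit more concrete, here is a rough in-memory sketch in Java of how I picture a single content item. It is only an illustration under my own assumptions: the class name `VersionedContent`, its methods and the `article:42` row key are made up for this mail, and it deliberately does not use the HBase client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one content item: field name -> (timestamp -> value).
// It only mimics the idea of versioned cells; it is not HBase code.
class VersionedContent {
    private final String rowKey;
    // Each field keeps all of its versions, sorted by ascending timestamp.
    private final Map<String, NavigableMap<Long, String>> fields = new HashMap<>();

    VersionedContent(String rowKey) {
        this.rowKey = rowKey;
    }

    String rowKey() {
        return rowKey;
    }

    // Writing to a field that does not exist yet simply creates it,
    // which is how a content type could gain fields at runtime.
    void put(String field, long timestamp, String value) {
        fields.computeIfAbsent(field, f -> new TreeMap<>()).put(timestamp, value);
    }

    // Most recent version of a field, or null if the field was never written.
    String latest(String field) {
        NavigableMap<Long, String> versions = fields.get(field);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Version that was current at the given timestamp, or null if none existed yet.
    String asOf(String field, long timestamp) {
        NavigableMap<Long, String> versions = fields.get(field);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedContent article = new VersionedContent("article:42");
        article.put("title", 1L, "Draft");
        article.put("title", 5L, "Final");
        article.put("summary", 3L, "Added later"); // field added at runtime
        System.out.println(article.latest("title"));
        System.out.println(article.asOf("title", 4L));
    }
}
```

What I would like to understand is whether HBase's rows, columns and cell versions are a natural fit for this kind of structure.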

Based on these requirements, I have the following questions that I would like feedback on:

* From the articles and presentations I have read, HBase looks like a perfect match, as it supports multi-dimensional rows, versioned cells and dynamic schema modification. However, I could not work out what counts as "Big Data": if a content item is roughly 1~100kB (field/cell size 0~100kB), is HBase meant for such uses?
* Since I am not sure how much load the site will have, I am planning to set up DataNode + RegionServer (DN+RS) on Rackspace cloud instances with 2GB RAM/80GB HDD, with a view to adding more moderate "commodity" hardware progressively as revenue and pageviews increase. Any comments/suggestions on this strategy?
* Where can I read up on or check out sample RDBMS schemas converted to HBase schemas? Basically, I want to read up on efficient schema design for the different cardinalities of relationships between objects.

Thank you,

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: [email protected]
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
