If you are querying this data again and again, you could just create another table that has only those 10 columns (more like a materialized view approach, though that is not there in Hive yet). This of course uses up some extra space compared to vertical partitioning, but if the RCFile performance is not good enough, this could be the workaround for now. Also, do you see a lot more time spent on I/O in your queries?
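A minimal sketch of the "narrow copy" idea, using Python's built-in sqlite3 as a stand-in for Hive so that it runs anywhere. The table and column names (events_wide, events_main, raw_headers, debug_blob) are made up for illustration, and three main columns stand in for the ten mentioned above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Hive warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical wide table: three "main" columns plus two rarely used ones.
conn.execute(
    "CREATE TABLE events_wide ("
    "id INTEGER, ts TEXT, country TEXT, raw_headers TEXT, debug_blob TEXT)"
)
conn.executemany(
    "INSERT INTO events_wide VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2010-06-16", "IN", "h1", "d1"),
        (2, "2010-06-17", "US", "h2", "d2"),
    ],
)

# The narrow copy: a second table holding only the frequently used columns.
conn.execute(
    "CREATE TABLE events_main AS SELECT id, ts, country FROM events_wide"
)

# Day-to-day queries read the narrow table and never touch the rare columns.
narrow_columns = [row[1] for row in conn.execute("PRAGMA table_info(events_main)")]
narrow_rows = conn.execute(
    "SELECT id, ts, country FROM events_main ORDER BY id"
).fetchall()

print(narrow_columns)
print(narrow_rows)
```

Hive has a comparable CREATE TABLE ... AS SELECT form, but since the copy is an ordinary table rather than a materialized view, it would have to be rebuilt or appended to whenever the wide table gets new data.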
Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma <jaydeep.vishwaka...@mkhoj.com> wrote:

> Just looking at the opportunity and feasibility for it. One of my tables
> has more than 20 fields, where most of the time I need only 10 main
> fields. We rarely need the other fields for day-to-day analysis.
>
> Regards,
> Jaydeep
>
> Ning Zhang wrote:
>
> Hive supports columnar storage (RCFile) but not vertical partitioning.
> Is there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
> Hi,
>
> Does Hive support vertical partitioning?
>
> Regards,
> Jaydeep
>
> The information contained in this communication is intended solely for
> the use of the individual or entity to whom it is addressed and others
> authorized to receive it. It may contain confidential or legally
> privileged information. If you are not the intended recipient you are
> hereby notified that any disclosure, copying, distribution or taking
> any action in reliance on the contents of this information is strictly
> prohibited and may be unlawful. If you have received this
> communication in error, please notify us immediately by responding to this
> email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of
> the information contained in this communication nor for any delay in
> its receipt.

Vertical partitioning is just as practical in a traditional RDBMS as it would be in Hive. Normally you would do it for a few reasons:

1) You have some rarely used columns and you want to reduce the table/row size.
2) Your DBMS has terrible blob/clob/text support and the only way to get large objects out of your way is to put them in other tables.

If you go with vertical partitioning in Hive, you may have to join to select the columns you need. I do not consider row serialization and deserialization to be the majority of a Hive job, and in most cases Hadoop handles one large file better than two smaller ones. Then again, we have some tables with 140+ columns, so I can see vertical partitioning helping with those tables, but it doubles the management.
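To make the join cost concrete, here is a small sketch of a manual vertical split, again using Python's built-in sqlite3 as a stand-in for Hive. The tables users_main and users_rare and the shared key id are hypothetical names chosen for illustration, not anything from an actual schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive.
conn = sqlite3.connect(":memory:")

# One logical table split vertically into two physical tables on a shared key.
conn.execute(
    "CREATE TABLE users_main (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.execute("CREATE TABLE users_rare (id INTEGER PRIMARY KEY, bio TEXT)")

conn.executemany(
    "INSERT INTO users_main VALUES (?, ?, ?)",
    [(1, "a", "IN"), (2, "b", "US")],
)
conn.executemany(
    "INSERT INTO users_rare VALUES (?, ?)",
    [(1, "long text 1"), (2, "long text 2")],
)

# The common case touches only the narrow table.
daily = conn.execute(
    "SELECT name, country FROM users_main ORDER BY id"
).fetchall()

# Any query that needs a rarely used column has to join the halves back together.
full = conn.execute(
    "SELECT m.name, r.bio "
    "FROM users_main m JOIN users_rare r ON m.id = r.id "
    "ORDER BY m.id"
).fetchall()

print(daily)
print(full)
```

Both tables have to be loaded and kept in step on the key, which is the doubled management mentioned above.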