RE: Vertical partitioning

Ashish Thusoo Thu, 17 Jun 2010 11:21:20 -0700

If you query this data again and again, you could just create another 
table that has only those 10 columns (more like a materialized view approach, 
though that is not there in Hive yet). This of course uses up some extra space 
compared to vertical partitioning, but if the RCFile performance is not good 
enough, it could be the workaround for now. Also, do you see a lot more time 
spent on I/O in your queries?


Ashish
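
The narrow-table workaround described above can be sketched roughly as follows. 
This is only an illustration: SQLite (through Python's standard sqlite3 module) 
stands in for Hive, and the table and column names (events_wide, events_main, 
and so on) are hypothetical, not taken from the thread. In Hive the same idea 
would be expressed as a CREATE TABLE ... AS SELECT over the wide table.

```python
import sqlite3

# SQLite is used here only as a stand-in for Hive.
# All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# The "wide" table: a few frequently used columns plus rarely used ones.
conn.execute(
    "CREATE TABLE events_wide ("
    "id INTEGER, ts TEXT, user_id INTEGER, referrer TEXT, raw_headers TEXT)"
)
conn.executemany(
    "INSERT INTO events_wide VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2010-06-17", 42, "search", "header blob 1"),
        (2, "2010-06-17", 7, "direct", "header blob 2"),
    ],
)

# Copy only the frequently queried columns into a second, narrower table.
# This duplicates those columns on disk, which is the space cost mentioned.
conn.execute(
    "CREATE TABLE events_main AS SELECT id, ts, user_id FROM events_wide"
)

# Day-to-day queries now read the narrow table only.
main_columns = [row[1] for row in conn.execute("PRAGMA table_info(events_main)")]
main_rows = conn.execute(
    "SELECT id, ts, user_id FROM events_main ORDER BY id"
).fetchall()
```

The narrow copy has to be rebuilt or appended to whenever the wide table 
changes, which is the maintenance cost of this approach.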

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma <jaydeep.vishwaka...@mkhoj.com> wrote:

> Just exploring the opportunity and feasibility of it. One of my tables 
> has more than 20 fields, of which I need only 10 main fields most of 
> the time. We rarely need the other fields for day-to-day analysis.
>
> Regards,
> Jaydeep
>
>
> Ning Zhang wrote:
>
> Hive supports columnar storage (RCFile) but not vertical partitioning. 
> Is there any use case for vertical partitioning?
>
> On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:
>
>
>
> Hi,
>
> Does hive support Vertical partitioning?
>
> Regards,
> Jaydeep
>
>
>
> The information contained in this communication is intended solely for 
> the use of the individual or entity to whom it is addressed and others 
> authorized to receive it. It may contain confidential or legally 
> privileged information. If you are not the intended recipient you are 
> hereby notified that any disclosure, copying, distribution or taking 
> any action in reliance on the contents of this information is strictly 
> prohibited and may be unlawful. If you have received this 
> communication in error, please notify us immediately by responding to this 
> email and then delete it from your system.
> The firm is neither liable for the proper and complete transmission of 
> the information contained in this communication nor for any delay in 
> its receipt.
>

Vertical partitioning is just as practical in a traditional RDBMS as it would 
be in Hive. Normally you would do it for a few reasons:
1) You have some rarely used columns and you want to reduce the table/row size.
2) Your DBMS has terrible blob/clob/text support and the only way to get large 
objects out of your way is to put them in other tables.

If you go with vertical partitioning in Hive, you may have to join to 
select the columns you need. I do not consider row serialization and 
deserialization to be the majority of a Hive job, and in most cases Hadoop 
handles one large file better than two smaller ones. Then again, we have some 
tables with 140+ columns, so I can see vertical partitioning helping with those 
tables, but it doubles the management.
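
To make the join cost concrete, here is a minimal sketch of a manually 
vertically partitioned table. Again SQLite (via Python's sqlite3 module) is 
only a stand-in for Hive, and every table and column name is hypothetical. 
The two tables share the key column id; everyday queries touch only the narrow 
table, while anything that needs a rarely used column has to join.

```python
import sqlite3

# SQLite is used here only as a stand-in for Hive.
# All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Frequently used columns live in one table ...
conn.execute(
    "CREATE TABLE events_main (id INTEGER PRIMARY KEY, ts TEXT, user_id INTEGER)"
)
# ... rarely used columns live in another, keyed by the same id.
conn.execute(
    "CREATE TABLE events_rare (id INTEGER PRIMARY KEY, referrer TEXT, raw_headers TEXT)"
)

conn.executemany(
    "INSERT INTO events_main VALUES (?, ?, ?)",
    [(1, "2010-06-16", 42), (2, "2010-06-17", 42), (3, "2010-06-17", 7)],
)
conn.executemany(
    "INSERT INTO events_rare VALUES (?, ?, ?)",
    [(1, "search", "h1"), (2, "direct", "h2"), (3, "email", "h3")],
)

# Day-to-day analysis reads only the narrow table; no join is needed.
daily = conn.execute(
    "SELECT user_id, COUNT(*) FROM events_main GROUP BY user_id ORDER BY user_id"
).fetchall()

# Anything that needs a rarely used column must join on the shared key.
full = conn.execute(
    "SELECT m.id, m.ts, m.user_id, r.referrer "
    "FROM events_main m JOIN events_rare r ON m.id = r.id "
    "ORDER BY m.id"
).fetchall()
```

Both tables have to be loaded and kept in step on every insert, which is the 
doubled management mentioned above.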
