[ https://issues.apache.org/jira/browse/HIVE-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajat Venkatesh updated HIVE-8467:
----------------------------------
    Attachment: Table Copies.pdf

> Table Copy - Background, incremental data load
> ----------------------------------------------
>
>                 Key: HIVE-8467
>                 URL: https://issues.apache.org/jira/browse/HIVE-8467
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Rajat Venkatesh
>         Attachments: Table Copies.pdf
>
>
> Traditionally, Hive and other tools in the Hadoop ecosystem haven't required 
> a load stage. However, with recent developments, Hive is much more performant 
> when data is stored in specific formats such as ORC, Parquet, or Avro. 
> Technologies like Presto also work much better with certain data formats. At 
> the same time, data is generated or obtained from third parties in non-optimal 
> formats such as CSV, tab-delimited files, or JSON. Often, it is not an option 
> to change the data format at the source. We've found that users either settle 
> for sub-optimal formats or spend a large amount of effort creating and 
> maintaining copies. We want to propose a new construct - Table Copy - to help 
> “load” data into an optimal storage format.
> I am going to attach a PDF document with a lot more details, especially 
> addressing how this differs from bulk loads in relational DBs or from 
> materialized views.
> Looking forward to hearing whether others see a similar need to formalize 
> conversion of data to different storage formats. If yes, are the details in 
> the PDF document a good start?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
