[
https://issues.apache.org/jira/browse/HIVE-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rajat Venkatesh updated HIVE-8467:
----------------------------------
Attachment: Table Copies.pdf
> Table Copy - Background, incremental data load
> ----------------------------------------------
>
> Key: HIVE-8467
> URL: https://issues.apache.org/jira/browse/HIVE-8467
> Project: Hive
> Issue Type: New Feature
> Reporter: Rajat Venkatesh
> Attachments: Table Copies.pdf
>
>
> Traditionally, Hive and other tools in the Hadoop ecosystem haven't required
> a load stage. However, with recent developments, Hive is much more performant
> when data is stored in specific formats such as ORC, Parquet, or Avro.
> Technologies like Presto also work much better with certain data formats. At
> the same time, data is generated or obtained from 3rd parties in non-optimal
> formats such as CSV, tab-delimited, or JSON. Often, it is not an option to
> change the data format at the source. We've found that users either use
> sub-optimal formats or spend a large amount of effort creating and
> maintaining copies. We want to propose a new construct - Table Copy - to help
> “load” data into an optimal storage format.
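> To illustrate the pattern users maintain by hand today (table names, columns,
> and paths below are hypothetical, just a sketch), a copy typically looks like
> this in HiveQL; the proposed Table Copy would formalize and automate it,
> including the incremental refresh:
>
> ```sql
> -- Raw data as delivered, kept in CSV text files
> CREATE EXTERNAL TABLE raw_events (
>   event_time STRING,
>   user_id    BIGINT,
>   payload    STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION '/data/raw/events';
>
> -- One-time copy into an ORC-backed table for fast queries
> CREATE TABLE events_orc STORED AS ORC
> AS SELECT * FROM raw_events;
>
> -- Incremental load the user must re-run and keep correct by hand
> INSERT INTO TABLE events_orc
> SELECT * FROM raw_events
> WHERE event_time > '2014-10-01';
> ```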
> I am going to attach a PDF document with a lot more details, especially
> addressing how this differs from bulk loads in relational databases or
> materialized views.
> Looking forward to hearing whether others see a similar need to formalize
> conversion of data to different storage formats. If yes, are the details in
> the PDF document a good start?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)