[jira] [Updated] (HAWQ-450) Schema auto discovery on HDFS

Shivram Mani (JIRA) Mon, 21 Mar 2016 14:20:04 -0700

     [ https://issues.apache.org/jira/browse/HAWQ-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shivram Mani updated HAWQ-450:
------------------------------
    Description: 
File formats such as Avro and JSON carry schema information along with the data. 
For other formats, such as text/CSV, schema inference is a bit more complex. This 
work can be broken down into individual subtasks corresponding to specific file formats.
Introduce additional parameters in the PXF API, inferSchema and header, in order to 
auto-discover the schema.
Spark provides a similar option, e.g. https://github.com/databricks/spark-csv 
provides options for schema inference.

The idea is to eventually expose metadata information of the underlying file on 
HDFS via the /getMetadata API: https://issues.apache.org/jira/browse/HAWQ-459

  was:
File formats such as Avro and JSON carry schema information along with the data. 
For other formats, such as text/CSV, schema inference is a bit more complex.
Introduce additional parameters in the PXF API, inferSchema and header, in order to 
auto-discover the schema.
Spark provides a similar option, e.g. https://github.com/databricks/spark-csv 
provides options for schema inference.

The idea is to eventually expose metadata information of the underlying file on 
HDFS via the /getMetadata API: https://issues.apache.org/jira/browse/HAWQ-459


> Schema auto discovery on HDFS
> -----------------------------
>
>                 Key: HAWQ-450
>                 URL: https://issues.apache.org/jira/browse/HAWQ-450
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Goden Yao
>              Labels: gsoc2016
>
> File formats such as Avro and JSON carry schema information along with the 
> data. For other formats, such as text/CSV, schema inference is a bit more 
> complex. This work can be broken down into individual subtasks corresponding 
> to specific file formats.
> Introduce additional parameters in the PXF API, inferSchema and header, in 
> order to auto-discover the schema.
> Spark provides a similar option, e.g. https://github.com/databricks/spark-csv 
> provides options for schema inference.
> The idea is to eventually expose metadata information of the underlying file 
> on HDFS via the /getMetadata API: https://issues.apache.org/jira/browse/HAWQ-459
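
To make the text/CSV case concrete, here is a minimal sketch of what header-aware schema inference could look like. It is not PXF code and does not reflect any existing PXF interface: the function names, the sample_rows parameter, the col1..colN fallback names, and the four type names are assumptions chosen for illustration. Only the idea of a header flag comes from the description above.

```python
import csv
import io
import itertools
import re

# Hypothetical type names; a real implementation would map to HAWQ types.
_INT_RE = re.compile(r"[+-]?[0-9]+")
_FLOAT_RE = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?")


def _infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        return "text"  # nothing to go on, fall back to the widest type
    if all(_INT_RE.fullmatch(v) for v in non_empty):
        return "integer"
    if all(_FLOAT_RE.fullmatch(v) for v in non_empty):
        return "double"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    return "text"


def infer_schema(text, header=True, sample_rows=100):
    """Infer (column name, type name) pairs from a sample of CSV text.

    header=True treats the first row as column names; otherwise columns
    are named col1, col2, ... (an assumed naming convention).
    """
    limit = sample_rows + (1 if header else 0)
    rows = list(itertools.islice(csv.reader(io.StringIO(text)), limit))
    if not rows:
        return []
    if header:
        names, data = rows[0], rows[1:]
    else:
        names = ["col%d" % (i + 1) for i in range(len(rows[0]))]
        data = rows
    schema = []
    for i, name in enumerate(names):
        # Short rows simply contribute no value for the missing columns.
        column = [row[i] for row in data if i < len(row)]
        schema.append((name, _infer_type(column)))
    return schema


if __name__ == "__main__":
    sample = "id,price,active,name\n1,9.5,true,apple\n2,3,false,pear\n"
    print(infer_schema(sample))
```

Sampling only the first rows keeps the cost bounded, but a value further down the file may not fit the inferred type, so a real implementation would need a fallback such as widening the column to text.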



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
