[ https://issues.apache.org/jira/browse/HAWQ-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivram Mani updated HAWQ-450:
------------------------------
    Description:
File formats such as Avro and JSON carry the schema information along with the data. For other formats such as text/CSV, schema inference is a bit more complex.
This can be broken down into individual subtasks corresponding to specific file formats.
Introduce additional parameters in the PXF API, inferSchema and header, in order to auto-discover the schema.
Spark provides a similar option, e.g. https://github.com/databricks/spark-csv provides options for schema inference.
The idea is to eventually expose metadata information of the underlying file on HDFS via the /getMetdata API (https://issues.apache.org/jira/browse/HAWQ-459).

  was:
File formats such as Avro and JSON carry the schema information along with the data. For other formats such as text/CSV, schema inference is a bit more complex.
Introduce additional parameters in the PXF API, inferSchema and header, in order to auto-discover the schema.
Spark provides a similar option, e.g. https://github.com/databricks/spark-csv provides options for schema inference.
The idea is to eventually expose metadata information of the underlying file on HDFS via the /getMetdata API (https://issues.apache.org/jira/browse/HAWQ-459).

> Schema auto discovery on HDFS
> -----------------------------
>
>                 Key: HAWQ-450
>                 URL: https://issues.apache.org/jira/browse/HAWQ-450
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: PXF
>            Reporter: Shivram Mani
>            Assignee: Goden Yao
>              Labels: gsoc2016
>
> File formats such as Avro and JSON carry the schema information along with the
> data. For other formats such as text/CSV, schema inference is a bit more complex.
> This can be broken down into individual subtasks corresponding to specific file
> formats.
> Introduce additional parameters in the PXF API, inferSchema and header, in order
> to auto-discover the schema.
> Spark provides a similar option, e.g. https://github.com/databricks/spark-csv
> provides options for schema inference.
> The idea is to eventually expose metadata information of the underlying file
> on HDFS via the /getMetdata API (https://issues.apache.org/jira/browse/HAWQ-459).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
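
For illustration only, here is a minimal sketch of what header-aware schema inference over a text/CSV file could look like. It is plain Python, not PXF code. Only the two options, header and inferSchema, come from the description above; the function name discover_schema, the generated column names (col0, col1, ...) and the mapping to the type names bigint, float8, boolean and text are assumptions made for the example.

```python
import csv
import io
import re

# Patterns used to classify a single field (assumed rules, not PXF behavior).
_INT = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
_BOOL = {"true", "false"}


def _column_type(values):
    """Pick the narrowest type that fits every non-empty value in a column."""
    if not values:
        return "text"
    if all(_INT.fullmatch(v) for v in values):
        return "bigint"
    if all(_FLOAT.fullmatch(v) for v in values):
        return "float8"
    if all(v.lower() in _BOOL for v in values):
        return "boolean"
    return "text"


def discover_schema(text, header=True, infer_schema=True, delimiter=","):
    """Return a list of (column_name, type_name) pairs for CSV text.

    header:       treat the first row as column names (hypothetical option
                  mirroring the 'header' parameter in the description).
    infer_schema: sample the data to guess types; when False every column
                  is reported as text (mirrors 'inferSchema').
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if not rows:
        return []

    if header:
        names, data = rows[0], rows[1:]
    else:
        width = max(len(r) for r in rows)
        names, data = [f"col{i}" for i in range(width)], rows

    schema = []
    for i, name in enumerate(names):
        if not infer_schema:
            schema.append((name, "text"))
            continue
        # Ignore missing and empty fields so they do not force a text column.
        values = [r[i] for r in data if i < len(r) and r[i] != ""]
        schema.append((name, _column_type(values)))
    return schema


if __name__ == "__main__":
    sample = "id,price,active,name\n1,9.5,true,apple\n2,3,false,pear\n"
    for column, type_name in discover_schema(sample):
        print(column, type_name)
```

With header disabled, the first row is treated as data, so in the sample every column falls back to text. This is one reason the two options are usually offered together.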