[ https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7279:
-------------------------------
    Reviewer: Arina Ielchiieva

> Support provided schema for CSV without headers
> -----------------------------------------------
>
>                 Key: DRILL-7279
>                 URL: https://issues.apache.org/jira/browse/DRILL-7279
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided,
> and the schema has at least one column, then use the provided schema to
> create individual columns. Otherwise, continue to use {{columns}} as in
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in
> the file.
> * If the schema contains more columns than the file, the extra columns take
> their default values. (This occurs in schema evolution when a column is added
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns,
> at the end of the line, are ignored. This is the same behavior as occurs if
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present,
> override those defined in the format plugin configuration. The properties
> allow the user to have a single "csv" config, but to have many tables with
> the "csv" suffix, each with different properties. That is, the user need not
> define a new plugin config, and define a new extension, just to change a file
> format property. With this system, the user can have a ".csv" file with
> headers; the user need not define a different suffix (usually ".csvh" in
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} | {{skipFirstLine}} |
> | {{drill.delimiter}} | {{fieldDelimiter}} |
> | {{drill.commentChar}} | {{comment}} |
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment
> is set to the ASCII NULL, which will never match. This effectively turns off
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only
> the first character is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
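
The table-property rules and the left-to-right schema matching described in the issue can be sketched in a few lines of Python. This is an illustration only, not Drill's implementation (Drill is written in Java). The function names `resolve_text_config` and `read_headerless_line`, the `defaults` dictionary, and the assumption that boolean properties arrive as the strings "true"/"false" are all hypothetical. Fields are split naively on the delimiter, with no quote handling.

```python
NUL = "\x00"  # ASCII NUL: a comment character that never matches real text

# Table property -> plugin config property, as listed in the issue's table.
PROPERTY_MAP = {
    "drill.headers": "extractHeader",
    "drill.skipFirstLine": "skipFirstLine",
    "drill.delimiter": "fieldDelimiter",
    "drill.commentChar": "comment",
}


def _to_bool(value):
    # Assumption: boolean table properties are strings such as "true".
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def resolve_text_config(plugin_config, table_props):
    """Overlay table properties on a copy of the plugin config."""
    resolved = dict(plugin_config)
    for prop, key in PROPERTY_MAP.items():
        if prop not in table_props:
            continue  # unset table property: the plugin value is used
        value = table_props[prop]
        if key in ("extractHeader", "skipFirstLine"):
            resolved[key] = _to_bool(value)
        elif key == "fieldDelimiter":
            if value == "":
                continue  # empty delimiter is the same as unset
            resolved[key] = value[0]  # only the first character is used
        else:
            # Empty comment disables comments; otherwise keep the first char.
            resolved[key] = value[0] if value else NUL
    return resolved


def read_headerless_line(line, schema, defaults, delimiter):
    """Map one line of a headerless file onto a provided schema.

    `schema` is an ordered list of column names; `defaults` supplies the
    value for schema columns that the file does not contain.
    """
    fields = line.split(delimiter)
    if not schema:
        # No usable schema: fall back to the single `columns` array.
        return {"columns": fields}
    row = {}
    for i, name in enumerate(schema):
        # Extra schema columns take defaults; extra file fields are ignored.
        row[name] = fields[i] if i < len(fields) else defaults.get(name)
    return row
```

In this sketch, a table that sets `drill.headers` to "false" and `drill.commentChar` to an empty string on top of a header-extracting plugin config ends up with header extraction off and comments effectively disabled, while the plugin's delimiter is kept. The plugin config dictionary itself is left unmodified, which mirrors the per-table scope of the properties.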