[jira] [Updated] (DRILL-7279) Support provided schema for CSV without headers
[ https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-7279:
-------------------------------
    Description: 
Extend the Drill 1.16 provided schema support for the text reader to allow a provided schema for files without headers. Behavior:

* If the file is configured to not extract headers, and a schema is provided, and the schema has at least one column, then use the provided schema to create individual columns. Otherwise, continue to use {{columns}} as in previous versions.
* The columns in the schema are assumed to match left-to-right with those in the file.
* If the schema contains more columns than the file, the extra columns take their default values. (This occurs in schema evolution when a column is added to newer files.)
* If the file contains more columns than the schema, then the extra columns, at the end of the line, are ignored. This is the same behavior as occurs if the file contains headers.

h4. Table Properties

Also adds several table properties for text files. These properties, if present, override those defined in the format plugin configuration. The properties allow the user to have a single "csv" config, but to have many tables with the "csv" suffix, each with different properties. That is, the user need not define a new plugin config, and define a new extension, just to change a file format property. With this system, the user can have a ".csv" file with headers; the user need not define a different suffix (usually ".csvh" in Drill) for this case.

All properties start with {{drill}} (standard for Drill-defined properties), then "text" (because they are specific to the text reader). The tail property name is the same as the format config property name.

|| Table Property || Equivalent Plugin Config Property ||
| {{drill.text.extractHeader}} | {{extractHeader}} |
| {{drill.text.skipFirstLine}} | {{skipFirstLine}} |
| {{drill.text.fieldDelimiter}} | {{fieldDelimiter}} |
| {{drill.text.quote}} | {{quote}} |
| {{drill.text.escape}} | {{escape}} |
| {{drill.text.lineDelimiter}} | {{lineDelimiter}} |

For each, the rules are:

* If the table property is not set, then the plugin property is used.
* If the table property is set, then the property value replaces the plugin property value for that one specific table.
* For most properties, if the property value is an empty string, then this is the same as an unset property.
* For the comment, if the property value is an empty string, then the comment is set to the ASCII NULL, which will never match. This effectively turns off the comment feature for this one table.
* If the delimiter or comment value is longer than a single character, only the first character is used.

It is possible to use the table properties without specifying a "provided" schema. Just omit any columns from the schema:

{noformat}
create schema () for table `dfs.data`.`example`
PROPERTIES ('drill.text.extractHeader'='false', 'drill.text.skipFirstLine'='false', 'drill.text.fieldDelimiter'='|')
{noformat}

The field and line delimiters are sometimes non-printable characters. Drill (via Calcite) already supports the following syntax:

* Standard escapes: {{\n}}, {{\r}}, {{\t}}, perhaps others.
* Two-byte (ASCII) codes: {{\01}}
* Four-byte (Unicode) codes: {{\u0001}}

Note that, although Drill supports Unicode escapes, the text reader itself supports only single-byte characters for the delimiter and escape properties.

  was:
Extend the Drill 1.16 provided schema support for the text reader to allow a provided schema for files without headers. Behavior:

* If the file is configured to not extract headers, and a schema is provided, and the schema has at least one column, then use the provided schema to create individual columns. Otherwise, continue to use {{columns}} as in previous versions.
* The columns in the schema are assumed to match left-to-right with those in the file.
* If the schema contains more columns than the file, the extra columns take their default values. (This occurs in schema evolution when a column is added to newer files.)
* If the file contains more columns than the schema, then the extra columns, at the end of the line, are ignored. This is the same behavior as occurs if the file contains headers.

h4. Table Properties

Also adds four table properties for text files. These properties, if present, override those defined in the format plugin configuration. The properties allow the user to have a single "csv" config, but to have many tables with the "csv" suffix, each with different properties. That is, the user need not define a new plugin config, and define a new extension, just to change a file format property. With this system, the user can have a ".csv" file with headers; the user need not define a different suffix (usually ".csvh" in Drill) for this case.

|| Table Property || Equivalent Plugin Config
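
The override rules in the updated description (unset or empty falls back to the plugin value, an empty comment becomes ASCII NUL, only the first character of a longer value is used) can be illustrated with a small sketch. This is not Drill's implementation: the class and method names are hypothetical, and the key {{drill.text.comment}} is an assumption, since the property table above does not list a comment property.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Drill: illustrates the override rules only.
class TextTableProperties {
    // ASCII NUL: per the description, a comment character that never matches.
    static final char NUL = '\0';

    // Single-character properties (delimiter, quote, escape): an unset or empty
    // table property falls back to the plugin value; otherwise only the first
    // character of the table value is used.
    static char resolveChar(Map<String, String> tableProps, String key, char pluginValue) {
        String value = tableProps.get(key);
        return (value == null || value.isEmpty()) ? pluginValue : value.charAt(0);
    }

    // Comment property: unset falls back to the plugin value, but an empty
    // string disables comments for this table by returning NUL.
    static char resolveComment(Map<String, String> tableProps, String key, char pluginValue) {
        String value = tableProps.get(key);
        if (value == null) {
            return pluginValue;
        }
        return value.isEmpty() ? NUL : value.charAt(0);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("drill.text.fieldDelimiter", "||"); // longer than one character
        props.put("drill.text.quote", "");            // empty: same as unset
        props.put("drill.text.comment", "");          // assumed key; empty disables comments

        System.out.println(resolveChar(props, "drill.text.fieldDelimiter", ','));          // first char of "||"
        System.out.println(resolveChar(props, "drill.text.quote", '"'));                   // plugin value
        System.out.println(resolveComment(props, "drill.text.comment", '#') == NUL);       // comments off
    }
}
```

Keeping the comment rule in its own method makes the one exception (empty string means "off" rather than "unset") explicit instead of hiding it in a flag.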
[jira] [Commented] (DRILL-7288) IndexOutOfBoundsException when coalesce(dir0,'')
[ https://issues.apache.org/jira/browse/DRILL-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858714#comment-16858714 ]

benj commented on DRILL-7288:
-----------------------------

Please note that when using a column instead of a constant, there is no problem:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'abc')=COALESCE(dir0,'abc') THEN `columnDate` END AS DATE) AS record_date
FROM
LIMIT 2;
=> OK
{code}
unless * is added to the request:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'abc')=COALESCE(dir0,'abc') THEN `columnDate` END AS DATE) AS record_date
, * /* ADDING all the fields */
FROM
LIMIT 2;
=> Error: SYSTEM ERROR: NumberFormatException: abc
{code}
...

> IndexOutOfBoundsException when coalesce(dir0,'')
> ------------------------------------------------
>
>                 Key: DRILL-7288
>                 URL: https://issues.apache.org/jira/browse/DRILL-7288
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: benj
>            Priority: Minor
>
> Example of a request that runs in 1.15 but not in 1.16:
> {code:java}
> SELECT 1 AS col1
> ,CAST(CASE WHEN COALESCE(dir0,'')=COALESCE(dir0,'') THEN '2017-03-31' END AS DATE) AS record_date
> FROM
> LIMIT 2;
> {code}
> in 1.15 (it's OK):
> {code:java}
> +-------+--------------+
> | col1  | record_date  |
> +-------+--------------+
> | 1     | 2017-03-31   |
> | 1     | 2017-03-31   |
> +-------+--------------+
> {code}
> in 1.16 (it's NOK):
> {code:java}
> Error: SYSTEM ERROR: IndexOutOfBoundsException: Index 1 out of bounds for length 0
> {code}
> Surprisingly, removing at least one of the _coalesce_ calls makes it work (but the
> request won't produce the expected result if dir0 is null):
> {code:java}
> SELECT 1 AS col1
> ,CAST(CASE WHEN dir0=dir0 THEN '2017-03-31' END AS DATE) AS record_date
> FROM
> LIMIT 2;
> => OK{code}
> Note that this trick was used to force the mode to be NULLABLE. Fortunately, it is
> also possible to simply use NULLIF (see below):
> {code:java}
> SELECT 1 AS col1
> ,CAST(NULLIF('2017-03-31','')AS DATE) AS record_date
> FROM
> LIMIT 2;
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (DRILL-7288) IndexOutOfBoundsException when coalesce(dir0,'')
benj created DRILL-7288:
------------------------

             Summary: IndexOutOfBoundsException when coalesce(dir0,'')
                 Key: DRILL-7288
                 URL: https://issues.apache.org/jira/browse/DRILL-7288
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.16.0
            Reporter: benj

Example of a request that runs in 1.15 but not in 1.16:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'')=COALESCE(dir0,'') THEN '2017-03-31' END AS DATE) AS record_date
FROM
LIMIT 2;
{code}
in 1.15 (it's OK):
{code:java}
+-------+--------------+
| col1  | record_date  |
+-------+--------------+
| 1     | 2017-03-31   |
| 1     | 2017-03-31   |
+-------+--------------+
{code}
in 1.16 (it's NOK):
{code:java}
Error: SYSTEM ERROR: IndexOutOfBoundsException: Index 1 out of bounds for length 0
{code}
Surprisingly, removing at least one of the _coalesce_ calls makes it work (but the request won't produce the expected result if dir0 is null):
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN dir0=dir0 THEN '2017-03-31' END AS DATE) AS record_date
FROM
LIMIT 2;
=> OK{code}
Note that this trick was used to force the mode to be NULLABLE. Fortunately, it is also possible to simply use NULLIF (see below):
{code:java}
SELECT 1 AS col1
,CAST(NULLIF('2017-03-31','')AS DATE) AS record_date
FROM
LIMIT 2;
{code}
[jira] [Commented] (DRILL-7251) Read Hive array w/o nulls
[ https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858635#comment-16858635 ] ASF GitHub Bot commented on DRILL-7251: --- vvysotskyi commented on pull request #1799: DRILL-7251: Read Hive array w/o nulls URL: https://github.com/apache/drill/pull/1799 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Read Hive array w/o nulls > - > > Key: DRILL-7251 > URL: https://issues.apache.org/jira/browse/DRILL-7251 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Reporter: Igor Guzenko >Assignee: Igor Guzenko >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
[ https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Volodymyr Vysotskyi updated DRILL-7271:
---------------------------------------
    Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}
org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}
org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
path - path to file

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. TableMetadataProvider.getNonInterestingColumnsMeta() -> getNonInterestingColumnsMetadata
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map columnsStatistics;
  Map statistics;
  List partitionValues;
  List locations;
  long lastModifiedTime;
}
{noformat}

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify
private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}
org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by MetadataInfo class
partitionValues (List)
location
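
For readers who want to experiment with the two info classes from item 6, here is a compilable sketch. It is an illustration under assumptions, not the code from the pull request: the issue only names the new DIRECTORY value, so the other {{MetadataType}} constants are inferred from the metadata class names above, and the constructors, {{toString}} methods and sample values ("dfs", "tmp", "PARQUET") are hypothetical.

```java
// Demo entry point; declared first so the file can be run directly.
class MetadataInfoDemo {
    public static void main(String[] args) {
        TableInfo table = new TableInfo("dfs", "tmp", "example", "PARQUET", null);
        MetadataInfo info = new MetadataInfo(MetadataType.TABLE, MetadataInfo.GENERAL_INFO_KEY, null);
        System.out.println(table + " / " + info);
    }
}

// Assumption: only DIRECTORY is named in the issue; the remaining values are
// inferred from the table, partition, file and row group metadata classes.
enum MetadataType { TABLE, DIRECTORY, PARTITION, FILE, ROW_GROUP }

// Fields as listed in the issue; constructor and toString are illustrative.
final class TableInfo {
    final String storagePlugin;
    final String workspace;
    final String name;
    final String type;
    final String owner;

    TableInfo(String storagePlugin, String workspace, String name, String type, String owner) {
        this.storagePlugin = storagePlugin;
        this.workspace = workspace;
        this.name = name;
        this.type = type;
        this.owner = owner;
    }

    @Override
    public String toString() {
        return storagePlugin + "." + workspace + "." + name;
    }
}

// Constants and fields as listed in the issue; the rest is illustrative.
final class MetadataInfo {
    static final String GENERAL_INFO_KEY = "GENERAL_INFO";
    static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

    final MetadataType type;
    final String key;
    final String identifier;

    MetadataInfo(MetadataType type, String key, String identifier) {
        this.type = type;
        this.key = key;
        this.identifier = identifier;
    }

    @Override
    public String toString() {
        return type + ":" + key + ":" + identifier;
    }
}
```

Grouping the plugin, workspace and name fields in one holder is what lets the per-level metadata classes in item 7 drop their own copies of those fields.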
[jira] [Resolved] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty string in SqlLine
[ https://issues.apache.org/jira/browse/DRILL-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-7158.
-------------------------------------
    Resolution: Fixed

Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id.

> null values for varchar, interval, boolean are displayed as empty string in SqlLine
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-7158
>                 URL: https://issues.apache.org/jira/browse/DRILL-7158
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.17.0
>
> null values for varchar, interval, boolean are displayed as empty string in
> SqlLine.
> Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
> Possible workaround: set nullValue to a spelling that is not all lowercase, e.g.
> {{!set nullValue Null}}.
> Should be fixed by the next SqlLine upgrade (to 1.8.0), once the bug is fixed in
> SqlLine itself.
[jira] [Resolved] (DRILL-7198) Issuing a control-C in Sqlline exits the session (it does cancel the query)
[ https://issues.apache.org/jira/browse/DRILL-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-7198.
-------------------------------------
    Resolution: Fixed

Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id.

> Issuing a control-C in Sqlline exits the session (it does cancel the query)
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-7198
>                 URL: https://issues.apache.org/jira/browse/DRILL-7198
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0, 1.16.0
>            Reporter: Aman Sinha
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.17.0
>
> This behavior is observed both in Drill 1.15.0 and the RC1 of 1.16.0. Run a
> long-running query in sqlline and cancel it using control-c. It exits the
> sqlline session although it does cancel the query. Behavior is seen in both
> embedded mode and distributed mode. If the query is submitted through
> sqlline and cancelled from the Web UI, it behaves correctly: the session
> does not get killed and subsequent queries can be submitted in the same
> sqlline session.
> The same query in Drill 1.14.0 works correctly and returns the column headers
> while canceling the query.
> Since the query can be cancelled just fine through the Web UI, I am not
> considering this a blocker for 1.16. Very likely the sqlline upgrade in
> 1.15.0 changed the behavior.
[jira] [Resolved] (DRILL-7262) Parse Error appears on attempting to run several SQL queries at the same time in SQLLine
[ https://issues.apache.org/jira/browse/DRILL-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-7262. - Resolution: Fixed Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id. > Parse Error appears on attempting to run several SQL queries at the same time > in SQLLine > > > Key: DRILL-7262 > URL: https://issues.apache.org/jira/browse/DRILL-7262 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Pavel Semenov >Assignee: Volodymyr Vysotskyi >Priority: Minor > Fix For: 1.17.0 > > Attachments: 2.png > > > *STEPS TO REPRODUCE* > # Run SqlLine > # Submit several SQL queries at the same time, e.g. (select * from > sys.version; select * from sys.version; ) > # Observe the result > *EXPECTED RESULT* > Several query results appear > *ACTUAL RESULT* > Parse Error appears > !2.png|thumbnail! > *ADDITIONAL INFO* > Current issue will be fixed in scope of > [https://github.com/julianhyde/sqlline/pull/297] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7261) Simplify Easy format config for new scan framework
[ https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858473#comment-16858473 ]

ASF GitHub Bot commented on DRILL-7261:
---------------------------------------

asfgit commented on pull request #1796: DRILL-7261: Simplify Easy framework config
URL: https://github.com/apache/drill/pull/1796

> Simplify Easy format config for new scan framework
> --------------------------------------------------
>
>                 Key: DRILL-7261
>                 URL: https://issues.apache.org/jira/browse/DRILL-7261
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.
[jira] [Commented] (DRILL-7236) SqlLine 1.8 upgrade
[ https://issues.apache.org/jira/browse/DRILL-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858476#comment-16858476 ]

ASF GitHub Bot commented on DRILL-7236:
---------------------------------------

asfgit commented on pull request #1804: DRILL-7236: SqlLine 1.8 upgrade
URL: https://github.com/apache/drill/pull/1804

> SqlLine 1.8 upgrade
> -------------------
>
>                 Key: DRILL-7236
>                 URL: https://issues.apache.org/jira/browse/DRILL-7236
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.16.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0
>
> SqlLine 1.8 upgrade
[jira] [Commented] (DRILL-7258) [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string
[ https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858472#comment-16858472 ] ASF GitHub Bot commented on DRILL-7258: --- asfgit commented on pull request #1802: DRILL-7258: Remove field width limit for text reader URL: https://github.com/apache/drill/pull/1802 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Text V3 Reader] Unsupported operation error is thrown when select a column > with a long string > -- > > Key: DRILL-7258 > URL: https://issues.apache.org/jira/browse/DRILL-7258 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.16.0 >Reporter: Anton Gozhiy >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Fix For: 1.17.0 > > Attachments: 10.tbl > > > *Data:* > 10.tbl is attached > *Steps:* > # Set exec.storage.enable_v3_text_reader=true > # Run the following query: > {code:sql} > select * from dfs.`/tmp/drill/data/10.tbl` > {code} > *Expected result:* > The query should return result normally. > *Actual result:* > Exception is thrown: > {noformat} > UNSUPPORTED_OPERATION ERROR: Drill Remote Exception > (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large. 
> Column 0 > Limit 65536 > Fragment 0:0 > [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010] > org.apache.drill.common.exceptions.UserException$Builder.build():630 > > org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131 > > org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208 > > org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225 > > org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341 > > org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137 > > org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388 > > org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220 > > org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132 > org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397 > org.apache.drill.exec.physical.impl.scan.ReaderState.next():354 > org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184 > org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159 > org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176 > org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114 > > org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147 > org.apache.drill.exec.record.AbstractRecordBatch.next():126 > org.apache.drill.exec.record.AbstractRecordBatch.next():116 > org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141 > org.apache.drill.exec.record.AbstractRecordBatch.next():186 > org.apache.drill.exec.physical.impl.BaseRootExec.next():104 > > org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83 > org.apache.drill.exec.physical.impl.BaseRootExec.next():94 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296 > 
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283 > ...():0 > org.apache.hadoop.security.UserGroupInformation.doAs():1746 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():283 > org.apache.drill.common.SelfCleaningRunnable.run():38 > ...():0 > {noformat} > *Note:* works fine with v2 reader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers
[ https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858474#comment-16858474 ]

ASF GitHub Bot commented on DRILL-7279:
---------------------------------------

asfgit commented on pull request #1798: DRILL-7279: Enable provided schema for text files without headers
URL: https://github.com/apache/drill/pull/1798

> Support provided schema for CSV without headers
> -----------------------------------------------
>
>                 Key: DRILL-7279
>                 URL: https://issues.apache.org/jira/browse/DRILL-7279
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.17.0
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided,
> and the schema has at least one column, then use the provided schema to
> create individual columns. Otherwise, continue to use {{columns}} as in
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in
> the file.
> * If the schema contains more columns than the file, the extra columns take
> their default values. (This occurs in schema evolution when a column is added
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns,
> at the end of the line, are ignored. This is the same behavior as occurs if
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present,
> override those defined in the format plugin configuration. The properties
> allow the user to have a single "csv" config, but to have many tables with
> the "csv" suffix, each with different properties. That is, the user need not
> define a new plugin config, and define a new extension, just to change a file
> format property. With this system, the user can have a ".csv" file with
> headers; the user need not define a different suffix (usually ".csvh" in
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} | {{skipFirstLine}} |
> | {{drill.delimiter}} | {{fieldDelimiter}} |
> | {{drill.commentChar}} | {{comment}} |
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment
> is set to the ASCII NULL, which will never match. This effectively turns off
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only
> the first character is used.
> It is possible to use the table properties without specifying a "provided"
> schema. Just omit any columns from the schema:
> {noformat}
> create schema () for table `dfs.data`.`example`
> PROPERTIES ('drill.headers'='false', 'drill.skipFirstLine'='false',
> 'drill.delimiter'='|')
> {noformat}
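
The left-to-right matching rules quoted in this issue (missing trailing fields take the column default, extra trailing fields are ignored) can be sketched as follows. This is a simplified illustration, not Drill's reader: the class and method names are hypothetical, values are kept as plain strings, and the default values are supplied by the caller because the issue does not say what the defaults are.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Drill: positional schema-to-field matching.
class ProvidedSchemaMatcher {
    // Maps the i-th schema column to the i-th field of the line.
    // Columns beyond the end of the line get their default value;
    // fields beyond the end of the schema are dropped.
    static Map<String, String> matchRow(List<String> schemaColumns,
                                        List<String> defaults,
                                        String[] fields) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < schemaColumns.size(); i++) {
            String value = i < fields.length ? fields[i] : defaults.get(i);
            row.put(schemaColumns.get(i), value);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("id", "name", "added");
        List<String> defaults = Arrays.asList("0", "", "n/a"); // assumed defaults

        // Older file: fewer fields than schema columns.
        System.out.println(matchRow(columns, defaults, "1|alice".split("\\|", -1)));
        // File with more fields than the schema: the trailing field is ignored.
        System.out.println(matchRow(columns, defaults, "2|bob|x|extra".split("\\|", -1)));
    }
}
```

The first call corresponds to the schema-evolution case mentioned in the issue, where a column was added after older files were written.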
[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism
[ https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858475#comment-16858475 ]

ASF GitHub Bot commented on DRILL-7278:
---------------------------------------

asfgit commented on pull request #1797: DRILL-7278: Refactor result set loader projection mechanism
URL: https://github.com/apache/drill/pull/1797

> Refactor result set loader projection mechanism
> -----------------------------------------------
>
>                 Key: DRILL-7278
>                 URL: https://issues.apache.org/jira/browse/DRILL-7278
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.17.0