[jira] [Updated] (DRILL-7279) Support provided schema for CSV without headers

2019-06-07 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7279:
---
Description: 
Extend the Drill 1.16 provided schema support for the text reader to allow a 
provided schema for files without headers. Behavior:

* If the file is configured to not extract headers, and a schema is provided, 
and the schema has at least one column, then use the provided schema to create 
individual columns. Otherwise, continue to use {{columns}} as in previous 
versions.
* The columns in the schema are assumed to match left-to-right with those in 
the file.
* If the schema contains more columns than the file, the extra columns take 
their default values. (This occurs in schema evolution when a column is added 
to newer files.)
* If the file contains more columns than the schema, then the extra columns, at 
the end of the line, are ignored. This is the same behavior as occurs if the 
file contains headers.
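The matching rules above can be sketched as follows (a hypothetical illustration 
only, not Drill source; {{match_columns}} and its arguments are invented names):

```python
# Hypothetical sketch of the left-to-right column matching rules for a
# headerless CSV file with a provided schema. Not Drill's implementation.
def match_columns(schema_cols, file_row, defaults):
    """Pair provided-schema columns with the fields of one file row."""
    out = {}
    for i, col in enumerate(schema_cols):
        if i < len(file_row):
            out[col] = file_row[i]            # column present in the file
        else:
            out[col] = defaults.get(col, "")  # extra schema column: default value
    # Extra file columns beyond the schema (at the end of the line) are ignored.
    return out
```

For example, a three-column schema applied to a two-field row fills the third 
column with its default, while a one-column schema applied to a three-field row 
keeps only the first field.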

h4. Table Properties

Also adds several table properties for text files. These properties, if 
present, override those defined in the format plugin configuration. The 
properties allow the user to have a single "csv" config, but to have many 
tables with the "csv" suffix, each with different properties. That is, the user 
need not define a new plugin config, and define a new extension, just to change 
a file format property. With this system, the user can have a ".csv" file with 
headers; the user need not define a different suffix (usually ".csvh" in Drill) 
for this case.

All properties start with {{drill}} (the standard prefix for Drill-defined 
properties), then "text" (because they are specific to the text reader). The 
tail property name is the same as the format config property name.

|| Table Property || Equivalent Plugin Config Property ||
| {{drill.text.extractHeader}} | {{extractHeader}} |
| {{drill.text.skipFirstLine}} | {{skipFirstLine}} |
| {{drill.text.fieldDelimiter}} | {{fieldDelimiter}} |
| {{drill.text.quote}} | {{quote}} |
| {{drill.text.escape}} | {{escape}} |
| {{drill.text.lineDelimiter}} | {{lineDelimiter}} |

For each, the rules are:

* If the table property is not set, then the plugin property is used.
* If the table property is set, then the property value replaces the plugin 
property value for that one specific table.
* For most properties, if the property value is an empty string, then this is 
the same as an unset property.
* For the comment, if the property value is an empty string, then the comment 
is set to the ASCII NULL, which will never match. This effectively turns off 
the comment feature for this one table.
* If the delimiter or comment value is longer than a single character, only the 
first character is used.
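For the single-character properties (field delimiter, quote, escape, comment), 
the rules above amount to the following sketch ({{resolve}} is an invented name, 
not a Drill API):

```python
# Hypothetical sketch of resolving one single-character text-format
# property from the plugin config and an optional table property.
ASCII_NUL = "\0"  # a character that never matches, disabling the feature

def resolve(plugin_value, table_value, is_comment=False):
    if table_value is None:          # table property not set: use plugin value
        return plugin_value
    if table_value == "":
        if is_comment:               # empty comment: set to ASCII NUL
            return ASCII_NUL
        return plugin_value          # empty string behaves like unset
    return table_value[0]            # longer values: only first character used
```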

It is possible to use the table properties without specifying a "provided" 
schema. Just omit any columns from the schema:

{noformat}
create schema () for table `dfs.data`.`example`
PROPERTIES ('drill.text.extractHeader'='false', 
'drill.text.skipFirstLine'='false', 'drill.text.fieldDelimiter'='|')
{noformat}

The field and line delimiters are sometimes a non-printable character. Drill 
(via Calcite) already supports the following syntax:

* Standard escapes: {{\n}}, {{\r}}, {{\t}}, perhaps others.
* Two-byte (ASCII) codes: {{\01}}
* Four-byte (Unicode) codes: {{\u0001}}

Note that, although Drill supports Unicode escapes, the text reader itself 
supports only single-byte characters for the delimiter and escape properties.
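A rough sketch of decoding those escape syntaxes into the single delimiter 
character (an illustration under the assumptions above; the actual Drill/Calcite 
parsing may differ, and {{decode_delimiter}} is an invented name):

```python
import re

# Hypothetical decoder for the delimiter escape syntaxes listed above.
def decode_delimiter(s):
    simple = {"\\n": "\n", "\\r": "\r", "\\t": "\t"}  # standard escapes
    if s in simple:
        return simple[s]
    m = re.fullmatch(r"\\u([0-9a-fA-F]{4})", s)       # \u0001 style Unicode code
    if m:
        return chr(int(m.group(1), 16))
    m = re.fullmatch(r"\\([0-7]{1,2})", s)            # \01 style short code
    if m:
        return chr(int(m.group(1), 8))
    return s[0]                                       # plain character
```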

  was:
Extend the Drill 1.16 provided schema support for the text reader to allow a 
provided schema for files without headers. Behavior:

* If the file is configured to not extract headers, and a schema is provided, 
and the schema has at least one column, then use the provided schema to create 
individual columns. Otherwise, continue to use {{columns}} as in previous 
versions.
* The columns in the schema are assumed to match left-to-right with those in 
the file.
* If the schema contains more columns than the file, the extra columns take 
their default values. (This occurs in schema evolution when a column is added 
to newer files.)
* If the file contains more columns than the schema, then the extra columns, at 
the end of the line, are ignored. This is the same behavior as occurs if the 
file contains headers.

h4. Table Properties

Also adds four table properties for text files. These properties, if present, 
override those defined in the format plugin configuration. The properties allow 
the user to have a single "csv" config, but to have many tables with the "csv" 
suffix, each with different properties. That is, the user need not define a new 
plugin config, and define a new extension, just to change a file format 
property. With this system, the user can have a ".csv" file with headers; the 
user need not define a different suffix (usually ".csvh" in Drill) for this 
case.

|| Table Property || Equivalent Plugin Config 

[jira] [Commented] (DRILL-7288) IndexOutOfBoundsException when coalesce(dir0,'')

2019-06-07 Thread benj (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858714#comment-16858714
 ] 

benj commented on DRILL-7288:
-

Please note that when using a column instead of a constant there is no problem:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'abc')=COALESCE(dir0,'abc') THEN `columnDate` END 
AS DATE) AS record_date
FROM 
LIMIT 2;
=> OK
{code}
unless {{*}} is added to the query:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'abc')=COALESCE(dir0,'abc') THEN `columnDate` END 
AS DATE) AS record_date
, * /* ADDING all the fields */
FROM 
LIMIT 2;
=> Error: SYSTEM ERROR: NumberFormatException: abc
{code}
...

 

> IndexOutOfBoundsException when coalesce(dir0,'')
> 
>
> Key: DRILL-7288
> URL: https://issues.apache.org/jira/browse/DRILL-7288
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: benj
>Priority: Minor
>
> Example of request running in 1.15 and not in 1.16:
> {code:java}
> SELECT 1 AS col1
> ,CAST(CASE WHEN COALESCE(dir0,'')=COALESCE(dir0,'') THEN '2017-03-31' END AS 
> DATE) AS record_date
> FROM 
> LIMIT 2;
> {code}
> in 1.15 (it's OK):
> {code:java}
> ++-+
> | col1   | record_date |
> ++-+
> | 1  | 2017-03-31  |
> | 1  | 2017-03-31  |
> ++-+
> {code}
> in 1.16 (it's NOK):
> {code:java}
> Error: SYSTEM ERROR: IndexOutOfBoundsException: Index 1 out of bounds for 
> length 0
> {code}
> Surprisingly, by removing at least one of the _coalesce_ calls, it works (but 
> the query won't produce the expected result if dir0 is null)
> {code:java}
> SELECT 1 AS col1
> ,CAST(CASE WHEN dir0=dir0 THEN '2017-03-31' END AS DATE) AS record_date
> FROM 
> LIMIT 2;
> => OK{code}
> Note that this trick was used to force the mode to be NULLABLE. Fortunately, 
> it is also possible to simply use NULLIF instead (see below)
> {code:java}
> SELECT 1 AS col1
> ,CAST(NULLIF('2017-03-31','')AS DATE) AS record_date
> FROM 
> LIMIT 2;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7288) IndexOutOfBoundsException when coalesce(dir0,'')

2019-06-07 Thread benj (JIRA)
benj created DRILL-7288:
---

 Summary: IndexOutOfBoundsException when coalesce(dir0,'')
 Key: DRILL-7288
 URL: https://issues.apache.org/jira/browse/DRILL-7288
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: benj


Example of request running in 1.15 and not in 1.16:
{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN COALESCE(dir0,'')=COALESCE(dir0,'') THEN '2017-03-31' END AS 
DATE) AS record_date
FROM 
LIMIT 2;
{code}
in 1.15 (it's OK):
{code:java}
++-+
| col1   | record_date |
++-+
| 1  | 2017-03-31  |
| 1  | 2017-03-31  |
++-+
{code}
in 1.16 (it's NOK):
{code:java}
Error: SYSTEM ERROR: IndexOutOfBoundsException: Index 1 out of bounds for 
length 0
{code}
Surprisingly, by removing at least one of the _coalesce_ calls, it works (but 
the query won't produce the expected result if dir0 is null)

{code:java}
SELECT 1 AS col1
,CAST(CASE WHEN dir0=dir0 THEN '2017-03-31' END AS DATE) AS record_date
FROM 
LIMIT 2;

=> OK{code}
Note that this trick was used to force the mode to be NULLABLE. Fortunately, it 
is also possible to simply use NULLIF instead (see below)

{code:java}
SELECT 1 AS col1
,CAST(NULLIF('2017-03-31','')AS DATE) AS record_date
FROM 
LIMIT 2;
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7251) Read Hive array w/o nulls

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858635#comment-16858635
 ] 

ASF GitHub Bot commented on DRILL-7251:
---

vvysotskyi commented on pull request #1799: DRILL-7251: Read Hive array w/o 
nulls
URL: https://github.com/apache/drill/pull/1799
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive array w/o nulls
> -
>
> Key: DRILL-7251
> URL: https://issues.apache.org/jira/browse/DRILL-7251
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7271) Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

2019-06-07 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7271:
---
Description: 
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location (String) (for directory level metadata) - directory location

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set location; -> locations
{noformat}

org.apache.drill.metastore.FileMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
org.apache.drill.metastore.RowGroupMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
path - path to file 

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Path location; - should contain directory to which file belongs
{noformat}
8. Remove org.apache.drill.exec package from metastore module.
9. Rename ColumnStatisticsImpl class.
10. Separate existing classes in org.apache.drill.metastore package into 
sub-packages.
11. Rename FileTableMetadata -> BaseTableMetadata
12. TableMetadataProvider.getNonInterestingColumnsMeta() -> 
getNonInterestingColumnsMetadata
13. Introduce segment-level metadata class:
{noformat}
class SegmentMetadata {
  TableInfo tableInfo;
  MetadataInfo metadataInfo;
  SchemaPath column;
  TupleMetadata schema;
  String location;
  Map columnsStatistics;
  Map statistics;
  List partitionValues;
  List locations;
  long lastModifiedTime;
}
{noformat}

  was:
1. Merge info from metadataStatistics + statisticsKinds into one holder: 
Map.
2. Rename hasStatistics to hasDescriptiveStatistics
3. Remove drill-file-metastore-plugin
4. Move  
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata.MetadataLevel 
to metadata module, rename to MetadataType and add new value: DIRECTORY.
5. Add JSON ser/de for ColumnStatistics, StatisticsHolder.
6. Add new info classes:
{noformat}
class TableInfo {
  String storagePlugin;
  String workspace;
  String name;
  String type;
  String owner;
}

class MetadataInfo {

  public static final String GENERAL_INFO_KEY = "GENERAL_INFO";
  public static final String DEFAULT_PARTITION_KEY = "DEFAULT_PARTITION";

  MetadataType type (enum);
  String key;
  String identifier;
}
{noformat}
7. Modify existing metadata classes:
org.apache.drill.metastore.FileTableMetadata
{noformat}
missing fields
--
storagePlugin, workspace, tableType -> will be covered by TableInfo class
metadataType, metadataKey -> will be covered by MetadataInfo class
interestingColumns

fields to modify

private final Map tableStatistics;
private final Map statisticsKinds;
private final Set partitionKeys; -> Map
{noformat}

org.apache.drill.metastore.PartitionMetadata
{noformat}
missing fields
--
storagePlugin, workspace -> will be covered by TableInfo class
metadataType, metadataKey, metadataIdentifier -> will be covered by 
MetadataInfo class
partitionValues (List)
location 

[jira] [Resolved] (DRILL-7158) null values for varchar, interval, boolean are displayed as empty string in SqlLine

2019-06-07 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7158.
-
Resolution: Fixed

Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id.

> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine
> ---
>
> Key: DRILL-7158
> URL: https://issues.apache.org/jira/browse/DRILL-7158
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> null values for varchar, interval, boolean are displayed as empty string in 
> SqlLine.
> Caused by SqlLine bug: [https://github.com/julianhyde/sqlline/issues/288]
> A possible workaround is to set nullValue with a case other than lower case: 
> {{!set nullValue Null}}.
> Should be fixed in the next SqlLine upgrade (to 1.8.0) when prior fixed in 
> SqlLine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7198) Issuing a control-C in Sqlline exits the session (it does cancel the query)

2019-06-07 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7198.
-
Resolution: Fixed

Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id.

> Issuing a control-C in Sqlline exits the session (it does cancel the query)
> ---
>
> Key: DRILL-7198
> URL: https://issues.apache.org/jira/browse/DRILL-7198
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0, 1.16.0
>Reporter: Aman Sinha
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.17.0
>
>
> This behavior is observed both in Drill 1.15.0 and the RC1 of 1.16.0.   Run a 
> long-running query in sqlline and cancel it using control-c.  It exits the 
> sqlline session although it does cancel the query.  Behavior is seen in both 
> embedded mode and distributed mode. If the query is submitted through 
> sqlline and cancelled from the Web UI, it does behave correctly: the session 
> does not get killed and subsequent queries can be submitted in the same 
> sqlline session. 
> Same query in Drill 1.14.0 works correctly and returns the column headers 
> while canceling the query. 
> Since the query can be cancelled just fine through the Web UI,  I am not 
> considering this a blocker for 1.16.   Very likely the sqlline upgrade in 
> 1.15.0 changed the behavior.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-7262) Parse Error appears on attempting to run several SQL queries at the same time in SQLLine

2019-06-07 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7262.
-
Resolution: Fixed

Resolved with 2766e653cda8b1de817b234c66b0058e707750d0 commit id.

> Parse Error appears on attempting to run several SQL queries at the same time 
> in SQLLine
> 
>
> Key: DRILL-7262
> URL: https://issues.apache.org/jira/browse/DRILL-7262
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Volodymyr Vysotskyi
>Priority: Minor
> Fix For: 1.17.0
>
> Attachments: 2.png
>
>
> *STEPS TO REPRODUCE*
> # Run SqlLine
> # Submit several SQL queries at the same time, e.g. (select * from 
> sys.version; select * from sys.version; ) 
> # Observe the result
> *EXPECTED RESULT*
> Several query results appear 
> *ACTUAL RESULT*
> Parse Error appears 
> !2.png|thumbnail!
> *ADDITIONAL INFO*
> Current issue will be fixed in scope of 
> [https://github.com/julianhyde/sqlline/pull/297]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7261) Simplify Easy format config for new scan framework

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858473#comment-16858473
 ] 

ASF GitHub Bot commented on DRILL-7261:
---

asfgit commented on pull request #1796: DRILL-7261: Simplify Easy framework 
config
URL: https://github.com/apache/drill/pull/1796
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7236) SqlLine 1.8 upgrade

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858476#comment-16858476
 ] 

ASF GitHub Bot commented on DRILL-7236:
---

asfgit commented on pull request #1804: DRILL-7236: SqlLine 1.8 upgrade
URL: https://github.com/apache/drill/pull/1804
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SqlLine 1.8 upgrade
> ---
>
> Key: DRILL-7236
> URL: https://issues.apache.org/jira/browse/DRILL-7236
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> SqlLine 1.8 upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7258) [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858472#comment-16858472
 ] 

ASF GitHub Bot commented on DRILL-7258:
---

asfgit commented on pull request #1802: DRILL-7258: Remove field width limit 
for text reader
URL: https://github.com/apache/drill/pull/1802
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Text V3 Reader] Unsupported operation error is thrown when select a column 
> with a long string
> --
>
> Key: DRILL-7258
> URL: https://issues.apache.org/jira/browse/DRILL-7258
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anton Gozhiy
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
> Attachments: 10.tbl
>
>
> *Data:*
> 10.tbl is attached
> *Steps:*
> # Set exec.storage.enable_v3_text_reader=true
> # Run the following query:
> {code:sql}
> select * from dfs.`/tmp/drill/data/10.tbl`
> {code}
> *Expected result:*
> The query should return result normally.
> *Actual result:*
> Exception is thrown:
> {noformat}
> UNSUPPORTED_OPERATION ERROR: Drill Remote Exception
>   (java.lang.Exception) UNSUPPORTED_OPERATION ERROR: Text column is too large.
> Column 0
> Limit 65536
> Fragment 0:0
> [Error Id: 5f73232f-f0c0-48aa-ab0f-b5f86495d3c8 on userf87d-pc:31010]
> org.apache.drill.common.exceptions.UserException$Builder.build():630
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.BaseFieldOutput.append():131
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValueAll():208
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseValue():225
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseField():341
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseRecord():137
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.TextReader.parseNext():388
> 
> org.apache.drill.exec.store.easy.text.compliant.v3.CompliantTextBatchReader.next():220
> 
> org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader.next():132
> org.apache.drill.exec.physical.impl.scan.ReaderState.readBatch():397
> org.apache.drill.exec.physical.impl.scan.ReaderState.next():354
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.nextAction():184
> org.apache.drill.exec.physical.impl.scan.ScanOperatorExec.next():159
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.doNext():176
> org.apache.drill.exec.physical.impl.protocol.OperatorDriver.next():114
> 
> org.apache.drill.exec.physical.impl.protocol.OperatorRecordBatch.next():147
> org.apache.drill.exec.record.AbstractRecordBatch.next():126
> org.apache.drill.exec.record.AbstractRecordBatch.next():116
> org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():186
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
> ...():0
> org.apache.hadoop.security.UserGroupInformation.doAs():1746
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
> org.apache.drill.common.SelfCleaningRunnable.run():38
> ...():0
> {noformat}
> *Note:* works fine with v2 reader. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7279) Support provided schema for CSV without headers

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858474#comment-16858474
 ] 

ASF GitHub Bot commented on DRILL-7279:
---

asfgit commented on pull request #1798: DRILL-7279: Enable provided schema for 
text files without headers
URL: https://github.com/apache/drill/pull/1798
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support provided schema for CSV without headers
> ---
>
> Key: DRILL-7279
> URL: https://issues.apache.org/jira/browse/DRILL-7279
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.17.0
>
>
> Extend the Drill 1.16 provided schema support for the text reader to allow a 
> provided schema for files without headers. Behavior:
> * If the file is configured to not extract headers, and a schema is provided, 
> and the schema has at least one column, then use the provided schema to 
> create individual columns. Otherwise, continue to use {{columns}} as in 
> previous versions.
> * The columns in the schema are assumed to match left-to-right with those in 
> the file.
> * If the schema contains more columns than the file, the extra columns take 
> their default values. (This occurs in schema evolution when a column is added 
> to newer files.)
> * If the file contains more columns than the schema, then the extra columns, 
> at the end of the line, are ignored. This is the same behavior as occurs if 
> the file contains headers.
> h4. Table Properties
> Also adds four table properties for text files. These properties, if present, 
> override those defined in the format plugin configuration. The properties 
> allow the user to have a single "csv" config, but to have many tables with 
> the "csv" suffix, each with different properties. That is, the user need not 
> define a new plugin config, and define a new extension, just to change a file 
> format property. With this system, the user can have a ".csv" file with 
> headers; the user need not define a different suffix (usually ".csvh" in 
> Drill) for this case.
> || Table Property || Equivalent Plugin Config Property ||
> | {{drill.headers}} | {{extractHeader}} |
> | {{drill.skipFirstLine}} |  {{skipFirstLine}} | 
> | {{drill.delimiter}} |  {{fieldDelimiter}} | 
> |  {{drill.commentChar}} |  {{comment}}| 
> For each, the rules are:
> * If the table property is not set, then the plugin property is used.
> * If the table property is set, then the property value replaces the plugin 
> property value for that one specific table.
> * For the delimiter, if the property value is an empty string, then this is 
> the same as an unset property.
> * For the comment, if the property value is an empty string, then the comment 
> is set to the ASCII NULL, which will never match. This effectively turns off 
> the comment feature for this one table.
> * If the delimiter or comment value is longer than a single character, only 
> the first character is used.
> It is possible to use the table properties without specifying a "provided" 
> schema. Just omit any columns from the schema:
> {noformat}
> create schema () for table `dfs.data`.`example`
> PROPERTIES ('drill.headers'='false', 'drill.skipFirstLine'='false', 
> 'drill.delimiter'='|')
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7278) Refactor result set loader projection mechanism

2019-06-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858475#comment-16858475
 ] 

ASF GitHub Bot commented on DRILL-7278:
---

asfgit commented on pull request #1797: DRILL-7278: Refactor result set loader 
projection mechanism
URL: https://github.com/apache/drill/pull/1797
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor result set loader projection mechanism
> ---
>
> Key: DRILL-7278
> URL: https://issues.apache.org/jira/browse/DRILL-7278
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)