[jira] [Updated] (DRILL-7629) Parquet MAP field support missing in recent stable release (?)

Idan Sheinberg (Jira) Fri, 06 Mar 2020 06:23:45 -0800


     [ 
https://issues.apache.org/jira/browse/DRILL-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Idan Sheinberg updated DRILL-7629:
----------------------------------
    Description: 
I encountered this issue when lowering {{planner.slice_target}} (to, say, 100) 
in order to make Drill generate more fragments. Queries then started crashing 
with the following error:
{code:java}
Caused by: java.io.IOException: Unable to parse column [`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, `value` ARRAY<DOUBLE>>>> not null]: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:80)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:61)
        at org.apache.drill.exec.record.metadata.AbstractColumnMetadata.createColumnMetadata(AbstractColumnMetadata.java:75)
        at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.fasterxml.jackson.databind.introspect.AnnotatedMethod.call(AnnotatedMethod.java:109)
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283)
        ... 72 common frames omitted
Caused by: org.apache.drill.exec.record.metadata.schema.parser.SchemaParsingException: Line [1], position [29], offending symbol [@4,29:31='MAP',<26>,1:29]: no viable alternative at input '`bfix`MAP'
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser$ErrorListener.syntaxError(SchemaExprParser.java:120)
        at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
        at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportNoViableAlternative(DefaultErrorStrategy.java:310)
        at org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:136)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:403)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column_def(SchemaParser.java:317)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.columns(SchemaParser.java:262)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_type(SchemaParser.java:1395)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.struct_column(SchemaParser.java:579)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaParser.column(SchemaParser.java:383)
        at org.apache.drill.exec.record.metadata.schema.parser.SchemaExprParser.parseColumn(SchemaExprParser.java:78)
{code}
To be clear, all files in the queried directory are Parquet files that share 
the same schema.

Looking at the stack trace, this seems to be an {{antlr}} parsing error. Assuming 
{{SchemaParser}} is generated from 
[this|https://github.com/apache/drill/blob/drill-1.17.0/exec/vector/src/main/antlr4/org/apache/drill/exec/record/metadata/schema/parser/SchemaParser.g4]
 {{g4}} file, you can see that {{MAP}} support is lacking.
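
For anyone reading the error for the first time: the offending symbol [@4,29:31='MAP',<26>,1:29] follows ANTLR's token format, where 29:31 are the inclusive start and stop character indexes of the token within the input. A minimal sketch that checks this against the column string from the exception (the class and method names here are hypothetical, made up for illustration, and are not part of Drill):

```java
public class OffendingTokenCheck {
    // Column definition copied from the exception message above.
    static final String COLUMN =
        "`currencyPair` STRUCT<`bfix` MAP<`map` STRUCT<`key` ARRAY<VARCHAR>, "
        + "`value` ARRAY<DOUBLE>>>> not null";

    // ANTLR reports token bounds as inclusive start:stop character indexes,
    // so the exclusive end index for substring() is stop + 1.
    static String sliceToken(String input, int start, int stop) {
        return input.substring(start, stop + 1);
    }

    public static void main(String[] args) {
        // Indexes 29 and 31 are taken from the reported offending symbol.
        System.out.println(sliceToken(COLUMN, 29, 31));
    }
}
```

The slice lands exactly on the {{MAP}} keyword following `bfix`, which is consistent with the parser having no rule that accepts {{MAP}} at that point.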

Looking around a bit in Jira/GitHub, I noticed that this issue has already been 
fixed in DRILL-7361. I can also confirm that upgrading to the latest SNAPSHOT 
version (built from source today) resolves the issue.

A few questions:

 * Did you intentionally drop Parquet MAP field support in Drill 1.17 as part 
of the ANTLR lexer refactoring, or was it never present to begin with (I see 
that 1.16 does not use ANTLR parsing for the Parquet schema)?

 * Can we safely assume the (newly added) MAP field support will persist from 
here on out, or at least as part of the 1.18 release?

 * This is probably not the best place to ask, but is there already a 
timeline/plan for 1.18? Or is there a possibility of a hot-fix release? I would 
much rather work with a stable version than a self-built one.

I can provide Parquet files and guidance for reproducing this issue in 1.17, 
should the need arise.

Thanks in advance!


> Parquet MAP field support missing in recent stable release (?)
> --------------------------------------------------------------
>
>                 Key: DRILL-7629
>                 URL: https://issues.apache.org/jira/browse/DRILL-7629
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.17.0
>         Environment: Drill 1.17
> Zulu OpenJDK 8 build 1.8.0_232
> Debian Buster 10.3
> Kernel version 4.19.98-1
> EC2 c5.2xlarge instances (8 cores, 16 GB RAM)
>            Reporter: Idan Sheinberg
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
