clintropolis commented on issue #7169: Parquet Hadoop parser fails to parse 
columns specified in transformSpec only.
URL: 
https://github.com/apache/incubator-druid/issues/7169#issuecomment-474155232
 
 
   I believe this is an artifact of how the "contrib" Parquet extension 
works: [it only reads the columns from the file that are specified in the 
schema.](https://github.com/apache/incubator-druid/blob/cf15aac71f9f0f2ba41a2304d6b72719e6f7ec69/extensions-contrib/parquet-extensions/src/main/java/org/apache/parquet/avro/DruidParquetReadSupport.java#L57)
   
   Fwiw, in the upcoming Druid 0.14 the Parquet extension has been reworked and 
moved to a "core" extension (see #6360). The new version adds `parquet` and 
`parquet-avro` parser types, which support a `flattenSpec` and _also_ allow a 
`transformSpec` to refer to columns that are not otherwise used as dimensions 
or metrics. Note that the `timeAndDims` parser type will still run into this 
same issue, because it still uses the optimization of reading the file with a 
partial schema. I tested to confirm both behaviors: with 'auto discovery' 
enabled, `parquet` and `parquet-avro` allow a transform expression to refer to 
a column that isn't a dimension or metric, while `timeAndDims` shows the same 
behavior that you are running into now.
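
   For illustration, a minimal sketch of a `dataSchema` fragment that relies on the behavior described above, assuming the Druid 0.14 core extension. The data source name and the column names (`example_datasource`, `ts`, `dim1`, `hypothetical_raw_col`, `derived_col`) are hypothetical placeholders, not taken from the issue. I am also assuming that 'auto discovery' corresponds to `useFieldDiscovery` in the `flattenSpec`. Other required parts of an ingestion spec (`metricsSpec`, `granularitySpec`, `ioConfig`, and so on) are omitted.

```json
{
  "dataSchema": {
    "dataSource": "example_datasource",
    "parser": {
      "type": "parquet",
      "parseSpec": {
        "format": "parquet",
        "flattenSpec": {
          "useFieldDiscovery": true
        },
        "timestampSpec": {
          "column": "ts",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["dim1", "derived_col"]
        }
      }
    },
    "transformSpec": {
      "transforms": [
        {
          "type": "expression",
          "name": "derived_col",
          "expression": "concat(hypothetical_raw_col, '_suffix')"
        }
      ]
    }
  }
}
```

   Here `hypothetical_raw_col` appears only inside the transform expression and is not listed as a dimension or metric. With the `timeAndDims` parse spec that column would not be read from the file, so the expression would presumably see a null value instead.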
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

