clintropolis commented on issue #7169: Parquet Hadoop parser fails to parse columns specified in transformSpec only.
URL: https://github.com/apache/incubator-druid/issues/7169#issuecomment-474155232

I believe this is an artifact of how the "contrib" Parquet extension works: [it only reads columns from the file that are specified in the schema.](https://github.com/apache/incubator-druid/blob/cf15aac71f9f0f2ba41a2304d6b72719e6f7ec69/extensions-contrib/parquet-extensions/src/main/java/org/apache/parquet/avro/DruidParquetReadSupport.java#L57)

Fwiw, in the upcoming Druid 0.14 the Parquet extension has been reworked and moved to a "core" extension (see #6360). The new version adds `parquet` and `parquet-avro` parser types, which support a `flattenSpec` and _also_ allow a `transformSpec` to refer to columns that are not otherwise used as dimensions or metrics. Note that the `timeAndDims` parser type will still run into this same issue, because the optimization of reading the file with a partial schema is still applied in that case.

I tested to confirm both parts of this: with `parquet` and `parquet-avro` and 'auto discovery' enabled, a transform expression can refer to a column that isn't a dimension or metric, while with `timeAndDims` you will see the same behavior you are currently running into.
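To make the failure mode concrete, here is a minimal toy model in Python. It is not Druid code: the row layout, the column names (`ts`, `country`, `raw_city`), and the helpers `read_with_partial_schema` and `apply_transform` are all hypothetical. It assumes a reader that, like the contrib extension described above, materializes only the columns named in the schema (timestamp plus dimensions here), so a transform whose input column lies outside that set sees a missing value, modeled as `None`. Reading every column stands in for the new parser types with auto discovery enabled.

```python
"""Hypothetical toy model of schema-based column pruning (not Druid code)."""
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[object]]


def read_with_partial_schema(rows: List[Row], wanted: Set[str]) -> List[Row]:
    # Mimics a reader that projects each record down to the requested columns;
    # any column not in `wanted` is never materialized.
    return [{k: v for k, v in row.items() if k in wanted} for row in rows]


def apply_transform(row: Row, source: str, target: str) -> Row:
    # Stand-in for an expression transform: write `source` uppercased into
    # `target`. A source column that was never read yields None.
    value = row.get(source)
    out = dict(row)
    out[target] = value.upper() if isinstance(value, str) else None
    return out


rows: List[Row] = [{"ts": "2019-03-18", "country": "us", "raw_city": "seattle"}]
schema_columns = {"ts", "country"}  # timestamp + dimensions only

# Partial schema: raw_city is pruned at read time, so the transform gets None.
pruned = read_with_partial_schema(rows, schema_columns)
print(apply_transform(pruned[0], "raw_city", "city"))
# {'ts': '2019-03-18', 'country': 'us', 'city': None}

# Reading all columns: the transform can see raw_city.
full = read_with_partial_schema(rows, set(rows[0]))
print(apply_transform(full[0], "raw_city", "city"))
# {'ts': '2019-03-18', 'country': 'us', 'raw_city': 'seattle', 'city': 'SEATTLE'}
```

The key point the sketch illustrates is ordering: pruning happens when the file is read, before any transform runs, so nothing downstream can recover a column the reader never requested.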