andrei-ionescu commented on pull request #1392: URL: https://github.com/apache/arrow-datafusion/pull/1392#issuecomment-985775205
@houqp After more debugging and a number of fixes, I found that the physical plan lacks support for nested fields. I ran into this error:

```
Error: ArrowError(SchemaError("Unexpected batch schema from file, expected 36 cols but got 6"))
```

The error is raised in these lines of code: [physical_plan/file_format/mod.rs#L223-L229](https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/file_format/mod.rs#L223-L229). The chunk of data that was read has only 6 columns, while the expected number of columns is 36.

The root cause seems to be a mismatch between how parquet files are read and how the result is projected. The reader takes one top-level nested column at a time, and that chunk is then projected onto the full schema. For example, with `nested_struct.rust.parquet` it reads the first column, which has 6 leaves, and then tries to project it onto all 36 top-level columns of that parquet file. That mismatch produces the error above.

It seems that DataFusion lacks support for nested fields, at least when using the parquet data source.
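To make the mismatch concrete, here is a minimal sketch in plain Rust. It is a toy model, not the DataFusion or arrow code: `Field`, `leaf_count`, `check_batch_width` and `example_schema` are hypothetical names invented for illustration, and the schema is made up apart from the two numbers from the error (36 top-level columns, 6 leaves under the first one).

```rust
// Toy model of a nested schema. These types are hypothetical and are
// NOT the arrow or DataFusion types.
#[derive(Debug, Clone)]
pub enum Field {
    /// A primitive column.
    Leaf(String),
    /// A nested column with child fields.
    Struct(String, Vec<Field>),
}

/// Number of primitive (leaf) columns under a field.
pub fn leaf_count(field: &Field) -> usize {
    match field {
        Field::Leaf(_) => 1,
        Field::Struct(_, children) => children.iter().map(leaf_count).sum(),
    }
}

/// Mimics only the shape of the failing check: the batch must have as many
/// columns as the schema it is mapped onto. The message text is copied from
/// the error quoted in this comment; the function itself is an assumption.
pub fn check_batch_width(expected_cols: usize, batch_cols: usize) -> Result<(), String> {
    if expected_cols == batch_cols {
        Ok(())
    } else {
        Err(format!(
            "Unexpected batch schema from file, expected {} cols but got {}",
            expected_cols, batch_cols
        ))
    }
}

/// Builds a made-up schema whose first field is a struct with
/// `first_leaves` leaves, followed by `top - 1` primitive fields.
/// Assumes `top >= 1`.
pub fn example_schema(top: usize, first_leaves: usize) -> Vec<Field> {
    let children: Vec<Field> = (0..first_leaves)
        .map(|i| Field::Leaf(format!("leaf_{}", i)))
        .collect();
    let mut fields = vec![Field::Struct("col_0".to_string(), children)];
    for i in 1..top {
        fields.push(Field::Leaf(format!("col_{}", i)));
    }
    fields
}
```

With `example_schema(36, 6)`, the schema has 36 top-level fields while `leaf_count` on the first field gives 6, so `check_batch_width(36, 6)` returns an `Err` of the same shape as the one reported. Comparing a leaf-level column count against a top-level field count can only succeed when no field is nested.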