GitHub user hveiga closed a discussion: QUESTION: Keep columns when using PARTITIONED BY with SQL
Hi folks,

I am using DataFusion to repartition data stored in Parquet files into a different set of Parquet files. I would like the newly created files to contain the columns I am partitioning by. Currently, the partition column is removed from the files because it becomes part of the directory structure. Something like:

```
COPY (SELECT col1, col2, col3, col4 FROM my_external_table)
TO '/output'
PARTITIONED BY (col1)
OPTIONS (format parquet);
...
/output/col1=val1/some_random_file_name.parquet
/output/col1=val2/some_random_file_name.parquet
/output/col1=val3/some_random_file_name.parquet
...
```

Is there a way in SQL to keep `col1` in the output Parquet files? If not, would it make sense to add this as a new option when calling `COPY`?

I have looked through the documentation and the open and closed issues but could not find a way to do this. If there is any information about it, a link would be greatly appreciated.

Thanks!

GitHub link: https://github.com/apache/datafusion/discussions/10962
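
To make the behavior described above concrete, here is a minimal Python sketch of the Hive-style layout. It is not DataFusion code and says nothing about how DataFusion implements `COPY`: `partition_rows` and its `keep_partition_col` flag are hypothetical names, and plain dictionaries stand in for Parquet records. The flag only models the kind of option being asked about.

```python
from collections import defaultdict


def partition_rows(rows, partition_col, keep_partition_col=False):
    """Group rows into Hive-style partition directories.

    Returns {directory_name: [row_dict, ...]}. By default the partition
    column is dropped from each row, since its value is already encoded
    in the directory name. `keep_partition_col` is a hypothetical flag
    that models the requested option; it is not a real DataFusion setting.
    """
    files = defaultdict(list)
    for row in rows:
        directory = f"{partition_col}={row[partition_col]}"
        if keep_partition_col:
            out = dict(row)  # copy the row unchanged, partition column included
        else:
            out = {k: v for k, v in row.items() if k != partition_col}
        files[directory].append(out)
    return dict(files)


# Illustrative data only; the column names follow the query in the question.
rows = [
    {"col1": "val1", "col2": 1, "col3": "a", "col4": 1.5},
    {"col1": "val2", "col2": 2, "col3": "b", "col4": 2.5},
    {"col1": "val1", "col2": 3, "col3": "c", "col4": 3.5},
]

default_layout = partition_rows(rows, "col1")
kept_layout = partition_rows(rows, "col1", keep_partition_col=True)

for directory in sorted(default_layout):
    print(directory, default_layout[directory])
for directory in sorted(kept_layout):
    print(directory, kept_layout[directory])
```

In the default layout, `col1` can only be recovered from the directory name, which is the situation in the question. With the hypothetical flag set, each record also carries `col1` itself.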
