wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431185993
> > @wgtmac, by supporting multiple files to rewrite them into one we will end up with the same number of row-groups, right? Therefore, this tool is not meant to be used to solve the "small files problem". I am highlighting this because we've had issues with users misunderstanding the purpose of features like this. Maybe we should add some notes about it to the help of parquet-cli.
>
> You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good?

@gszadovszky Something like https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ConvertCommand.java
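
To make the distinction in the quoted discussion concrete, here is a toy sketch. It is not parquet-mr code and uses no Parquet API: a "file" is modeled as a list of row groups, and each row group is just its record count. The class `RowGroupSketch`, both method names, and the record-count threshold `targetRecordsPerGroup` are hypothetical; a real writer would decide when to close a row group by its configured size in bytes, not by a record count.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupSketch {

    /** Merge by copying row groups unchanged: one output row group per input row group. */
    public static List<Integer> mergeRowGroups(List<List<Integer>> files) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> file : files) {
            merged.addAll(file);
        }
        return merged;
    }

    /**
     * Rewrite record by record: records are buffered and a new row group is
     * emitted each time the (hypothetical) target record count is reached.
     */
    public static List<Integer> rewriteRecords(List<List<Integer>> files, int targetRecordsPerGroup) {
        if (targetRecordsPerGroup <= 0) {
            throw new IllegalArgumentException("targetRecordsPerGroup must be positive");
        }
        List<Integer> rewritten = new ArrayList<>();
        int buffered = 0;
        for (List<Integer> file : files) {
            for (int rowGroup : file) {
                for (int i = 0; i < rowGroup; i++) {
                    buffered++;
                    if (buffered == targetRecordsPerGroup) {
                        rewritten.add(buffered);
                        buffered = 0;
                    }
                }
            }
        }
        if (buffered > 0) {
            rewritten.add(buffered); // flush the final, partially filled row group
        }
        return rewritten;
    }

    public static void main(String[] args) {
        // Three small input files holding four row groups in total.
        List<List<Integer>> smallFiles = List.of(
                List.of(100), List.of(150), List.of(50, 200));

        System.out.println(mergeRowGroups(smallFiles));       // prints [100, 150, 50, 200]
        System.out.println(rewriteRecords(smallFiles, 1000)); // prints [500]
    }
}
```

In this model, merging three files yields one file that still has four small row groups, while the record-by-record rewrite collapses them into one, which is why only the latter would address the "small files problem".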