wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431185993

   > > @wgtmac, by rewriting multiple files into one we will 
end up with the same number of row groups, right? Therefore, this tool is not 
meant to be used to solve the "small files problem". I am highlighting this 
because we've had issues with users misunderstanding the purpose of features 
like this. Maybe we should add some notes about it to the help of parquet-cli.
   > 
   > You're right. We might add an option to force rewriting the input files 
record by record so row groups are regenerated by the writer. Does that sound 
good? @gszadovszky
   
   Something like 
https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ConvertCommand.java
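
   To make the distinction above concrete, here is a toy sketch. It does not use the parquet-mr API. It models a file as a list of row groups and a row group as a list of records, and every name in it (`RowGroupSketch`, `mergeByCopyingRowGroups`, `mergeRecordByRecord`, `maxRecordsPerRowGroup`) is hypothetical. A real writer typically decides row-group boundaries by size in bytes; a record count is used here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "file" is a list of row groups, a row group is a list of records.
// None of these names come from parquet-mr.
public class RowGroupSketch {

    // Merge by copying row groups unchanged: the output keeps every input row group,
    // so many small inputs still produce many small row groups.
    static List<List<Integer>> mergeByCopyingRowGroups(List<List<List<Integer>>> files) {
        List<List<Integer>> merged = new ArrayList<>();
        for (List<List<Integer>> file : files) {
            for (List<Integer> rowGroup : file) {
                merged.add(new ArrayList<>(rowGroup));
            }
        }
        return merged;
    }

    // Merge record by record: the "writer" starts a new row group only when the
    // current one reaches maxRecordsPerRowGroup (an assumed, simplified threshold).
    static List<List<Integer>> mergeRecordByRecord(List<List<List<Integer>>> files,
                                                   int maxRecordsPerRowGroup) {
        if (maxRecordsPerRowGroup <= 0) {
            throw new IllegalArgumentException("maxRecordsPerRowGroup must be positive");
        }
        List<List<Integer>> merged = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (List<List<Integer>> file : files) {
            for (List<Integer> rowGroup : file) {
                for (Integer record : rowGroup) {
                    current.add(record);
                    if (current.size() == maxRecordsPerRowGroup) {
                        merged.add(current);
                        current = new ArrayList<>();
                    }
                }
            }
        }
        if (!current.isEmpty()) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three small "files", each holding one row group of two records.
        List<List<List<Integer>>> files = List.of(
                List.of(List.of(1, 2)),
                List.of(List.of(3, 4)),
                List.of(List.of(5, 6)));
        System.out.println("copy row groups: " + mergeByCopyingRowGroups(files).size());
        System.out.println("record by record: " + mergeRecordByRecord(files, 100).size());
    }
}
```

   In this model, copying row groups leaves three row groups in the output, while the record-by-record path lets the writer pack all six records into one. The cost of the second path is that every record has to be read and written again.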
 


