gszadovszky commented on PR #1026: URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431359196
> You're right. We might add an option to force rewriting the input files record by record so row groups are regenerated by the writer. Does that sound good? @gszadovszky

It sounds great, @wgtmac, but I am fine with implementing it separately or in this PR, as you prefer. However, we still need to make it clear to the user that in the other cases they should not expect performance improvements from merging several files into one. (Moreover, it will increase the footer size, which might cause additional issues.) What do you think?