[ 
https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689138#comment-17689138
 ] 

ASF GitHub Bot commented on PARQUET-2228:
-----------------------------------------

wgtmac commented on PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#issuecomment-1431423625

   > > You're right. We might add an option to force rewriting the input files
   > > record by record so row groups are regenerated by the writer. Does that sound
   > > good? @gszadovszky
   >
   > It sounds great, @wgtmac, but I am fine implementing it separately or in
   > this PR as you prefer. However, we still need to highlight somehow to the user
   > that in the other cases the user should not expect performance improvements in
   > case of merging several files into one. (Moreover it'll increase the footer
   > size which might also generate additional issues.) What do you think?
   
   I agree. Let me add some comments to explain the issue at the moment.
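
   To make the concern concrete, here is a rough sketch of the difference between the two merge strategies discussed above. It is a toy model, not parquet-mr code: the class `RowGroupMergeSketch`, both methods, and the row-count-based target are hypothetical, and a real writer would typically cut row groups by byte size rather than by row count.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model only: a "file" is a list of row-group row counts.
   // None of these names come from parquet-mr.
   public class RowGroupMergeSketch {

       // Merge by copying row groups as-is: the output keeps every input
       // row group, so the footer must describe all of them.
       static List<Long> mergeByAppending(List<List<Long>> inputFiles) {
           List<Long> merged = new ArrayList<>();
           for (List<Long> file : inputFiles) {
               merged.addAll(file);
           }
           return merged;
       }

       // Merge by rewriting record by record: the writer cuts a new row group
       // each time the (assumed) target row count is reached.
       static List<Long> mergeByRewriting(List<List<Long>> inputFiles, long targetRowsPerGroup) {
           if (targetRowsPerGroup <= 0) {
               throw new IllegalArgumentException("targetRowsPerGroup must be positive");
           }
           long totalRows = 0;
           for (List<Long> file : inputFiles) {
               for (long rows : file) {
                   totalRows += rows;
               }
           }
           List<Long> regenerated = new ArrayList<>();
           while (totalRows > 0) {
               long rows = Math.min(totalRows, targetRowsPerGroup);
               regenerated.add(rows);
               totalRows -= rows;
           }
           return regenerated;
       }

       public static void main(String[] args) {
           List<List<Long>> inputs = List.of(
               List.of(100L, 100L),
               List.of(50L),
               List.of(200L, 50L));
           System.out.println("appended row groups:  " + mergeByAppending(inputs));
           System.out.println("rewritten row groups: " + mergeByRewriting(inputs, 1000L));
       }
   }
   ```

   In this model, appending leaves the merged file with as many small row groups as the inputs had in total, while rewriting regenerates them at the target size. That is why users should not expect a read-performance gain from merging alone.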




> ParquetRewriter supports more than one input file
> -------------------------------------------------
>
>                 Key: PARQUET-2228
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2228
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-mr
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> ParquetRewriter currently supports only one input file. The scope of this 
> task is to support multiple input files, which the rewriter merges into a 
> single file, with or without rewrite options specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
