[ https://issues.apache.org/jira/browse/PARQUET-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688030#comment-17688030 ]
ASF GitHub Bot commented on PARQUET-2228:
-----------------------------------------

shangxinli commented on code in PR #1026:
URL: https://github.com/apache/parquet-mr/pull/1026#discussion_r1104769279


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/rewrite/ParquetRewriter.java:
##########
@@ -183,12 +186,61 @@ public ParquetRewriter(TransParquetFileReader reader,
     }
   }

+  // Open all input files to validate their schemas are compatible to merge
+  private void openInputFiles(List<Path> inputFiles, Configuration conf) {
+    Preconditions.checkArgument(inputFiles != null && !inputFiles.isEmpty(), "No input files");
+
+    for (Path inputFile : inputFiles) {

Review Comment:
   Do we want to set a max size?



> ParquetRewriter supports more than one input file
> -------------------------------------------------
>
>                 Key: PARQUET-2228
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2228
>             Project: Parquet
>          Issue Type: Sub-task
>          Components: parquet-mr
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> ParquetRewriter currently supports only one input file. The scope of this
> task is to support multiple input files and the rewriter merges them into a
> single one w/o some rewrite options specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)