dramaticlly commented on code in PR #13881:
URL: https://github.com/apache/iceberg/pull/13881#discussion_r2291672854
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java:
##########
@@ -963,6 +963,59 @@ public void testTableWithManyStatisticFiles() throws IOException {
iterations * 2 + 1, iterations, iterations, iterations, iterations * 6 + 1, result);
}
+ @Test
+ public void testStatisticsFileSourcePath() throws IOException {
+ String sourceTableLocation = newTableLocation();
+ Map<String, String> properties = Maps.newHashMap();
+ properties.put("format-version", "2");
+ String tableName = "v2tblwithstats";
+ Table sourceTable =
+ createMetastoreTable(sourceTableLocation, properties, "default", tableName, 1);
+
+ // Compute table statistics to generate a .stats file
+ actions().computeTableStats(sourceTable).execute();
+
+ assertThat(sourceTable.statisticsFiles())
+ .as("Should include 1 statistics file after compute stats")
+ .hasSize(1);
+
+ String targetTableLocation = targetTableLocation();
+ RewriteTablePath.Result result =
+ actions()
+ .rewriteTablePath(sourceTable)
+ .rewriteLocationPrefix(sourceTableLocation, targetTableLocation)
+ .execute();
+
+ checkFileNum(3, 1, 1, 1, 7, result);
+
+ // Read the file list to verify statistics file paths
+ List<Tuple2<String, String>> filesToMove = readPathPairList(result.fileListLocation());
+
+ // Find the statistics file entry in the file list
+ Tuple2<String, String> statsFilePathPair = null;
+ for (Tuple2<String, String> pathPair : filesToMove) {
+ if (pathPair._1().endsWith(".stats")) {
+ statsFilePathPair = pathPair;
+ break;
+ }
+ }
+
+ assertThat(statsFilePathPair).as("Should find statistics file in file list").isNotNull();
+
+ // Verify the source path points to the actual source location, not staging
+ assertThat(statsFilePathPair._1())
+ .as("Statistics file source should point to source table location")
+ .startsWith(sourceTableLocation);
+ assertThat(statsFilePathPair._1())
+ .as("Statistics file source should NOT point to staging directory")
+ .doesNotContain("staging");
Review Comment:
nit: these two assertions can be combined into a single chain
```java
assertThat(statsFilePathPair._1())
.as("Statistics file source should point to source table location, not staging")
.startsWith(sourceTableLocation)
.doesNotContain("staging");
```
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteTablePathsAction.java:
##########
@@ -963,6 +963,59 @@ public void testTableWithManyStatisticFiles() throws IOException {
iterations * 2 + 1, iterations, iterations, iterations, iterations * 6 + 1, result);
}
+ @Test
+ public void testStatisticsFileSourcePath() throws IOException {
+ String sourceTableLocation = newTableLocation();
+ Map<String, String> properties = Maps.newHashMap();
+ properties.put("format-version", "2");
+ String tableName = "v2tblwithstats";
+ Table sourceTable =
+ createMetastoreTable(sourceTableLocation, properties, "default", tableName, 1);
+
+ // Compute table statistics to generate a .stats file
+ actions().computeTableStats(sourceTable).execute();
+
+ assertThat(sourceTable.statisticsFiles())
+ .as("Should include 1 statistics file after compute stats")
+ .hasSize(1);
+
+ String targetTableLocation = targetTableLocation();
+ RewriteTablePath.Result result =
+ actions()
+ .rewriteTablePath(sourceTable)
+ .rewriteLocationPrefix(sourceTableLocation, targetTableLocation)
+ .execute();
+
+ checkFileNum(3, 1, 1, 1, 7, result);
+
+ // Read the file list to verify statistics file paths
+ List<Tuple2<String, String>> filesToMove = readPathPairList(result.fileListLocation());
+
+ // Find the statistics file entry in the file list
+ Tuple2<String, String> statsFilePathPair = null;
+ for (Tuple2<String, String> pathPair : filesToMove) {
+ if (pathPair._1().endsWith(".stats")) {
+ statsFilePathPair = pathPair;
+ break;
+ }
+ }
Review Comment:
nit: this loop can also be replaced with a stream
```java
Tuple2<String, String> statsFilePathPair = filesToMove.stream()
.filter(pathPair -> pathPair._1().endsWith(".stats"))
.findFirst()
.orElse(null);
```
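For reference, a self-contained sketch of the same lookup. It uses `Map.Entry` as a stand-in for `Tuple2` (an assumption, so it runs without Scala on the classpath) and returns an `Optional` instead of `null`. The class name, method name, and paths are hypothetical, not taken from the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class StatsEntryLookup {

  // Returns the first (source, target) pair whose source path ends with ".stats",
  // or an empty Optional when the file list contains no statistics file.
  static Optional<Map.Entry<String, String>> findStatsEntry(
      List<Map.Entry<String, String>> filesToMove) {
    return filesToMove.stream()
        .filter(pair -> pair.getKey().endsWith(".stats"))
        .findFirst();
  }

  public static void main(String[] args) {
    // Hypothetical file list: one metadata file and one statistics file.
    List<Map.Entry<String, String>> filesToMove =
        List.of(
            Map.entry("/warehouse/src/metadata/v1.metadata.json",
                "/warehouse/dst/metadata/v1.metadata.json"),
            Map.entry("/warehouse/src/metadata/a.stats", "/warehouse/dst/metadata/a.stats"));

    findStatsEntry(filesToMove)
        .ifPresent(pair -> System.out.println(pair.getKey() + " -> " + pair.getValue()));
  }
}
```

In the test itself, keeping `.orElse(null)` followed by `isNotNull()` works equally well; the `Optional` form just makes the missing-file case explicit.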
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java:
##########
@@ -404,10 +404,7 @@ private Set<Pair<String, String>> statsFileCopyPlan(
Preconditions.checkArgument(
before.fileSizeInBytes() == after.fileSizeInBytes(),
"Before and after path rewrite, statistic file size should be same");
- result.add(
- Pair.of(
- RewriteTablePathUtil.stagingPath(before.path(), sourcePrefix, stagingDir),
- after.path()));
+ result.add(Pair.of(before.path(), after.path()));
Review Comment:
Good catch. I think we don't need to open and rewrite the contents of the stats
file, so it only needs to be copied from source to target, instead of from
staging to target.
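
To illustrate the point, here is a minimal sketch of a copy plan for files that are copied without modification. It is not the Iceberg implementation: `StatsCopyPlanSketch`, `replacePrefix`, and the paths are hypothetical, and `Map.Entry` stands in for Iceberg's `Pair`. Because nothing is rewritten, no copy of the file exists in the staging directory, so each pair goes straight from the original path to the target path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StatsCopyPlanSketch {

  // Hypothetical helper: swap the source prefix of a path for the target prefix.
  static String replacePrefix(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException(
          "Path " + path + " does not start with " + sourcePrefix);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  // Builds (copy-from, copy-to) pairs. The copy-from side is the original
  // source path, not a staging path, since the file content is unchanged.
  static List<Map.Entry<String, String>> statsCopyPlan(
      List<String> statsPaths, String sourcePrefix, String targetPrefix) {
    List<Map.Entry<String, String>> plan = new ArrayList<>();
    for (String path : statsPaths) {
      plan.add(Map.entry(path, replacePrefix(path, sourcePrefix, targetPrefix)));
    }
    return plan;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> plan =
        statsCopyPlan(
            List.of("/warehouse/src/metadata/a.stats"), "/warehouse/src", "/warehouse/dst");
    plan.forEach(pair -> System.out.println(pair.getKey() + " -> " + pair.getValue()));
  }
}
```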