szehon-ho commented on issue #4090:
URL: https://github.com/apache/iceberg/issues/4090#issuecomment-1039508890


   I have an idea for this. From your log, the failure still seems to be a 
validation exception from the given line, just not from the distributed Spark 
delta-writer job as the test expects.  Ref:
   
   ```
   Expected: an instance of org.apache.spark.SparkException
        but: <java.lang.IllegalArgumentException: Failed to cleanly delete data 
files matching: ref(name="id") == 1> is a java.lang.IllegalArgumentException
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   ```
   
   That's from here (the non-distributed version):
   
https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L271
   
   
   Most of the time the delete hits the distributed delta delete, but there is 
a chance it falls back to this non-distributed, metadata-only version.  That 
happens when the optimizer (OptimizeMetadataOnlyDeleteFromTable in particular) 
believes the delete can be handled with metadata alone.  The appendFuture tries 
to avert this by writing a file with two elements (1,2) while each deleteFuture 
attempt targets only (1).  But if the appendFuture has not run yet, the table 
is empty and the deleteFuture decides to proceed with a metadata-only delete.  
Most of the time that delete goes through fine, as there is nothing to do, and 
the test gets another try at hitting the right exception.  In a very small 
percentage of runs, though, the appendFuture lands right before the commit and 
the delete then fails in the above code path.
   
   One fix is to have the test also expect this potential failure.  Another 
fix is to make sure there is some pre-existing data, so that it is always 
forced to use the distributed delta delete.  I'll try to make a PR for that.
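   
   To make the reasoning above concrete, here is a rough toy model of the 
decision. It is not Iceberg's or Spark's actual planning code: the class, the 
method name, and the "file as a set of ids" representation are all assumptions 
made up for illustration, and the pre-existing file (1,3) is a hypothetical 
example of seeded data.

```java
import java.util.List;
import java.util.Set;

// Toy model only: NOT the real OptimizeMetadataOnlyDeleteFromTable logic.
// All names here are hypothetical and exist just for this sketch.
class MetadataOnlyDeleteSketch {

  // Each "file" is modeled as the set of id values it contains.
  // A delete of `id == target` needs no row-level rewrite when every file is
  // either fully matched (the whole file can be dropped) or not matched at all.
  static boolean canDeleteWithMetadataOnly(List<Set<Integer>> files, int target) {
    for (Set<Integer> file : files) {
      boolean someMatch = file.contains(target);
      boolean allMatch = someMatch && file.size() == 1;
      if (someMatch && !allMatch) {
        return false; // partially matched file: rows would have to be rewritten
      }
    }
    return true; // vacuously true for an empty table
  }

  public static void main(String[] args) {
    // appendFuture has not run yet: empty table, metadata-only path is chosen.
    List<Set<Integer>> empty = List.of();
    System.out.println(canDeleteWithMetadataOnly(empty, 1)); // true

    // appendFuture has written the (1,2) file: distributed path is needed.
    List<Set<Integer>> afterAppend = List.of(Set.of(1, 2));
    System.out.println(canDeleteWithMetadataOnly(afterAppend, 1)); // false

    // Hypothetical pre-existing file (1,3), before any append has landed.
    List<Set<Integer>> seeded = List.of(Set.of(1, 3));
    System.out.println(canDeleteWithMetadataOnly(seeded, 1)); // false
  }
}
```

   In this simplified model the empty table is the only case where the 
metadata-only path is picked, which is why seeding some data before the futures 
start should remove the window for the race.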


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]