bk-mz commented on PR #12856:
URL: https://github.com/apache/iceberg/pull/12856#issuecomment-2818352166
Example logs produced by these changes, from planFiles:
```txt
2025-04-21 12:43:04.411 INFO [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initializing SparkMicroBatchStream
with params: branch=null, caseSensitive=false, splitSize=134217728,
splitLookback=10, splitOpenFileCost=4194304, fromTimestamp=1745239382655,
maxFilesPerMicroBatch=2147483647, maxRecordsPerMicroBatch=50000000
2025-04-21 12:43:04.412 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - InitialOffsetStore created with
location
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
2025-04-21 12:43:04.428 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Found existing offset file at
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0,
reading
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Read offset Streaming Offset[-1:
position (-1) scan_all_files (false)] from
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initial offset set to Streaming
Offset[-1: position (-1) scan_all_files (false)]
2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skip delete snapshots=true, skip
overwrite snapshots=true
2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON
{"version":1,"snapshot_id":1823134505898519413,"position":8,"scan_all_files":false}:
Streaming Offset[1823134505898519413: position (8) scan_all_files (false)]
2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON
{"version":1,"snapshot_id":3821473473156059401,"position":24,"scan_all_files":false}:
Streaming Offset[3821473473156059401: position (24) scan_all_files (false)]
2025-04-21 12:43:04.867 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.aws.glue.GlueCatalog [stream execution thread for [id =
d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Using optimistic locking for Glue Data
Catalog tables.
2025-04-21 12:43:05.152 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=1823134505898519413, dateTime=2025-04-19T20:53:29.457Z, ageHours=39,
startFileIndex=8, endFileIndex=8] generated 0 file scan tasks
2025-04-21 12:43:05.265 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=267320482196197876, dateTime=2025-04-19T20:54:28.367Z, ageHours=39,
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
2025-04-21 12:43:05.266 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skipping processing for snapshot
id=7905888728843236371 operation=replace
2025-04-21 12:43:05.352 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=7421363600347898867, dateTime=2025-04-19T20:55:33.095Z, ageHours=39,
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
2025-04-21 12:43:05.471 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=1672029328327466770, dateTime=2025-04-19T20:56:31.462Z, ageHours=39,
startFileIndex=0, endFileIndex=9] generated 9 file scan tasks
2025-04-21 12:43:05.561 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=7190315158754434941, dateTime=2025-04-19T20:57:26.865Z, ageHours=39,
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
...
2025-04-21 12:43:11.720 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=3821473473156059401, dateTime=2025-04-19T22:01:46.828Z, ageHours=38,
startFileIndex=0, endFileIndex=24] generated 24 file scan tasks
2025-04-21 12:43:11.721 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - planFiles returned 764 file scan
tasks. total_files=764, total_size_in_bytes=178383916431. Time taken to eval
stats 0 ms
2025-04-21 12:43:11.734 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Split into 1484 combined scan tasks
2025-04-21 12:43:11.736 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Created 1484 SparkInputPartitions
```
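To check the per-snapshot lines against the planFiles total, the "Processing snapshot" messages can be extracted with a small script. This is a minimal sketch, not part of the PR: the function name `summarize_snapshots` and the regular expression are assumptions derived only from the message format visible in the excerpt above, and they would need updating if the log message changes. The sample text is copied from the first two such messages in the excerpt.

```python
import re

# Pattern derived from the message format seen in the excerpt above
# (an assumption; it is not taken from the Iceberg source).
SNAPSHOT_RE = re.compile(
    r"Processing snapshot \[id=(\d+), dateTime=([^,]+), ageHours=(\d+), "
    r"startFileIndex=(\d+), endFileIndex=(\d+)\] generated (\d+) file scan tasks"
)

# Two messages copied from the excerpt, including the mail line wrapping.
SAMPLE = """\
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=1823134505898519413, dateTime=2025-04-19T20:53:29.457Z, ageHours=39,
startFileIndex=8, endFileIndex=8] generated 0 file scan tasks
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=267320482196197876, dateTime=2025-04-19T20:54:28.367Z, ageHours=39,
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
"""


def summarize_snapshots(log_text):
    """Return (snapshot_id, start_index, end_index, tasks) per matched message."""
    # Long log lines are wrapped in the mail, so collapse all whitespace
    # into single spaces before matching.
    flat = " ".join(log_text.split())
    return [
        (int(m.group(1)), int(m.group(4)), int(m.group(5)), int(m.group(6)))
        for m in SNAPSHOT_RE.finditer(flat)
    ]


if __name__ == "__main__":
    rows = summarize_snapshots(SAMPLE)
    for snapshot_id, start, end, tasks in rows:
        print(f"snapshot {snapshot_id}: files [{start}, {end}) -> {tasks} tasks")
    print("total tasks:", sum(tasks for *_, tasks in rows))
```

In every message shown in the excerpt, the task count equals endFileIndex minus startFileIndex. Because the excerpt is elided, summing only the visible lines will not reproduce the reported total of 764; the full log would be needed for that check.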