Re: [PR] Add detailed debug and warn logging to SparkMicroBatchStream [iceberg]

via GitHub Mon, 21 Apr 2025 05:53:14 -0700


bk-mz commented on PR #12856:
URL: https://github.com/apache/iceberg/pull/12856#issuecomment-2818352166


   Example logs produced by these changes, from planFiles:
   
   ```txt
   2025-04-21 12:43:04.411 INFO  [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initializing SparkMicroBatchStream 
with params: branch=null, caseSensitive=false, splitSize=134217728, 
splitLookback=10, splitOpenFileCost=4194304, fromTimestamp=1745239382655, 
maxFilesPerMicroBatch=2147483647, maxRecordsPerMicroBatch=50000000
   2025-04-21 12:43:04.412 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - InitialOffsetStore created with 
location 
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
   2025-04-21 12:43:04.428 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Found existing offset file at 
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0,
 reading
   2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Read offset Streaming Offset[-1: 
position (-1) scan_all_files (false)] from 
s3://com.twilio.mdp.iceberg.tables.mdr.finalized.testcell/checkpoints7/sources/0/offsets/0
   2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Initial offset set to Streaming 
Offset[-1: position (-1) scan_all_files (false)]
   2025-04-21 12:43:04.444 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skip delete snapshots=true, skip 
overwrite snapshots=true
   2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON 
{"version":1,"snapshot_id":1823134505898519413,"position":8,"scan_all_files":false}:
 Streaming Offset[1823134505898519413: position (8) scan_all_files (false)]
   2025-04-21 12:43:04.806 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Deserialized offset from JSON 
{"version":1,"snapshot_id":3821473473156059401,"position":24,"scan_all_files":false}:
 Streaming Offset[3821473473156059401: position (24) scan_all_files (false)]
   2025-04-21 12:43:04.867 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.aws.glue.GlueCatalog [stream execution thread for [id = 
d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Using optimistic locking for Glue Data 
Catalog tables.
   2025-04-21 12:43:05.152 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=1823134505898519413, dateTime=2025-04-19T20:53:29.457Z, ageHours=39, 
startFileIndex=8, endFileIndex=8] generated 0 file scan tasks
   2025-04-21 12:43:05.265 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=267320482196197876, dateTime=2025-04-19T20:54:28.367Z, ageHours=39, 
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
   2025-04-21 12:43:05.266 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Skipping processing for snapshot 
id=7905888728843236371 operation=replace
   2025-04-21 12:43:05.352 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=7421363600347898867, dateTime=2025-04-19T20:55:33.095Z, ageHours=39, 
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
   2025-04-21 12:43:05.471 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=1672029328327466770, dateTime=2025-04-19T20:56:31.462Z, ageHours=39, 
startFileIndex=0, endFileIndex=9] generated 9 file scan tasks
   2025-04-21 12:43:05.561 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=7190315158754434941, dateTime=2025-04-19T20:57:26.865Z, ageHours=39, 
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
   ...
   2025-04-21 12:43:11.720 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot 
[id=3821473473156059401, dateTime=2025-04-19T22:01:46.828Z, ageHours=38, 
startFileIndex=0, endFileIndex=24] generated 24 file scan tasks
   2025-04-21 12:43:11.721 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - planFiles returned 764 file scan 
tasks. total_files=764, total_size_in_bytes=178383916431. Time taken to eval 
stats 0 ms
   2025-04-21 12:43:11.734 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Split into 1484 combined scan tasks
   2025-04-21 12:43:11.736 DEBUG [ip-0-0-0-0.ec2.internal] - 
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread 
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId = 
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Created 1484 SparkInputPartitions
   ```
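
For anyone who wants to tally the per-snapshot lines above, here is a minimal sketch, not part of the PR. It assumes the message format stays exactly as in this excerpt; `summarize_snapshots` and `SAMPLE` are hypothetical names used only for illustration. Whitespace is collapsed first because the mail archive hard-wraps long log lines.

```python
import re

# Pattern for the "Processing snapshot [...] generated N file scan tasks"
# message, based only on the format visible in the excerpt above (assumption).
SNAPSHOT_RE = re.compile(
    r"Processing snapshot \[id=(?P<id>\d+), dateTime=(?P<ts>[^,\s]+), "
    r"ageHours=(?P<age>\d+), startFileIndex=(?P<start>\d+), "
    r"endFileIndex=(?P<end>\d+)\] generated (?P<tasks>\d+) file scan tasks"
)


def summarize_snapshots(log_text):
    """Return one dict per 'Processing snapshot' message found in log_text."""
    # Collapse every run of whitespace (including wrapped newlines) to one space
    # so a message split across several lines matches as a single string.
    flat = " ".join(log_text.split())
    rows = []
    for m in SNAPSHOT_RE.finditer(flat):
        rows.append({
            "snapshot_id": int(m.group("id")),
            "date_time": m.group("ts"),
            "start": int(m.group("start")),
            "end": int(m.group("end")),
            "tasks": int(m.group("tasks")),
        })
    return rows


# Two entries copied from the excerpt above, still hard-wrapped.
SAMPLE = """
2025-04-21 12:43:05.152 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=1823134505898519413, dateTime=2025-04-19T20:53:29.457Z, ageHours=39,
startFileIndex=8, endFileIndex=8] generated 0 file scan tasks
2025-04-21 12:43:05.265 DEBUG [ip-0-0-0-0.ec2.internal] -
org.apache.iceberg.spark.source.SparkMicroBatchStream [stream execution thread
for [id = d053491a-73d0-4d8b-b364-ecef987788b9, runId =
1a1a6d0c-045b-443e-94e3-7c39977f067b]] - Processing snapshot
[id=267320482196197876, dateTime=2025-04-19T20:54:28.367Z, ageHours=39,
startFileIndex=0, endFileIndex=8] generated 8 file scan tasks
"""

if __name__ == "__main__":
    rows = summarize_snapshots(SAMPLE)
    for row in rows:
        print(row["snapshot_id"], row["start"], row["end"], row["tasks"])
    print("total file scan tasks:", sum(r["tasks"] for r in rows))
```

In the lines shown here, each snapshot's task count equals `endFileIndex - startFileIndex`, so the same rows can be used to cross-check the total reported by the `planFiles returned ...` line when the full, unelided log is available.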


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


