Re: [PR] Add detailed debug and warn logging to SparkMicroBatchStream [iceberg]

via GitHub Mon, 21 Apr 2025 10:14:47 -0700


bk-mz commented on code in PR #12856:
URL: https://github.com/apache/iceberg/pull/12856#discussion_r2052733653



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##########
@@ -105,28 +108,54 @@ public class SparkMicroBatchStream implements MicroBatchStream, SupportsAdmissio
     this.maxFilesPerMicroBatch = readConf.maxFilesPerMicroBatch();
     this.maxRecordsPerMicroBatch = readConf.maxRecordsPerMicroBatch();
 
+    LOG.info(
+        "Initializing SparkMicroBatchStream with params: branch={}, caseSensitive={}, "
+            + "splitSize={}, splitLookback={}, splitOpenFileCost={}, fromTimestamp={}, "
+            + "maxFilesPerMicroBatch={}, maxRecordsPerMicroBatch={}",

Review Comment:
   Unfortunately, I couldn't find anything related to that in the Spark UI.
   
   Microbatch Scan:
   
   <img width="1058" alt="image" src="https://github.com/user-attachments/assets/07ecb82c-5f76-43f9-afcd-47d717f3c0b7" />
   
   ```text
   == Parsed Logical Plan ==
   WriteToMicroBatchDataSourceV1 ForeachBatchSink, d053491a-73d0-4d8b-b364-ecef987788b9, [checkpointLocation=s3://my-table/checkpoints7, fanout-enabled=true], Append, 26
   +- StreamingDataSourceV2Relation [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6, headers#7], IcebergScan(table=my_catalog.my_database.my_table, branch=null, type=struct<1: key: optional binary, 2: value: optional binary, 3: topic: optional string, 4: partition: optional int, 5: offset: optional long, 6: timestamp: optional timestamptz, 7: timestampType: optional int, 8: headers: optional list<struct<10: key: optional string, 11: value: optional binary>>>, filters=[], runtimeFilters=[], caseSensitive=false), org.apache.iceberg.spark.source.SparkMicroBatchStream@4b3c3e87, Streaming Offset[7176483676499627176: position (0) scan_all_files (false)], Streaming Offset[3260915119082769976: position (3) scan_all_files (false)]
   
   == Analyzed Logical Plan ==
   key: binary, value: binary, topic: string, partition: int, offset: bigint, timestamp: timestamp, timestampType: int, headers: array<struct<key:string,value:binary>>
   WriteToMicroBatchDataSourceV1 ForeachBatchSink, d053491a-73d0-4d8b-b364-ecef987788b9, [checkpointLocation=s3://my-table/checkpoints7, fanout-enabled=true], Append, 26
   +- StreamingDataSourceV2Relation [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6, headers#7], IcebergScan(table=my_catalog.my_database.my_table, branch=null, type=struct<1: key: optional binary, 2: value: optional binary, 3: topic: optional string, 4: partition: optional int, 5: offset: optional long, 6: timestamp: optional timestamptz, 7: timestampType: optional int, 8: headers: optional list<struct<10: key: optional string, 11: value: optional binary>>>, filters=[], runtimeFilters=[], caseSensitive=false), org.apache.iceberg.spark.source.SparkMicroBatchStream@4b3c3e87, Streaming Offset[7176483676499627176: position (0) scan_all_files (false)], Streaming Offset[3260915119082769976: position (3) scan_all_files (false)]
   
   == Optimized Logical Plan ==
   StreamingDataSourceV2Relation [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6, headers#7], IcebergScan(table=my_catalog.my_database.my_table, branch=null, type=struct<1: key: optional binary, 2: value: optional binary, 3: topic: optional string, 4: partition: optional int, 5: offset: optional long, 6: timestamp: optional timestamptz, 7: timestampType: optional int, 8: headers: optional list<struct<10: key: optional string, 11: value: optional binary>>>, filters=[], runtimeFilters=[], caseSensitive=false), org.apache.iceberg.spark.source.SparkMicroBatchStream@4b3c3e87, Streaming Offset[7176483676499627176: position (0) scan_all_files (false)], Streaming Offset[3260915119082769976: position (3) scan_all_files (false)]
   
   == Physical Plan ==
   *(1) Project [key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6, headers#7]
   +- MicroBatchScan[key#0, value#1, topic#2, partition#3, offset#4L, timestamp#5, timestampType#6, headers#7] my_catalog.my_database.my_table (branch=null) [filters=, groupedBy=]
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

