Re: [PR] Docs: Add documentation for Rate limiting in Spark Structured Streaming [iceberg]

via GitHub Tue, 11 Feb 2025 14:45:07 -0800


singhpk234 commented on code in PR #12217:
URL: https://github.com/apache/iceberg/pull/12217#discussion_r1951710433



##########
docs/docs/spark-configuration.md:
##########
@@ -155,16 +155,18 @@ spark.read
     .table("catalog.db.table")
 ```
 
-| Spark option    | Default               | Description                                                                               |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id     | (latest)              | Snapshot ID of the table snapshot to read                                                 |
-| as-of-timestamp | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size      | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size         |
-| lookback        | As per table property | Overrides this table's read.split.planning-lookback                                       |
-| file-open-cost  | As per table property | Overrides this table's read.split.open-file-cost                                          |
-| vectorization-enabled  | As per table property | Overrides this table's read.parquet.vectorization.enabled                                          |
-| batch-size  | As per table property | Overrides this table's read.parquet.vectorization.batch-size                                          |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+| Spark option                        | Default               | Description                                                                                                                                                                                                                                               |
+|-------------------------------------| --------------------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| snapshot-id                         | (latest)              | Snapshot ID of the table snapshot to read                                                                                                                                                                                                                 |
+| as-of-timestamp                     | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time.                                                                                                                                                                 |
+| split-size                          | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size                                                                                                                                                                         |
+| lookback                            | As per table property | Overrides this table's read.split.planning-lookback                                                                                                                                                                                                       |
+| file-open-cost                      | As per table property | Overrides this table's read.split.open-file-cost                                                                                                                                                                                                          |
+| vectorization-enabled               | As per table property | Overrides this table's read.parquet.vectorization.enabled                                                                                                                                                                                                 |
+| batch-size                          | As per table property | Overrides this table's read.parquet.vectorization.batch-size                                                                                                                                                                                              |
+| stream-from-timestamp               | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used                                                                                                                                         |
+| streaming-max-files-per-micro-batch | INT_MAX | Maximum number of files per microbatch                                                                                                                                                                                                                    |
+| streaming-max-rows-per-micro-batch  | INT_MAX | Maximum number of rows per microbatch. Note: the smallest granularity supported is one file, so make sure this value is always greater than the number of records in the largest possible file; otherwise the stream can get stuck |

Review Comment:
   Sounds fair, I changed the wording.
   
   > Do we reformat all the rows to align the column widths (which is only partially done here)? Or should we minimize churn and not reformat existing rows?
   
   Honestly, I am not aware of any established practice here. My IDE changed the indentation for me, and I have reverted it.
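
To illustrate the note on `streaming-max-rows-per-micro-batch` in the diff above, here is a minimal Python sketch of why a row limit smaller than a single file could stall a stream when the smallest unit of work is one file. It is a simplified model under stated assumptions, not Iceberg's planner: the function `plan_micro_batches` and its greedy, strict-limit packing are hypothetical and only mirror the wording of the table row.

```python
from typing import List


def plan_micro_batches(file_row_counts: List[int],
                       max_files: int,
                       max_rows: int) -> List[List[int]]:
    """Hypothetical model: pack whole files, in order, into micro-batches.

    Assumption (not taken from Iceberg's code): both limits are strict and
    a file is never split across micro-batches.
    """
    batches: List[List[int]] = []
    i = 0
    while i < len(file_row_counts):
        batch: List[int] = []
        rows = 0
        # Keep adding whole files while both limits still hold.
        while (i < len(file_row_counts)
               and len(batch) < max_files
               and rows + file_row_counts[i] <= max_rows):
            rows += file_row_counts[i]
            batch.append(file_row_counts[i])
            i += 1
        if not batch:
            # No file fits within the limits, so the model cannot advance.
            raise ValueError(
                f"file with {file_row_counts[i]} rows cannot be scheduled "
                f"with max_files={max_files}, max_rows={max_rows}"
            )
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Every file fits under the row limit, so planning completes.
    print(plan_micro_batches([100, 200, 50, 250], max_files=2, max_rows=300))
    # → [[100, 200], [50, 250]]
```

In this model, calling `plan_micro_batches([100, 400], max_files=2, max_rows=300)` raises `ValueError` on the second file, which corresponds to the "stream being stuck" case the note warns about.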



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

