hudi-agent commented on code in PR #18869:
URL: https://github.com/apache/hudi/pull/18869#discussion_r3315149627


##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |
+| metadata.enabled                                            | true 
(Optional)        | Enables the metadata table. Required for Flink record-level 
index lookups.                                                                  
                                                                                
                                                                       |

Review Comment:
   🤖 The default for `index.bootstrap.rocksdb.path` is listed as `(Optional)` 
with no value, but in `FlinkOptions.INDEX_BOOTSTRAP_ROCKSDB_PATH` it has a real 
default of `FileIOUtils.getDefaultSpillableMapBasePath()` (typically the system 
temp directory). Could you update the Default column to reflect this, since 
users relying on the documented "no default" may be surprised that a path is 
implicitly chosen for them?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
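
To illustrate why an empty Default cell can mislead, here is a minimal Python sketch of an option that silently falls back to a temp-directory path when unset. The helper name `resolve_rocksdb_path` is hypothetical, and `tempfile.gettempdir()` only stands in for `FileIOUtils.getDefaultSpillableMapBasePath()`; this is not Hudi's actual implementation.

```python
import tempfile
from typing import Optional


def resolve_rocksdb_path(configured: Optional[str]) -> str:
    """Hypothetical helper: return the user-supplied path, else a temp-dir fallback."""
    if configured:
        # An explicitly configured path always wins.
        return configured
    # Assumption: stand-in for FileIOUtils.getDefaultSpillableMapBasePath().
    return tempfile.gettempdir()
```

Under this model, a user who leaves the option unset still gets a concrete directory, which is the behavior the Default column would ideally spell out.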



##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |

Review Comment:
   🤖 The Default column for `index.bootstrap.enabled` shows `false (Optional)`, 
but the description explains that Flink may override this to `true` for 
`GLOBAL_RECORD_LEVEL_INDEX` at runtime when the option is unset. This dual 
default is easy to miss. It might help to make the override behavior more 
prominent (e.g., "Static default: `false`; runtime default for 
`GLOBAL_RECORD_LEVEL_INDEX`: `true` when unset"), so that users don't assume the 
static default applies in all cases.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
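
The dual default described in this comment can be modeled as a small resolution rule. The following Python sketch is only an illustration of the documented behavior (an explicit setting wins; otherwise global RLI turns bootstrap on). The function name `effective_bootstrap_enabled` is hypothetical and does not correspond to any Hudi API.

```python
from typing import Optional

GLOBAL_RLI = "GLOBAL_RECORD_LEVEL_INDEX"


def effective_bootstrap_enabled(index_type: str, explicit: Optional[bool]) -> bool:
    """Hypothetical model of how index.bootstrap.enabled resolves at runtime."""
    if explicit is not None:
        # The user set the option, so the explicit value is honored.
        return explicit
    # Unset: the static default is false, except for global RLI,
    # where the docs say bootstrap is enabled by default.
    return index_type == GLOBAL_RLI
```

Seen this way, the documented static default of `false` only applies when the index type is something other than `GLOBAL_RECORD_LEVEL_INDEX`.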



##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |
+| metadata.enabled                                            | true 
(Optional)        | Enables the metadata table. Required for Flink record-level 
index lookups.                                                                  
                                                                                
                                                                       |
+| index.global.enabled                                        | true 
(Optional)        | Whether to update the old partition path when the same 
record key arrives with a different partition path. This must be `true` for 
`GLOBAL_RECORD_LEVEL_INDEX` and is set to `false` for `RECORD_LEVEL_INDEX`.     
                                                                                
|
+| index.bootstrap.enabled                                     | false 
(Optional)       | When `index.type=GLOBAL_RECORD_LEVEL_INDEX`, controls 
whether Flink bootstraps the global index into a local RocksDB backend. If not 
explicitly set for global RLI, Flink enables bootstrap by default. Set to 
`false` to force native metadata-table RLI access.                              
      |
+| index.bootstrap.rocksdb.path                                | (Optional)     
        | Local directory path for the RocksDB backend used when 
`index.bootstrap.enabled=true`. Each task manager creates a unique subdirectory 
under this path.                                                                
                                                                             |
+| index.rli.cache.size                                        | 256 (Optional) 
        | Maximum memory, in MB, allocated for the record-level index cache per 
bucket-assign task. Applies to native metadata-table RLI access and partitioned 
RLI caches.                                                                     
                                                              |
+| index.rli.cache.concurrent.partitions.num                   | 2 (Optional)   
        | Expected number of partitions whose partitioned RLI caches are 
updated concurrently. Used to size each partition cache when historical cache 
usage is unavailable.                                                           
                                                                       |

Review Comment:
   🤖 The `index.write.tasks` row shows default `(N/A)`. Since this option 
resolves to the execution environment parallelism at runtime, it might read 
more clearly as a default of "execution environment parallelism" (matching how 
`read.tasks` and similar Flink configs are typically described), rather than 
`(N/A)` which could read as "unsupported."
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
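
As a rough model of the fallback this comment describes, the Python sketch below resolves the write parallelism from an optional setting. The name `effective_index_write_tasks` and the `env_parallelism` parameter are hypothetical; the only behavior taken from the doc is "defaults to the execution environment parallelism when not set".

```python
from typing import Optional


def effective_index_write_tasks(configured: Optional[int], env_parallelism: int) -> int:
    """Hypothetical model: use index.write.tasks if set, else the environment parallelism."""
    if configured is not None:
        return configured
    # Unset: fall back to the execution environment parallelism.
    return env_parallelism
```

Because an unset option still resolves to a real value, a Default of "execution environment parallelism" describes the behavior more directly than `(N/A)`.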



##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |
+| metadata.enabled                                            | true 
(Optional)        | Enables the metadata table. Required for Flink record-level 
index lookups.                                                                  
                                                                                
                                                                       |
+| index.global.enabled                                        | true 
(Optional)        | Whether to update the old partition path when the same 
record key arrives with a different partition path. This must be `true` for 
`GLOBAL_RECORD_LEVEL_INDEX` and is set to `false` for `RECORD_LEVEL_INDEX`.     
                                                                                
|

Review Comment:
   🤖 The description says this cache size "Applies to native metadata-table RLI 
access and partitioned RLI caches." The FlinkOptions Javadoc adds an important 
detail: the per-checkpoint cache size is dynamically computed based on 
historical checkpoint averages. It might help to mention this dynamic sizing 
here so users understand `index.rli.cache.size` is an upper bound rather than a 
fixed allocation.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
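
One way to picture "upper bound rather than fixed allocation" is the Python sketch below. It is an assumption-heavy illustration, not the FlinkOptions logic: the function name, the use of a plain mean over past checkpoint usage, and the even split across `concurrent_partitions` when no history exists are all hypothetical choices made for the example.

```python
from typing import Sequence


def partition_cache_size_mb(
    max_cache_mb: float,
    history_mb: Sequence[float],
    concurrent_partitions: int = 2,
) -> float:
    """Hypothetical sizing rule: history-driven estimate, capped by the configured maximum."""
    if not history_mb:
        # Assumption: with no history, split the maximum evenly across
        # the expected number of concurrently updated partitions.
        return max_cache_mb / concurrent_partitions
    average = sum(history_mb) / len(history_mb)
    # The configured size acts only as a ceiling on the estimate.
    return min(max_cache_mb, average)
```

In this model the configured 256 MB is reached only when historical usage demands it; lighter workloads get a smaller cache.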



##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |
+| metadata.enabled                                            | true 
(Optional)        | Enables the metadata table. Required for Flink record-level 
index lookups.                                                                  
                                                                                
                                                                       |
+| index.global.enabled                                        | true 
(Optional)        | Whether to update the old partition path when the same 
record key arrives with a different partition path. This must be `true` for 
`GLOBAL_RECORD_LEVEL_INDEX` and is set to `false` for `RECORD_LEVEL_INDEX`.     
                                                                                
|
+| index.bootstrap.enabled                                     | false 
(Optional)       | When `index.type=GLOBAL_RECORD_LEVEL_INDEX`, controls 
whether Flink bootstraps the global index into a local RocksDB backend. If not 
explicitly set for global RLI, Flink enables bootstrap by default. Set to 
`false` to force native metadata-table RLI access.                              
      |
+| index.bootstrap.rocksdb.path                                | (Optional)     
        | Local directory path for the RocksDB backend used when 
`index.bootstrap.enabled=true`. Each task manager creates a unique subdirectory 
under this path.                                                                
                                                                             |
+| index.rli.cache.size                                        | 256 (Optional) 
        | Maximum memory, in MB, allocated for the record-level index cache per 
bucket-assign task. Applies to native metadata-table RLI access and partitioned 
RLI caches.                                                                     
                                                              |
+| index.rli.cache.concurrent.partitions.num                   | 2 (Optional)   
        | Expected number of partitions whose partitioned RLI caches are 
updated concurrently. Used to size each partition cache when historical cache 
usage is unavailable.                                                           
                                                                       |
+| index.rli.lookup.minibatch.size                             | 1000 
(Optional)        | Maximum number of input records buffered for mini-batch 
record-index lookup. Mini-batching reduces individual metadata-table lookup 
calls for native global RLI access.                                             
                                                                               |
+| index.rli.write.buffer.size                                 | 100 (Optional) 
        | Maximum memory, in MB, for the index record writer buffer. When the 
threshold is reached, Flink flushes index records to avoid OOM.                 
                                                                                
                                                                |
+| index.write.tasks                                           | (N/A)          
        | Parallelism for tasks that write record-level index records. Defaults 
to the execution environment parallelism when not set.                          
                                                                                
                                                              |
+| metadata.compaction.schedule.enabled                        | true 
(Optional)        | Schedules metadata table compaction plans.                  
                                                                                
                                                                                
                                                                       |

Review Comment:
   🤖 The doc states "Flink ingestion does not support deferred RLI 
initialization, so keep this set to `false` for Flink RLI writes." Could you 
confirm where this restriction is enforced in Flink? If a user sets 
`hoodie.metadata.record.level.index.defer.init=true` with Flink today, does the 
write fail loudly, or does it silently fall back? A pointer to the failure mode 
(or to the relevant validator) would make this guidance more actionable for 
on-call debugging.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
website/docs/indexes.md:
##########
@@ -211,13 +211,38 @@ for more details. All these, support the index types 
mentioned [above](#addition
 
 #### Flink based configs
 
-For Flink DataStream and Flink SQL, Bucket index and Flink state index are 
supported.
+For Flink DataStream and Flink SQL, Bucket index, Flink state index, and 
record-level index are supported.
 Following are the basic configs that control the indexing behavior. Please 
refer [the Flink 
configurations](configurations.md#Flink-Options-advanced-configs) for advanced 
configs.
 
-| Config Name                | Default                | Description            
                                                                                
                                                                                
                                                 |
-|----------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| index.type                 | FLINK_STATE (Optional) | Index type of Flink 
write job, default is using state backed index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li></ul><br />  `Config Param: INDEX_TYPE`  
                                                             |
-| hoodie.index.bucket.engine | SIMPLE (Optional)      | 
org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: Determines the type of 
bucketing or hashing to use when `hoodie.index.type` is set to `BUCKET`.    
Possible Values: <br /> <ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul> |
+| Config Name                                                 | Default        
        | Description                                                           
                                                                                
                                                                                
                                                            |
+|-------------------------------------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| index.type                                                  | FLINK_STATE 
(Optional) | Index type of Flink write job, default is using state backed 
index. Possible values:<br /> 
<ul><li>FLINK_STATE</li><li>BUCKET</li><li>GLOBAL_RECORD_LEVEL_INDEX</li><li>RECORD_LEVEL_INDEX</li></ul><br
 />  `Config Param: INDEX_TYPE`                                                 
           |
+| hoodie.index.bucket.engine                                  | SIMPLE 
(Optional)      | org.apache.hudi.index.HoodieIndex$BucketIndexEngineType: 
Determines the type of bucketing or hashing to use when `hoodie.index.type` is 
set to `BUCKET`.    Possible Values: <br /> 
<ul><li>SIMPLE</li><li>CONSISTENT_HASHING</li></ul>                             
                              |
+| metadata.enabled                                            | true 
(Optional)        | Enables the metadata table. Required for Flink record-level 
index lookups.                                                                  
                                                                                
                                                                       |
+| index.global.enabled                                        | true 
(Optional)        | Whether to update the old partition path when the same 
record key arrives with a different partition path. This must be `true` for 
`GLOBAL_RECORD_LEVEL_INDEX` and is set to `false` for `RECORD_LEVEL_INDEX`.     
                                                                                
|
+| index.bootstrap.enabled                                     | false 
(Optional)       | When `index.type=GLOBAL_RECORD_LEVEL_INDEX`, controls 
whether Flink bootstraps the global index into a local RocksDB backend. If not 
explicitly set for global RLI, Flink enables bootstrap by default. Set to 
`false` to force native metadata-table RLI access.                              
      |
+| index.bootstrap.rocksdb.path                                | (Optional)     
        | Local directory path for the RocksDB backend used when 
`index.bootstrap.enabled=true`. Each task manager creates a unique subdirectory 
under this path.                                                                
                                                                             |

Review Comment:
   🤖 The FlinkOptions Javadoc for `index.rli.lookup.minibatch.size` notes that 
1000 is also the minimum — if a smaller value is configured, the default is 
used instead. Worth surfacing here so users tuning down the value aren't 
surprised that small settings are silently ignored.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>
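
The "default is also the minimum" rule from this comment can be sketched as follows. This Python snippet is a hypothetical model based only on the comment's wording (smaller values fall back to the default of 1000); the function name `effective_minibatch_size` is not a Hudi API.

```python
DEFAULT_MINIBATCH_SIZE = 1000  # documented default for index.rli.lookup.minibatch.size


def effective_minibatch_size(configured: int) -> int:
    """Hypothetical model: values below the default are replaced by the default."""
    if configured < DEFAULT_MINIBATCH_SIZE:
        # A smaller setting is not honored; the default is used instead.
        return DEFAULT_MINIBATCH_SIZE
    return configured
```

So a user who sets the option to 100 would, under this model, still run with a mini-batch size of 1000.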



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
