jonvex commented on code in PR #9338:
URL: https://github.com/apache/hudi/pull/9338#discussion_r1282347510
########## website/docs/migration_guide.md: ##########
@@ -56,11 +64,13 @@ spark-submit --master local \
 --hoodie-conf hoodie.bootstrap.keygen.class=org.apache.hudi.keygen.SimpleKeyGenerator \
 --hoodie-conf hoodie.bootstrap.full.input.provider=org.apache.hudi.bootstrap.SparkParquetBootstrapDataProvider \

Review Comment:
I don't think we need `--hoodie-conf hoodie.bootstrap.full.input.provider` in the example.

########## website/docs/migration_guide.md: ##########
@@ -69,12 +79,28 @@ for partition in [list of partitions in source table] {
 }
 ```
-**Option 3**
+**Option 3 using Spark SQL CALL Procedure**
+
+Refer to [Bootstrap procedure](https://hudi.apache.org/docs/next/procedures#bootstrap) for more details.
+
+**Option 4 using Hudi CLI**
+
 Write your own custom logic of how to load an existing table into a Hudi managed one. Please read about the RDD API [here](/docs/quick-start-guide). Using the bootstrap run CLI. Once hudi has been built via `mvn clean install -DskipTests`, the shell can be fired by via `cd hudi-cli && ./hudi-cli.sh`.
 ```java
 hudi->bootstrap run --srcPath /tmp/source_table --targetPath /tmp/hoodie/bootstrap_table --tableName bootstrap_table --tableType COPY_ON_WRITE --rowKeyField ${KEY_FIELD} --partitionPathField ${PARTITION_FIELD} --sparkMaster local --hoodieConfigs hoodie.datasource.write.hive_style_partitioning=true --selectorClass org.apache.hudi.client.bootstrap.selector.FullRecordBootstrapModeSelector
 ```
-Unlike deltaStream, FULL_RECORD or METADATA_ONLY is set with --selectorClass, see detalis with help "bootstrap run".
+Unlike Hudi Streamer, FULL_RECORD or METADATA_ONLY is set with --selectorClass, see details with help "bootstrap run".
+
+
+## Configs
+
+Here are the basic configs that control bootstrapping.
+
+| Config Name | Default | Description |
+| ---------------------------------------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
+| hoodie.bootstrap.base.path | N/A **(Required)** | Base path of the dataset that needs to be bootstrapped as a Hudi table<br /><br />`Config Param: BASE_PATH`<br />`Since Version: 0.6.0` |
+
+By default, with only `hoodie.bootstrap.base.path` being provided METADATA_ONLY mode is selected. For other options, please refer [bootstrap configs](https://hudi.apache.org/docs/next/configurations#Bootstrap-Configs) for more details.

Review Comment:
I think adding `hoodie.bootstrap.mode.selector.regex.mode`, `hoodie.bootstrap.mode.selector`, and `hoodie.bootstrap.mode.selector.regex` to the basic configs would be helpful. At a minimum, `hoodie.bootstrap.mode.selector` should be added.

-- 
This is an automated message from the Apache Git Service.
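To illustrate why `hoodie.bootstrap.mode.selector` and the two regex configs are worth documenting together, here is a minimal Java sketch that models, under stated assumptions, how a regex-based selector could split partitions between METADATA_ONLY and FULL_RECORD. It is not Hudi's selector implementation: the class `RegexModeSelectorSketch`, its `select` method, and the sample partition paths are hypothetical. It assumes that a partition whose path fully matches `hoodie.bootstrap.mode.selector.regex` gets the mode named by `hoodie.bootstrap.mode.selector.regex.mode`, and every other partition gets the opposite mode.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical model of regex-based bootstrap mode selection.
 * Assumptions (not taken from Hudi's source): the regex must match the
 * whole partition path, and non-matching partitions get the opposite mode.
 */
public class RegexModeSelectorSketch {

    enum BootstrapMode { FULL_RECORD, METADATA_ONLY }

    /** Assigns a bootstrap mode to each partition path, preserving input order. */
    static Map<String, BootstrapMode> select(List<String> partitions, String regex, BootstrapMode modeOnMatch) {
        Pattern pattern = Pattern.compile(regex);
        // Partitions that do not match fall back to the other mode (assumption).
        BootstrapMode otherMode = modeOnMatch == BootstrapMode.METADATA_ONLY
                ? BootstrapMode.FULL_RECORD
                : BootstrapMode.METADATA_ONLY;
        Map<String, BootstrapMode> result = new LinkedHashMap<>();
        for (String partition : partitions) {
            // matches() requires the entire path to match, unlike find().
            result.put(partition, pattern.matcher(partition).matches() ? modeOnMatch : otherMode);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values standing in for (hypothetical settings):
        //   hoodie.bootstrap.mode.selector.regex=2020/08/2[0-9]
        //   hoodie.bootstrap.mode.selector.regex.mode=FULL_RECORD
        List<String> partitions = List.of("2020/08/19", "2020/08/20", "2020/08/21");
        Map<String, BootstrapMode> modes = select(partitions, "2020/08/2[0-9]", BootstrapMode.FULL_RECORD);
        modes.forEach((p, m) -> System.out.println(p + " -> " + m));
    }
}
```

With these sample values, `2020/08/20` and `2020/08/21` would be bootstrapped as FULL_RECORD and `2020/08/19` as METADATA_ONLY. Whether Hudi matches the regex against the full partition path or a substring is an assumption here; the bootstrap configs page linked above is the authority.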