yihua commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3319003100


##########
website/docs/hoodie_streaming_ingestion.md:
##########
@@ -503,6 +505,78 @@ Check out [Kafka source 
config](https://hudi.apache.org/docs/configurations#Kafk
 Hudi Streamer also supports ingesting from Apache Pulsar via `org.apache.hudi.utilities.sources.PulsarSource`.
 Check out [Pulsar source config](https://hudi.apache.org/docs/configurations#Pulsar-Source-Configs) for more details.
 
+#### Amazon Kinesis
+
+Use the `JsonKinesisSource` (`org.apache.hudi.utilities.sources.JsonKinesisSource`) to ingest JSON records from an AWS Kinesis Data Stream into a Hudi table. It reads from every shard in parallel, tracks per-shard progress in the Hudi Streamer checkpoint, automatically handles shard splits and merges, and de-aggregates records produced by the Kinesis Producer Library (KPL).
+
+##### Common configuration
+
+All keys use the prefix `hoodie.streamer.source.kinesis.`. The settings most users need:
+
+| Config key | Default | Description |
+|---|---|---|
+| `hoodie.streamer.source.kinesis.stream.name` | (required) | Kinesis Data Streams stream name. |
+| `hoodie.streamer.source.kinesis.region` | (required) | AWS region for the stream (e.g., `us-east-1`). |
+| `hoodie.streamer.source.kinesis.starting.position` | `LATEST` | Where to start when no checkpoint exists yet. `LATEST` starts at the tip of each shard; `EARLIEST` replays from `TRIM_HORIZON`. |
+| `hoodie.streamer.source.kinesis.max.events` | `5000000` | Maximum number of records read per batch across all shards. Tune to control batch size. |
+| `hoodie.streamer.source.kinesis.append.offsets` | `false` | When enabled, appends Kinesis metadata fields to each record: `_hoodie_kinesis_source_sequence_number`, `_hoodie_kinesis_source_shard_id`, `_hoodie_kinesis_source_partition_key`, `_hoodie_kinesis_source_timestamp`. |

Review Comment:
   Got it.  Removed this config for clarity.
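
For reference, the other settings quoted in this hunk could be assembled into a properties fragment roughly as follows. This is only a sketch: the key names and defaults are copied from the table in the hunk, while the helper names (`kinesis_source_props`, `to_properties`) and the stream name `my-stream` are hypothetical placeholders, and the check on `starting.position` assumes `LATEST` and `EARLIEST` are the only accepted values because they are the only ones the table mentions.

```python
# Keys and defaults are copied from the table in the quoted hunk.
# Helper names and sample values are illustrative, not part of Hudi.
PREFIX = "hoodie.streamer.source.kinesis."
# Assumption: only the two values named in the table are accepted.
VALID_STARTING_POSITIONS = {"LATEST", "EARLIEST"}


def kinesis_source_props(stream_name, region,
                         starting_position="LATEST", max_events=5_000_000):
    """Return the Kinesis source settings as a dict of key -> string value."""
    if not stream_name or not region:
        raise ValueError("stream.name and region are required")
    if starting_position not in VALID_STARTING_POSITIONS:
        raise ValueError(
            f"starting.position must be one of {sorted(VALID_STARTING_POSITIONS)}"
        )
    return {
        PREFIX + "stream.name": stream_name,
        PREFIX + "region": region,
        PREFIX + "starting.position": starting_position,
        PREFIX + "max.events": str(max_events),
    }


def to_properties(props):
    """Render the dict in Java .properties style, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(to_properties(kinesis_source_props("my-stream", "us-east-1", "EARLIEST")))
```

The rendered lines can be pasted into the properties file passed to Hudi Streamer; the `append.offsets` key is left out on purpose, since it is the row this comment thread is anchored on.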



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.