Nikita-tech-writer commented on a change in pull request #9708: URL: https://github.com/apache/ignite/pull/9708#discussion_r781241964
########## File path: docs/_docs/persistence/change-data-capture.adoc ##########
@@ -0,0 +1,129 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements. See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+= Change Data Capture
+
+
+== Overview
+Change Data Capture (link:https://en.wikipedia.org/wiki/Change_data_capture[CDC]) is a data processing pattern used to asynchronously receive entries that have been changed on the local node so that action can be taken using the changed entry.
+
+WARNING: Change Data Capture is an experimental feature whose API or design architecture might be changed.
+
+Below are some of the Change Data Capture use cases:
+
+ * Streaming changes in Warehouse;
+ * Updating search index;
+ * Calculating statistics (streaming queries);
+ * Auditing logs;
+ * Async interaction with external systems: moderation, business process invocation, etc.
+
+Ignite implements Change Data Capture with the `ignite-cdc.sh` application and link:https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cdc/CdcConsumer.java#L56[Java API].
+
+Below are the Change Data Capture application and the Ignite node integrated via WAL archive segments:
+
+image:../../assets/images/integrations/CDC-design.svg[]
+
+When Change Data Capture is enabled, the Ignite server node creates a hard link to each WAL archive segment in the special `db/cdc/\{consistency_id\}` directory.
+The `ignite-cdc.sh` application runs on a different JVM and processes newly archived link:native-persistence.adoc#_write-ahead_log[WAL segments].
+When the segment is fully processed by `ignite-cdc.sh`, it is removed. The actual disk space is freed when both links (archive and Change Data Capture) are removed.
+The state of consumption can be saved to continue from it in case of any failure.
+
+== Configuration
+
+=== Ignite Node
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `DataStorageConfiguration#cdcEnabled` | Flag to enable Change Data Capture on the server node. | `false`
+| `DataStorageConfiguration#cdcWalPath` | Path to the Change Data Capture directory | `"db/wal/cdc"`
+| `DataStorageConfiguration#walForceArchiveTimeout` | Timeout to forcefully archive the WAL segment even if it is not complete. | `-1` (disabled)
+|===
+
+=== Change Data Capture Application
+
+Change Data Capture is configured in the same way as the Ignite node - via the spring XML file:
+
+* `ignite-cdc.sh` requires both Ignite and Change Data Capture configurations to start;
+* `IgniteConfiguration` is used to determine common options like a path to the Change Data Capture directory, node consistent id, and other parameters;
+* `CdcConfiguration` contains `ignite-cdc.sh`-specific options.
+
+[cols="20%,45%,35%",opts="header"]
+|===
+|Name |Description | Default value
+| `lockTimeout` | Timeout to wait for lock acquiring. Change Data Capture locks the directory on startup to ensure there is no concurrent Change Data Capture processing in the same directory.
+| 1000 milliseconds.
+| `checkFrequency` | Amount of time the application sleeps between subsequent checks when no new files are available. | 1000 milliseconds.
+| `keepBinary` | Flag to specify if key and value of changed entries should be provided in link:../key-value-api/binary-objects.adoc[binary format]. | `true`
+| `consumer` | Implementation of `org.apache.ignite.cdc.CdcConsumer` that consumes entry changes | null
+| `metricExporterSpi` | Array of SPIs to export CDC metrics. See also the link:../monitoring-metrics/new-metrics-system.adoc#_metric_exporters[metrics] documentation. | null
+|===
+
+== API
+
+=== `org.apache.ignite.cdc.CdcEvent`
+A single data change is reflected by `CdcEvent`.
+Let's take a closer look at the information provided:
+
+[cols="20%,80%",opts="header"]
+|===
+|Name |Description
+| `key()` | Key for the changed entry.
+| `value()` | Value for the changed entry. This method will return `null` if the event reflects removal.
+| `cacheId()` | ID of the cache where the change happens. The value is equal to the `CACHE_ID` from the `SYS.CACHES` link:../monitoring-metrics/system-views.adoc#_CACHES[system view]
+| `partition()` | Partition of the changed entry.
+| `primary()` | Flag to distinguish if the operation happens on a primary or backup node.
+| `version()` | `Comparable` version of the changed entry. Internally, Ignite maintains ordered versions of each entry so any changes of the same entry can be sorted.
+|===
+
+=== `org.apache.ignite.cdc.CdcConsumer`
+
+Consumer of change events. Should be implemented by the user.
+[cols="20%,80%",opts="header"]
+|===
+|Name |Description
+| `void start(MetricRegistry)` | Invoked one time on the start of the CDC application. `MetricRegistry` should be used to export consumer-specific metrics.
+| `boolean onEvents(Iterator<CdcEvent> events)` | Main method that processes changes. When this method returns `true`, the state will be saved on the disk. The state points to the event next to the last read.
In case of any failure, consumption will continue from the last saved state.
+| `void stop()` | Invoked one time on the stop of the CDC application.
+|===
+
+== Metrics
+
+`ignite-cdc.sh` uses the same SPI to export metrics as Ignite.
+The following metrics are provided by the application itself (additional metrics may be provided by the consumer):
+|===
+|Name |Description
+| CurrentSegmentIndex | Index of the currently processed WAL segment.
+| CommittedSegmentIndex | Index of the WAL segment that contains the last committed state.
+| CommittedSegmentOffset | Committed offset in bytes inside the WAL segment.
+| LastSegmentConsumptionTime | Time in milliseconds when the last segment processing was started.
+| BinaryMetaDir | Binary meta directory this application reads from.
+| MarshallerDir | Marshaller directory this application reads from.
+| CdcDir | CDC directory this application reads from.
+|===
+
+== Logging
+
+`ignite-cdc.sh` uses the same logging configuration as the Ignite node. The only difference is that the log is written to the "ignite-cdc.log" file.
+
+== Lifecycle
+
+IMPORTANT: `ignite-cdc.sh` implements fail-fast approach. It will just fail in case of any error. Restart should be configured with the OS tools.

Review comment:

```suggestion
IMPORTANT: `ignite-cdc.sh` implements the fail-fast approach. It just fails in case of any error. The restart procedure should be configured with the OS tools.
```
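
Side note for readers of this thread: the `onEvents` contract described in the quoted `CdcConsumer` table (process a batch, return `true` to have the state saved) can be illustrated with a small self-contained sketch. It deliberately does not depend on Ignite: `SketchEvent` and `SketchConsumer` are hypothetical stand-ins that mirror only a few of the method names from the quoted tables, and `AuditConsumer` is an invented example, not part of the PR or of the Ignite API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.apache.ignite.cdc.CdcEvent, reduced to three
// of the accessors listed in the quoted table. This is NOT the real Ignite type.
interface SketchEvent {
    Object key();
    Object value();    // null when the event reflects a removal
    boolean primary(); // true if the change happened on a primary node
}

// Hypothetical stand-in for the onEvents part of CdcConsumer.
interface SketchConsumer {
    boolean onEvents(Iterator<SketchEvent> events);
}

// Invented example consumer: records an audit line per primary-node change.
class AuditConsumer implements SketchConsumer {
    final List<String> audit = new ArrayList<>();

    @Override
    public boolean onEvents(Iterator<SketchEvent> events) {
        while (events.hasNext()) {
            SketchEvent e = events.next();
            if (!e.primary())
                continue; // skip changes observed on backup copies
            audit.add(e.value() == null
                ? "remove " + e.key()
                : "put " + e.key() + "=" + e.value());
        }
        // Returning true models "state will be saved" from the quoted table.
        return true;
    }
}

class CdcConsumerSketch {
    // Helper that builds an in-memory event for the demonstration.
    static SketchEvent event(Object key, Object value, boolean primary) {
        return new SketchEvent() {
            @Override public Object key() { return key; }
            @Override public Object value() { return value; }
            @Override public boolean primary() { return primary; }
        };
    }

    public static void main(String[] args) {
        AuditConsumer consumer = new AuditConsumer();
        boolean commit = consumer.onEvents(List.of(
            event(1, "a", true),   // update on primary
            event(1, "a", false),  // same update seen on a backup, skipped
            event(2, null, true)   // removal on primary
        ).iterator());
        System.out.println(consumer.audit + " commit=" + commit);
    }
}
```

A real consumer would implement `org.apache.ignite.cdc.CdcConsumer` and also provide `start` and `stop`; filtering on `primary()` here is only one possible way to avoid handling the same change twice.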
