Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi-registry/pull/112#discussion_r184694579

--- Diff: nifi-registry-docs/src/main/asciidoc/administration-guide.adoc ---
@@ -895,3 +895,167 @@ Providing 2 total locations, including `nifi.registry.extension.dir.1`.
 Example: `/etc/http-nifi-registry.keytab`
 |nifi.registry.kerberos.spengo.authentication.expiration|The expiration duration of a successful Kerberos user authentication, if used. The default value is `12 hours`.
 |====
+
+== Persistence Providers
+
+NiFi Registry uses a pluggable flow persistence provider to store the content of the flows saved to the registry. NiFi Registry provides `<<FileSystemFlowPersistenceProvider>>` and `<<GitFlowPersistenceProvider>>`.
+
+Each persistence provider has its own configuration parameters, those can be configured in a XML file specified in <<Providers Properties,nifi-registry.properties>>.
+
+The XML configuration file looks like below. It has a `flowPersistenceProvider` element in which qualified class name of a persistence provider implementation and its configuration properties are defined. See following sections for available configurations for each providers.
+
+.Example providers.xml
+[source,xml]
+....
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<providers>
+
+    <flowPersistenceProvider>
+        <class>persistence-provider-qualified-class-name</class>
+        <property name="property-1">property-value-1</property>
+        <property name="property-2">property-value-2</property>
+        <property name="property-n">property-value-n</property>
+    </flowPersistenceProvider>
+
+</providers>
+....
+
+
+=== FileSystemFlowPersistenceProvider
+
+FileSystemFlowPersistenceProvider simply stores serialized Flow contents into `{bucket-id}/{flow-id}/{version}` directories.
+
+Example of persisted files:
+....
+Flow Storage Directory/
+├── {bucket-id}/
+│   └── {flow-id}/
+│       └── {version}/{version}.snapshot
+└── d1beba88-32e9-45d1-bfe9-057cc41f7ce8/
+    └── 219cf539-427f-43be-9294-0644fb07ca63/
+        ├── 1/1.snapshot
+        └── 2/2.snapshot
+....
+
+Qualified class name: `org.apache.nifi.registry.provider.flow.FileSystemFlowPersistenceProvider`
+
+|====
+|*Property*|*Description*
+|Flow Storage Directory|REQUIRED: File system path for a directory where flow contents files are persisted to. If the directory does not exist when NiFi Registry starts, it will be created. If the directory exists, it must be readable and writable from NiFi Registry.
+|====
+
+
+=== GitFlowPersistenceProvider
+
+GitFlowPersistenceProvider stores flow contents under a Git directory.
+
+In contrast to FileSystemFlowPersistenceProvider, this provider uses human friendly Bucket and Flow names so that those files can be accessed by external tools. However, it is NOT supported to modify stored files outside of NiFi Registry. Persisted files are only read when NiFi Registry starts up.
+
+Buckets are represented as directories and Flow contents are stored as files in a Bucket directory they belong to. Flow snapshot histories are managed as Git commits, meaning only the latest version of Buckets and Flows exist in the Git directory. Old versions are retrieved from Git commit histories.
+
+.Example persisted files
+....
+Flow Storage Directory/
+├── .git/
+├── Bucket A/
+│   ├── bucket.yml
+│   ├── Flow 1.snapshot
+│   └── Flow 2.snapshot
+└── Bucket B/
+    ├── bucket.yml
+    └── Flow 4.snapshot
+....
+
+Each Bucket directory contains a YAML file named `bucket.yml`. The file manages links from NiFi Registry Bucket and Flow IDs to actual directory and file names. When NiFi Registry starts, this provider reads through Git commit histories and lookup these `bucket.yml` files to restore Buckets and Flows for each snapshot version.
+
+.Example bucket.yml
+[source,yml]
+....
+layoutVer: 1
+bucketId: d1beba88-32e9-45d1-bfe9-057cc41f7ce8
+flows:
+  219cf539-427f-43be-9294-0644fb07ca63: {ver: 7, file: Flow 1.snapshot}
+  22cccb6c-3011-4493-a996-611f8f112969: {ver: 3, file: Flow 2.snapshot}
+....
+
+Qualified class name: `org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider`
+
+|====
+|*Property*|*Description*
+|Flow Storage Directory|REQUIRED: File system path for a directory where flow contents files are persisted to. The directory must exist when NiFi registry starts. Also must be initialized as a Git directory. See <<Initialize Git directory>> for detail.
+|Remote To Push|When a new flow snapshot is created, this persistence provider updated files in the specified Git directory, then create a commit to the local repository. If `Remote To Push` is defined, it also pushes to the specified remote repository. E.g. 'origin'. To define more detailed remote spec such as branch names, use `Refspec`. See https://git-scm.com/book/en/v2/Git-Internals-The-Refspec
+|Remote Access User|This user name is used to make push requests to the remote repository when `Remote To Push` is enabled, and the remote repository is accessed by HTTP protocol. If SSH is used, user authentication is done with SSH keys.
+|Remote Access Password|Used with `Remote Access User`.
+|====
+
+==== Initialize Git directory
+
+In order to use GitFlowPersistenceRepository, you need to prepare a Git directory on the local file system. You can do so by initializing a directory with `git init` command, or clone an existing Git project from a remote Git repository by `git clone` command.
+
+- Git init command
+https://git-scm.com/docs/git-init
+- Git clone command
+https://git-scm.com/docs/git-clone
+
+
+==== Git user configuration
+
+Git distinguishes a user by its username and email address. This persistence provider uses NiFi Registry username when it creates Git commits.
However since NiFi Registry users do not provide email address, preconfigured Git user email address is used.
+
+You can configure Git user name and email address by `git config` command.
+
+- Git config command
+https://git-scm.com/docs/git-config
+
+
+==== Git user authentication
+
+By default, this persistence repository only create commits to local repository. No user authentication is needed to do so. However, if 'Commit To Push' is enabled, user authentication to the remote Git repository is required.
+
+If the remote repository is accessed by HTTP, then username and password for authentication can be configured in the providers XML configuration file.
+
+When SSH is used, SSH keys are used to identify a Git user. In order to pick the right key to a remote server, the SSH configuration file `${USER_HOME}/.ssh/config` is used. The SSH configuration file can contain multiple `Host` entries to specify a key file to login to a remote Git server. The `Host` must much with the target remote Git server hostname.
+
+.example SSH config file
+....
+Host git.example.com
+  HostName git.example.com
+  IdentityFile ~/.ssh/id_rsa
+
+Host github.com
+  HostName github.com
+  IdentityFile ~/.ssh/key-for-github
+
+Host bitbucket.org
+  HostName bitbucket.org
+  IdentityFile ~/.ssh/key-for-bitbucket
+....
+
+=== Data model version of serialized Flow snapshots
+
+Serialized Flow snapshots saved by these persistence providers have versions, so that the data format and schema can evolve over time. Data model version update is done automatically by NiFi Registry when it reads and stores each Flow content.
+
+Here is the data model version histories:
+
+|====
+|*Data model version*|*Since NiFi Registry*|*Description*
+|2|0.2|JSON formatted text file. The root object contains header and Flow content object.
+|1|0.1|Binary format having header bytes at the beginning followed by Flow content represented as XML.
+|====
+
+=== Migrating stored files between different Persistence Provider
--- End diff --

I think instead of providing a tool we can just offer instructions for how to reset your registry to use the git provider, something like:

```
- Stop version control on all PGs in NiFi
- Stop registry
- Move the H2 DB and file-based flow dir somewhere for back up
- Configure git provider in providers.xml
- Start registry
- Recreate any buckets
- Start version control on all PGs again
```

This way the CLI doesn't need to depend on registry framework code since it is more of a client. What do you think?
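
For the "Configure git provider in providers.xml" step, the entry could look roughly like the sketch below. The class name and property names are the ones documented in the diff above; the directory path and the remote name are placeholder values, not defaults, and the directory is assumed to already exist and to have been initialized with `git init` before the registry starts.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<providers>

    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <!-- Placeholder path: must exist and be a Git directory when the registry starts -->
        <property name="Flow Storage Directory">./flow_storage_git</property>
        <!-- Optional placeholder: set only if commits should also be pushed to a remote -->
        <property name="Remote To Push">origin</property>
    </flowPersistenceProvider>

</providers>
```

`Remote Access User` and `Remote Access Password` are left out here; per the property table in the diff they only apply when the remote is accessed over HTTP.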
---