Matt Burgess created NIFI-4109:
----------------------------------

             Summary: Implement an InferRecordSchema processor
                 Key: NIFI-4109
                 URL: https://issues.apache.org/jira/browse/NIFI-4109
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Matt Burgess

Currently, a record schema (for use in record-aware processors) must be provided by an attribute, a Schema Registry, or embedded in the flow file, and thus must be determined ahead of time. For formats that do not carry a schema (e.g., CSV, JSON) and for flows whose files' schemas vary or are otherwise not known a priori, it would be helpful to have a processor that can infer the schema from the content. It could have any or all of the following features:

- Record-awareness: The existing InferAvroSchema can be used for CSV and JSON with non-record-aware processors/flows, although it does not currently support Avro logical types such as timestamp (see NIFI-3000). The benefit of record-awareness is that better inference can be made by inspecting each record in a flow file.
- Type inference: Should include the primitive types (numeric, string) as well as the more complex types supported by Avro schemas (time, date, timestamp, etc.).
- Generate Schema in attribute: Recommend that "avro.schema" be used as the output attribute, as this is the default for most RecordWriters.
- Publish Schema to Registry: This is an advanced feature that could be split out into its own Jira due to scope concerns.
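
To make the record-awareness and type-inference ideas concrete, here is a rough, language-neutral sketch written in Python rather than as a NiFi processor. It inspects every record, widens a field's type when records disagree (long plus double becomes double, anything else falls back to string), marks fields that are missing or null in some records as nullable, and maps temporal strings to Avro logical types. All names here (`infer_type`, `merge_types`, `infer_avro_schema`, `TEMPORAL_FORMATS`) and the three date/time formats are assumptions for illustration only; none of them come from NiFi or from the existing InferAvroSchema processor.

```python
import json
from datetime import datetime

# Hypothetical string formats to probe; a real processor would likely make these configurable.
TEMPORAL_FORMATS = (
    ("%Y-%m-%dT%H:%M:%S", "timestamp"),
    ("%Y-%m-%d", "date"),
    ("%H:%M:%S", "time"),
)

# Simplified type name -> Avro type. Avro logical types annotate an underlying primitive.
AVRO_TYPES = {
    "boolean": "boolean",
    "long": "long",
    "double": "double",
    "string": "string",
    "timestamp": {"type": "long", "logicalType": "timestamp-millis"},
    "date": {"type": "int", "logicalType": "date"},
    "time": {"type": "int", "logicalType": "time-millis"},
}


def infer_type(value):
    """Map a single non-null value to a simplified type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        for fmt, name in TEMPORAL_FORMATS:
            try:
                datetime.strptime(value, fmt)
                return name
            except ValueError:
                continue
    return "string"  # plain strings and anything unsupported (e.g. nested values)


def merge_types(current, new):
    """Combine the type seen so far with the type of the next record's value."""
    if current is None or current == new:
        return new
    if {current, new} == {"long", "double"}:
        return "double"  # numeric widening
    return "string"  # conflicting types fall back to string


def infer_avro_schema(records, name="inferred"):
    """Inspect every record and return an Avro record schema as a dict."""
    types = {}  # field -> merged simplified type (None until a non-null value is seen)
    seen = {}   # field -> number of records carrying a non-null value
    total = 0
    for record in records:
        total += 1
        for field, value in record.items():
            types.setdefault(field, None)
            seen.setdefault(field, 0)
            if value is not None:
                types[field] = merge_types(types[field], infer_type(value))
                seen[field] += 1
    fields = []
    for field, simple in types.items():
        avro_type = AVRO_TYPES[simple or "string"]  # all-null fields default to string
        if seen[field] < total:  # missing or null in at least one record
            avro_type = ["null", avro_type]
        fields.append({"name": field, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


# Usage: serialize the schema into the recommended "avro.schema" attribute.
records = [
    {"id": 1, "price": 3, "created": "2017-06-22T10:15:00", "note": None},
    {"id": 2, "price": 4.5, "created": "2017-06-23T08:00:00"},
]
attributes = {"avro.schema": json.dumps(infer_avro_schema(records))}
```

Looking at only the first record would type `price` as long; scanning all records is what allows it to widen to double, which is the kind of improvement record-awareness is meant to provide. Note that values inferred as date, time, or timestamp would still need to be converted from their string form by whatever writes the records, since the Avro logical types are backed by int or long.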