This is an automated email from the ASF dual-hosted git repository.
cwylie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 938c149 edits to kafka inputFormat (#11796)
938c149 is described below
commit 938c1493e53f61cb9e16f94616995dade0318b10
Author: Charles Smith <[email protected]>
AuthorDate: Fri Oct 15 14:01:10 2021 -0700
edits to kafka inputFormat (#11796)
* edits to kafka inputFormat
* revise conflict resolution description
* tweak for clarity
* Update docs/ingestion/data-formats.md
Co-authored-by: Katya Macedo <[email protected]>
* style fixes
Co-authored-by: Katya Macedo <[email protected]>
---
docs/ingestion/data-formats.md | 146 ++++++++++++++++++++---------------------
1 file changed, 70 insertions(+), 76 deletions(-)
diff --git a/docs/ingestion/data-formats.md b/docs/ingestion/data-formats.md
index a88e5fc..19707b4 100644
--- a/docs/ingestion/data-formats.md
+++ b/docs/ingestion/data-formats.md
@@ -29,7 +29,7 @@ This page lists all default and core extension data formats
supported by Druid.
For additional data formats supported with community extensions,
please see our [community extensions
list](../development/extensions.md#community-extensions).
-## Formatting the Data
+## Formatting data
The following samples show data formats that are natively supported in Druid:
@@ -67,12 +67,12 @@ Note that the CSV and TSV data do not contain column heads.
This becomes importa
Besides text formats, Druid also supports binary formats such as [Orc](#orc)
and [Parquet](#parquet) formats.
-## Custom Formats
+## Custom formats
-Druid supports custom data formats and can use the `Regex` parser or the
`JavaScript` parsers to parse these formats. Please note that using any of
these parsers for
-parsing data will not be as efficient as writing a native Java parser or using
an external stream processor. We welcome contributions of new Parsers.
+Druid supports custom data formats and can use the Regex parser or the
JavaScript parser to parse these formats. Using any of these parsers for
+parsing data is less efficient than writing a native Java parser or using an
external stream processor. We welcome contributions of new parsers.
-## Input Format
+## Input format
> The Input Format is a new way to specify the data format of your input data
> which was introduced in 0.17.0.
Unfortunately, the Input Format doesn't support all data formats or ingestion
methods supported by Druid yet.
@@ -87,7 +87,7 @@ Configure the JSON `inputFormat` to load JSON data as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `json`. | yes |
+| type | String | Set value to `json`. | yes |
| flattenSpec | JSON Object | Specifies flattening configuration for nested
JSON data. See [`flattenSpec`](#flattenspec) for more info. | no |
| featureSpec | JSON Object | [JSON parser
features](https://github.com/FasterXML/jackson-core/wiki/JsonParser-Features)
supported by Jackson library. Those features will be applied when parsing the
input JSON data. | no |
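As a minimal sketch of the fields above (the flatten-spec field names here are illustrative, not from the original doc), a JSON `inputFormat` with a `flattenSpec` might look like:

```json
"inputFormat": {
  "type": "json",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "path", "name": "userId", "expr": "$.user.id" }
    ]
  }
}
```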
@@ -107,7 +107,7 @@ Configure the CSV `inputFormat` to load CSV data as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `csv`. | yes |
+| type | String | Set value to `csv`. | yes |
| listDelimiter | String | A custom delimiter for multi-value dimensions. | no
(default = ctrl+A) |
| columns | JSON array | Specifies the columns of the data. The columns should
be in the same order with the columns of your data. | yes if
`findColumnsFromHeader` is false or missing |
| findColumnsFromHeader | Boolean | If this is set, the task will find the
column names from the header row. Note that `skipHeaderRows` will be applied
before finding column names from the header. For example, if you set
`skipHeaderRows` to 2 and `findColumnsFromHeader` to true, the task will skip
the first two lines and then extract column information from the third line.
`columns` will be ignored if this is set to true. | no (default = false if
`columns` is set; otherwise null) |
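The `skipHeaderRows` interaction described above can be sketched as a minimal spec; because `findColumnsFromHeader` is true, no `columns` list is needed and names come from the third line:

```json
"inputFormat": {
  "type": "csv",
  "skipHeaderRows": 2,
  "findColumnsFromHeader": true
}
```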
@@ -130,7 +130,7 @@ Configure the TSV `inputFormat` to load TSV data as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `tsv`. | yes |
+| type | String | Set value to `tsv`. | yes |
| delimiter | String | A custom delimiter for data values. | no (default =
`\t`) |
| listDelimiter | String | A custom delimiter for multi-value dimensions. | no
(default = ctrl+A) |
| columns | JSON array | Specifies the columns of the data. The columns should
be in the same order with the columns of your data. | yes if
`findColumnsFromHeader` is false or missing |
@@ -151,11 +151,24 @@ Be sure to change the `delimiter` to the appropriate
delimiter for your data. Li
}
```
-### KAFKA
+### Kafka
-The `inputFormat` to load complete kafka record including header, key and
value. An example is:
+Configure the Kafka `inputFormat` to load complete Kafka records including
the header, key, and value.
-```json
+> The Kafka `inputFormat` is currently designated as experimental.
+
+| Field | Type | Description | Required |
+|-------|------|-------------|----------|
+| type | String | Set value to `kafka`. | yes |
+| headerLabelPrefix | String | Custom label prefix for all the header columns.
| no (default = "kafka.header.") |
+| timestampColumnName | String | Name of the column for the Kafka record's
timestamp. | no (default = "kafka.timestamp") |
+| keyColumnName | String | Name of the column for the Kafka record's key. | no
(default = "kafka.key") |
+| headerFormat | Object | `headerFormat` specifies how to parse the Kafka
headers. Supports String types. Because Kafka header values are bytes, the
parser decodes them as UTF-8 encoded strings. To change this behavior,
implement your own parser based on the encoding style. Change the 'encoding'
type in `KafkaStringHeaderFormat` to match your custom implementation. | no |
+| keyFormat | [InputFormat](#input-format) | Any existing `inputFormat` used
to parse the Kafka key. It only processes the first entry of the input format.
For details, see [Specifying data
format](../development/extensions-core/kafka-ingestion.md#specifying-data-format).
| no |
+| valueFormat | [InputFormat](#input-format) | `valueFormat` can be any
existing `inputFormat` to parse the Kafka value payload. For details about
specifying the input format, see [Specifying data
format](../development/extensions-core/kafka-ingestion.md#specifying-data-format).
| yes |
+
+For example:
+```
"ioConfig": {
"inputFormat": {
"type": "kafka",
@@ -179,47 +192,28 @@ The `inputFormat` to load complete kafka record including
header, key and value.
}
```
-The KAFKA `inputFormat` has the following components:
-
-> Note that KAFKA inputFormat is currently designated as experimental.
-
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-| type | String | This should say `kafka`. | yes |
-| headerLabelPrefix | String | A custom label prefix for all the header
columns. | no (default = "kafka.header.") |
-| timestampColumnName | String | Specifies the name of the column for the
kafka record's timestamp.| no (default = "kafka.timestamp") |
-| keyColumnName | String | Specifies the name of the column for the kafka
record's key.| no (default = "kafka.key") |
-| headerFormat | Object | headerFormat specifies how to parse the kafka
headers. Current supported type is "string". Since header values are bytes, the
current parser by defaults reads it as UTF-8 encoded strings. There is
flexibility to change this behavior by implementing your very own parser based
on the encoding style. The 'encoding' type in KafkaStringHeaderFormat class
needs to change with the custom implementation. | no |
-| keyFormat | [InputFormat](#input-format) | keyFormat can be any existing
inputFormat to parse the kafka key. The current behavior is to only process the
first entry of the input format. See [the below
section](../development/extensions-core/kafka-ingestion.md#specifying-data-format)
for details about specifying the input format. | no |
-| valueFormat | [InputFormat](#input-format) | valueFormat can be any existing
inputFormat to parse the kafka value payload. See [the below
section](../development/extensions-core/kafka-ingestion.md#specifying-data-format)
for details about specifying the input format. | yes |
+Note the following behaviors:
+- If there are conflicts between column names, Druid uses the column names
from the payload and ignores the column names from the header or key. This
behavior makes it easier to migrate to the Kafka `inputFormat` from another
Kafka ingestion spec without losing data.
+- The Kafka input format fundamentally blends information from the header,
key, and value objects from a Kafka record to create a row in Druid. It
extracts individual records from the value. Then it augments each value with
the corresponding key or header columns.
+- By default, the Kafka input format exposes the Kafka record timestamp as
`timestampColumnName` so that you can use it as the primary timestamp
column. Alternatively, you can choose a timestamp column from either the key
or value payload.
+For example, the following `timestampSpec` uses the default Kafka timestamp
from the Kafka record:
```
-> For any conflicts in dimension/metric names, this inputFormat will prefer
kafka value's column names.
-> This will enable seemless porting of existing kafka ingestion inputFormat to
this new format, with additional columns from kafka header and key.
-
-> Kafka input format fundamentally blends information from header, key and
value portions of a kafka record to create a druid row. It does this by
-> exploding individual records from the value and augmenting each of these
values with the selected key/header columns.
-
-> Kafka input format also by default exposes kafka timestamp
(timestampColumnName), which can be used as the primary timestamp column.
-> One can also choose timestamp column from either key or value payload, if
there is no timestamp available then the default kafka timestamp is our savior.
-> eg.,
-
- // Below timestampSpec chooses kafka's default timestamp that is available
in kafka record
"timestampSpec":
{
"column": "kafka.timestamp",
"format": "millis"
}
+```
- // Assuming there is a timestamp field in the header and we have
"kafka.header." as a desired prefix for header columns,
- // below example chooses header's timestamp as a primary timestamp column
+If you are using "kafka.header." as the prefix for Kafka header columns and
the header contains a timestamp field, you can use the header timestamp as the
primary timestamp column. For example:
+```
"timestampSpec":
{
"column": "kafka.header.timestamp",
"format": "millis"
}
```
-
### ORC
To use the ORC input format, load the Druid Orc extension (
[`druid-orc-extensions`](../development/extensions-core/orc.md)).
@@ -229,7 +223,7 @@ Configure the ORC `inputFormat` to load ORC data as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `orc`. | yes |
+| type | String | Set value to `orc`. | yes |
| flattenSpec | JSON Object | Specifies flattening configuration for nested
ORC data. See [`flattenSpec`](#flattenspec) for more info. | no |
| binaryAsString | Boolean | Specifies if the binary orc column which is not
logically marked as a string should be treated as a UTF-8 encoded string. | no
(default = false) |
@@ -262,8 +256,8 @@ Configure the Parquet `inputFormat` to load Parquet data as
follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-|type| String| This should be set to `parquet` to read Parquet file| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Parquet file. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
+|type| String| Set value to `parquet`.| yes |
+|flattenSpec| JSON Object | Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Parquet file. Only 'path' expressions are supported ('jq'
is unavailable).| no (default will auto-discover 'root' level properties) |
| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no (default = false) |
For example:
@@ -297,8 +291,8 @@ Configure the Avro `inputFormat` to load Avro data as
follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-|type| String| This should be set to `avro_stream` to read Avro serialized
data| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
+|type| String| Set value to `avro_stream`. | yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from an Avro record. Only 'path' expressions are supported ('jq'
is unavailable).| no (default will auto-discover 'root' level properties) |
|`avroBytesDecoder`| JSON Object |Specifies how to decode bytes to Avro
record. | yes |
| binaryAsString | Boolean | Specifies if the bytes Avro column which is not
logically marked as a string or enum type should be treated as a UTF-8 encoded
string. | no (default = false) |
@@ -412,7 +406,7 @@ This Avro bytes decoder first extracts `subject` and `id`
from the input message
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `schema_repo`. | no |
+| type | String | Set value to `schema_repo`. | no |
| subjectAndIdConverter | JSON Object | Specifies how to extract the subject
and id from message bytes. | yes |
| schemaRepository | JSON Object | Specifies how to look up the Avro schema
from subject and id. | yes |
@@ -422,7 +416,7 @@ This section describes the format of the
`subjectAndIdConverter` object for the
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `avro_1124`. | no |
+| type | String | Set value to `avro_1124`. | no |
| topic | String | Specifies the topic of your Kafka stream. | yes |
@@ -432,8 +426,8 @@ This section describes the format of the `schemaRepository`
object for the `sche
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `avro_1124_rest_client`. | no |
-| url | String | Specifies the endpoint url of your Avro-1124 schema
repository. | yes |
+| type | String | Set value to `avro_1124_rest_client`. | no |
+| url | String | Specifies the endpoint URL of your Avro-1124 schema
repository. | yes |
###### Confluent Schema Registry-based Avro Bytes Decoder
@@ -442,10 +436,10 @@ For details, see the Schema Registry
[documentation](http://docs.confluent.io/cu
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `schema_registry`. | no |
-| url | String | Specifies the url endpoint of the Schema Registry. | yes |
+| type | String | Set value to `schema_registry`. | no |
+| url | String | Specifies the URL endpoint of the Schema Registry. | yes |
| capacity | Integer | Specifies the max size of the cache (default =
Integer.MAX_VALUE). | no |
-| urls | Array<String> | Specifies the url endpoints of the multiple Schema
Registry instances. | yes(if `url` is not provided) |
+| urls | Array<String> | Specifies the URL endpoints of the multiple Schema
Registry instances. | yes (if `url` is not provided) |
| config | Json | To send additional configurations, configured for Schema
Registry. This can be supplied via a
[DynamicConfigProvider](../operations/dynamic-config-provider.md) | no |
| headers | Json | To send headers to the Schema Registry. This can be
supplied via a
[DynamicConfigProvider](../operations/dynamic-config-provider.md) | no |
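A minimal decoder configuration using the fields above might look like the following sketch; the URL is a placeholder for your own Schema Registry endpoint:

```json
"avroBytesDecoder": {
  "type": "schema_registry",
  "url": "http://localhost:8081"
}
```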
@@ -504,9 +498,9 @@ Configure the Avro OCF `inputFormat` to load Avro OCF data
as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-|type| String| This should be set to `avro_ocf` to read Avro OCF file| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro records. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
-|schema| JSON Object |Define a reader schema to be used when parsing Avro
records, this is useful when parsing multiple versions of Avro OCF file data |
no (default will decode using the writer schema contained in the OCF file) |
+|type| String| Set value to `avro_ocf`. | yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from Avro records. Only 'path' expressions are supported ('jq' is
unavailable).| no (default will auto-discover 'root' level properties) |
+|schema| JSON Object |Define a reader schema to be used when parsing Avro
records. This is useful when parsing multiple versions of Avro OCF file data. |
no (default will decode using the writer schema contained in the OCF file) |
| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no (default = false) |
For example:
@@ -553,7 +547,7 @@ Configure the Protobuf `inputFormat` to load Protobuf data
as follows:
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-|type| String| This should be set to `protobuf` to read Protobuf serialized
data| yes |
+|type| String| Set value to `protobuf`. | yes |
|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Protobuf record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
|`protoBytesDecoder`| JSON Object |Specifies how to decode bytes to Protobuf
record. | yes |
@@ -645,7 +639,7 @@ Each line can be further parsed using
[`parseSpec`](#parsespec).
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `string` in general, or `hadoopyString` when
used in a Hadoop indexing job. | yes |
+| type | String | Set value to `string` for most cases. Otherwise use
`hadoopyString` for Hadoop indexing. | yes |
| parseSpec | JSON Object | Specifies the format, timestamp, and dimensions of
the data. | yes |
### Avro Hadoop Parser
@@ -664,7 +658,7 @@ See [Avro
specification](http://avro.apache.org/docs/1.7.7/spec.html#Schema+Reso
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `avro_hadoop`. | yes |
+| type | String | Set value to `avro_hadoop`. | yes |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data. Should be an "avro" parseSpec. | yes |
| fromPigAvroStorage | Boolean | Specifies whether the data file is stored
using AvroStorage. | no(default == false) |
@@ -718,8 +712,8 @@ The `inputFormat` of `inputSpec` in `ioConfig` must be set
to `"org.apache.orc.m
|Field | Type | Description
| Required|
|----------|-------------|----------------------------------------------------------------------------------------|---------|
-|type | String | This should say `orc`
| yes|
-|parseSpec | JSON Object | Specifies the timestamp and dimensions of the data
(`timeAndDims` and `orc` format) and a `flattenSpec` (`orc` format) | yes|
+| type | String | Set value to `orc`. | yes |
+|parseSpec | JSON Object | Specifies the timestamp and dimensions of the data
(`timeAndDims` and `orc` format) and a `flattenSpec` (`orc` format). | yes|
The parser supports two `parseSpec` formats: `orc` and `timeAndDims`.
@@ -959,8 +953,8 @@ JSON path expressions for all supported types.
|Field | Type | Description
| Required|
|----------|-------------|----------------------------------------------------------------------------------------|---------|
-| type | String | This should say `parquet`.| yes |
-| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data, and optionally, a flatten spec. Valid parseSpec formats are `timeAndDims`
and `parquet` | yes |
+| type | String | Set value to `parquet`. | yes |
+| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data, and optionally, a flatten spec. Valid parseSpec formats are `timeAndDims`
and `parquet`. | yes |
| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no(default = false) |
When the time dimension is a [DateType
column](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md),
@@ -1109,7 +1103,7 @@ Note that the `int96` Parquet value type is not supported
with this parser.
|Field | Type | Description
| Required|
|----------|-------------|----------------------------------------------------------------------------------------|---------|
-| type | String | This should say `parquet-avro`. | yes |
+| type | String | Set value to `parquet-avro`. | yes |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data, and optionally, a flatten spec. Should be `avro`. | yes |
| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no(default = false) |
@@ -1182,7 +1176,7 @@ This parser is for [stream
ingestion](./index.md#streaming) and reads Avro data
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `avro_stream`. | no |
+| type | String | Set value to `avro_stream`. | no |
| avroBytesDecoder | JSON Object | Specifies [`avroBytesDecoder`](#Avro Bytes
Decoder) to decode bytes to Avro record. | yes |
| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data. Should be an "avro" parseSpec. | yes |
@@ -1222,9 +1216,9 @@ This parser is for [stream
ingestion](./index.md#streaming) and reads Protocol b
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `protobuf`. | yes |
+| type | String | Set value to `protobuf`. | yes |
| `protoBytesDecoder` | JSON Object | Specifies how to decode bytes to
Protobuf record. | yes |
-| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data. The format must be JSON. See [JSON ParseSpec](#json-parsespec) for more
configuration options. Note that timeAndDims parseSpec is no longer supported.
| yes |
+| parseSpec | JSON Object | Specifies the timestamp and dimensions of the
data. The format must be JSON. See [JSON ParseSpec](#json-parsespec) for more
configuration options. Note that `timeAndDims` `parseSpec` is no longer
supported. | yes |
Sample spec:
@@ -1273,9 +1267,9 @@ This Protobuf bytes decoder first read a descriptor file,
and then parse it to g
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `file`. | yes |
+| type | String | Set value to `file`. | yes |
| descriptor | String | Protobuf descriptor file name in the classpath or URL.
| yes |
-| protoMessageType | String | Protobuf message type in the descriptor. Both
short name and fully qualified name are accepted. The parser uses the first
message type found in the descriptor if not specified. | no |
+| protoMessageType | String | Protobuf message type in the descriptor. Both
short name and fully qualified name are accepted. The parser uses the first
message type found in the descriptor if not specified. | no |
Sample spec:
@@ -1294,10 +1288,10 @@ For details, see the Schema Registry
[documentation](http://docs.confluent.io/cu
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| type | String | This should say `schema_registry`. | yes |
-| url | String | Specifies the url endpoint of the Schema Registry. | yes |
+| type | String | Set value to `schema_registry`. | yes |
+| url | String | Specifies the URL endpoint of the Schema Registry. | yes |
| capacity | Integer | Specifies the max size of the cache (default =
Integer.MAX_VALUE). | no |
-| urls | Array<String> | Specifies the url endpoints of the multiple Schema
Registry instances. | yes(if `url` is not provided) |
+| urls | Array<String> | Specifies the URL endpoints of the multiple Schema
Registry instances. | yes (if `url` is not provided) |
| config | Json | To send additional configurations, configured for Schema
Registry. This can be supplied via a
[DynamicConfigProvider](../operations/dynamic-config-provider.md). | no |
| headers | Json | To send headers to the Schema Registry. This can be
supplied via a
[DynamicConfigProvider](../operations/dynamic-config-provider.md) | no |
@@ -1366,7 +1360,7 @@ Use this with the String Parser to load JSON.
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `json`. | no |
+| format | String | `json` | no |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
| flattenSpec | JSON Object | Specifies flattening configuration for nested
JSON data. See [`flattenSpec`](#flattenspec) for more info. | no |
@@ -1393,7 +1387,7 @@ This is a special variation of the JSON ParseSpec that
lower cases all the colum
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `jsonLowercase`. | yes |
+| format | String | `jsonLowercase` | yes |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
@@ -1403,7 +1397,7 @@ Use this with the String Parser to load CSV. Strings are
parsed using the com.op
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `csv`. | yes |
+| format | String | `csv` | yes |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
| listDelimiter | String | A custom delimiter for multi-value dimensions. | no
(default = ctrl+A) |
@@ -1448,7 +1442,7 @@ the delimiter is a tab, so this will load TSV.
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `tsv`. | yes |
+| format | String | `tsv` | yes |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
| delimiter | String | A custom delimiter for data values. | no (default = \t)
|
@@ -1537,7 +1531,7 @@ handle all formatting decisions on their own, without
using the ParseSpec.
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `timeAndDims`. | yes |
+| format | String | `timeAndDims` | yes |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
@@ -1547,7 +1541,7 @@ Use this with the Hadoop ORC Parser to load ORC files.
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `orc`. | no |
+| format | String | `orc` | no |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
| flattenSpec | JSON Object | Specifies flattening configuration for nested
JSON data. See [`flattenSpec`](#flattenspec) for more info. | no |
@@ -1558,7 +1552,7 @@ Use this with the Hadoop Parquet Parser to load Parquet
files.
| Field | Type | Description | Required |
|-------|------|-------------|----------|
-| format | String | This should say `parquet`. | no |
+| format | String | `parquet` | no |
| timestampSpec | JSON Object | Specifies the column and format of the
timestamp. | yes |
| dimensionsSpec | JSON Object | Specifies the dimensions of the data. | yes |
| flattenSpec | JSON Object | Specifies flattening configuration for nested
JSON data. See [`flattenSpec`](#flattenspec) for more info. | no |
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]