[ https://issues.apache.org/jira/browse/DRILL-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka resolved DRILL-8204.
------------------------------------
    Resolution: Fixed

> Allow Provided Schema for HTTP Plugin in JSON Mode
> --------------------------------------------------
>
>                 Key: DRILL-8204
>                 URL: https://issues.apache.org/jira/browse/DRILL-8204
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Other
>    Affects Versions: 1.20.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 2.0.0
>
>
> One of the challenges of querying APIs is inconsistent data. Drill allows you
> to provide a schema for individual endpoints. You can do this in one of two
> ways: either by listing the fields in `providedSchema`, or by providing a
> serialized TupleMetadata of the desired schema in `jsonSchema`. This is
> advanced functionality and should only be used by advanced Drill users.
> The schema provisioning currently supports the complex types Array and Map
> at any nesting level.
> ### Example Schema Provisioning:
> ```json
> "jsonOptions": {
>   "providedSchema": [
>     {
>       "fieldName": "int_field",
>       "fieldType": "bigint"
>     }, {
>       "fieldName": "jsonField",
>       "fieldType": "varchar",
>       "properties": {
>         "drill.json-mode": "json"
>       }
>     }, {
>       // Array field
>       "fieldName": "stringField",
>       "fieldType": "varchar",
>       "isArray": true
>     }, {
>       // Map field
>       "fieldName": "mapField",
>       "fieldType": "map",
>       "fields": [
>         {
>           "fieldName": "nestedField",
>           "fieldType": "int"
>         }, {
>           "fieldName": "nestedField2",
>           "fieldType": "varchar"
>         }
>       ]
>     }
>   ]
> }
> ```
> ### Example Provisioning the Schema with a JSON String
> ```json
> "jsonOptions": {
>   "jsonSchema": "{\"type\":\"tuple_schema\",\"columns\":[{\"name\":\"outer_map\",\"type\":\"STRUCT<`int_field` BIGINT, `int_array` ARRAY<BIGINT>>\",\"mode\":\"REQUIRED\"}]}"
> }
> ```
> You can print out a JSON string of a schema with the Java code below.
> ```java
> // Imports assume the Drill 1.20 package layout.
> import org.apache.drill.common.types.TypeProtos.MinorType;
> import org.apache.drill.exec.record.metadata.ColumnMetadata;
> import org.apache.drill.exec.record.metadata.SchemaBuilder;
> import org.apache.drill.exec.record.metadata.TupleMetadata;
> import org.apache.drill.exec.store.easy.json.loader.JsonLoader;
> TupleMetadata schema = new SchemaBuilder()
>     .addNullable("a", MinorType.BIGINT)
>     .addNullable("m", MinorType.VARCHAR)
>     .build();
> ColumnMetadata m = schema.metadata("m");
> m.setProperty(JsonLoader.JSON_MODE, JsonLoader.JSON_LITERAL_MODE);
> System.out.println(schema.jsonString());
> ```
> This will generate something like the JSON string below:
> ```json
> {
>   "type":"tuple_schema",
>   "columns":[
>     {"name":"a","type":"BIGINT","mode":"OPTIONAL"},
>     {"name":"m","type":"VARCHAR","mode":"OPTIONAL","properties":{"drill.json-mode":"json"}}
>   ]
> }
> ```
> ## Dealing With Inconsistent Schemas
> One of the major challenges of interacting with JSON data is an inconsistent
> schema. Drill has a `UNION` data type, which is marked as experimental. At
> the time of writing, the HTTP plugin does not support `UNION`; however,
> supplying a schema can solve many of those issues.
> ### JSON Mode
> Drill offers the option of reading all JSON values as a string. While this
> can complicate downstream analytics, it can also be a more memory-efficient
> way of reading data with an inconsistent schema. Unfortunately, at the time
> of writing, JSON mode is only available with a provided schema. However,
> future work will allow this mode to be enabled for any JSON data.
> #### Enabling JSON Mode:
> You can enable JSON mode by adding the `drill.json-mode` property with
> a value of `json` to a field, as shown below:
> ```json
> {
>   "fieldName": "jsonField",
>   "fieldType": "varchar",
>   "properties": {
>     "drill.json-mode": "json"
>   }
> }
> ```

--
This message was sent by Atlassian Jira
(v8.20.7#820007)