Github user vesense commented on a diff in the pull request: https://github.com/apache/storm/pull/1777#discussion_r87967670

--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@

and class for aggregate function is here:

For now users can skip implementing the `result` method if it doesn't need to transform the accumulated value, but this behavior is subject to change, so providing `result` is recommended.

Please note that users should use `--jars` or `--artifacts` while running the Storm SQL runner to make sure UDFs and/or UDAFs are available in the classpath.

## External Data Sources

### Specifying External Data Sources

In StormSQL, data is represented by external tables. Users can specify data sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):

```
CREATE EXTERNAL TABLE table_name field_list
  [ STORED AS
    INPUTFORMAT input_format_classname
    OUTPUTFORMAT output_format_classname
  ]
  LOCATION location
  [ TBLPROPERTIES tbl_properties ]
  [ AS select_stmt ]
```

The default input and output formats are JSON. Supported formats are covered in the `Supported Formats` section below.

For example, the following statement specifies a Kafka spout and sink:

```
CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES '{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.storm.kafka.ByteBufferSerializer"}}'
```

### Plugging in External Data Sources

Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering them via the mechanisms of Java's service loader. The external data source is chosen based on the scheme of the table's URI. Please refer to the implementation of `storm-sql-kafka` for more details.

### Supported Formats

| Format | Input format class | Output format class | Requires properties
|:--------------- |:------------------ |:------------------- |:--------------------
| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
| CSV | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
| TSV | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No

#### Avro

Avro requires users to describe the schema of the record for both input and output. The schemas are specified in `TBLPROPERTIES`: the input schema under the `input.avro.schema` key and the output schema under the `output.avro.schema` key. Each schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
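Writing the escaped schema string by hand is error-prone. One way to generate it is to serialize the schema twice, so the inner quotes are escaped automatically; the following is a small Python sketch using only the standard library (the schema itself is just the `large_orders` example, not anything Storm-specific):

```python
import json

# Avro record schema as a plain Python dict (illustrative
# schema matching the large_orders example).
schema = {
    "type": "record",
    "name": "large_orders",
    "fields": [
        {"name": "ID", "type": "int"},
        {"name": "TOTAL", "type": "int"},
    ],
}

# The inner json.dumps turns the schema into a JSON string; the outer
# json.dumps embeds that string as a value, escaping its quotes so the
# result is a valid TBLPROPERTIES JSON document.
tblproperties = json.dumps({"input.avro.schema": json.dumps(schema)})
print(tblproperties)
```

Parsing the printed value back with a JSON parser recovers the original schema, which is a quick way to verify the escaping is correct.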

Example schema descriptions:

`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}"`

`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", \"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", \"type\": \"int\"} ]}"`

#### CSV

It uses the standard RFC 4180 CSV parser and doesn't need any other properties.

--- End diff --

Minor. How about adding a link to RFC 4180? It would be convenient for users who want to look it up.