[jira] [Updated] (DRILL-6965) Adjust table function usage for all storage plugins and implement schema parameter

Arina Ielchiieva (JIRA) Wed, 01 May 2019 07:42:28 -0700


     [ https://issues.apache.org/jira/browse/DRILL-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Arina Ielchiieva updated DRILL-6965:
------------------------------------
    Description: 
A schema can be applied when reading a table in two ways:
 a. the schema is created in the table root folder using the CREATE SCHEMA 
command, and schema usage is enabled;
 b. the schema is specified in a table function.
 This Jira implements point b.

Specifying the schema in a table function is useful when the user does not 
want to persist the schema in the table root location, or when reading from a 
single file rather than a folder.

The schema parameter can be used on its own or together with format plugin 
table properties.

Usage examples:

Prerequisites: 
 The V3 text reader must be enabled: {{set `exec.storage.enable_v3_text_reader` = true;}}
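
For illustration, a minimal sketch of enabling the reader for the current 
session and then checking its value. It assumes the standard ALTER SESSION 
syntax and the sys.options system table; the option name is the one given 
above.
{noformat}
-- enable the V3 text reader for the current session only
alter session set `exec.storage.enable_v3_text_reader` = true;

-- check the option value (sys.options lists Drill options by name)
select * from sys.options where name = 'exec.storage.enable_v3_text_reader';
{noformat}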

Query examples:

1. There is a folder with files, or just one file (e.g. dfs.tmp.text_table), 
and the user wants to apply a schema to it:
 a. specify the schema inline:
{noformat}
select * from table(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`}) 
properties {`drill.strict` = `false`}'))
{noformat}
To specify only table properties, use the following syntax:
{noformat}
select * from table(dfs.tmp.`text_table`(
schema => 'inline=() 
properties {`drill.strict` = `false`}'))
{noformat}
b. specify the schema using a path:
 First, the schema is created in some location using the CREATE SCHEMA 
command. For example:
{noformat}
create schema 
(col int)
path '/tmp/my_schema'
{noformat}
Now the user wants to apply this schema in a table function:
{noformat}
select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
{noformat}
2. The user wants to apply a schema alongside format plugin table function 
parameters.
 Assume the user has a CSV file with headers whose extension does not match 
the default extension for text files with headers (e.g. cars.csvh-test):
{noformat}
select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
fieldDelimiter => ',', extractHeader => true,
schema => 'inline=(col1 date)'))
{noformat}
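The path form of the schema parameter can presumably be combined with format 
plugin parameters in the same way. The following is only a sketch under that 
assumption, reusing the /tmp/my_schema schema and the cars.csvh-test file from 
the examples above; this combination is not shown in the issue itself:
{noformat}
select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
fieldDelimiter => ',', extractHeader => true,
schema => 'path=`/tmp/my_schema`'))
{noformat}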
More details about the syntax can be found in the design document:
 
[https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]

  was:
Schema can be used while reading the table into two ways:
 a. schema is created in the table root folder using CREATE SCHEMA command and 
schema usage command is enabled;
 b. schema indicated in table function.
 This Jira implements point b.

Schema indication using table function is useful when user does not want to 
persist schema in table root location or when reading from file, not folder.

Schema parameter can be used as individual unit or in together with for format 
plugin table properties.

Usage examples:

Pre-requisites: 
 V3 reader must be enabled: {{set `exec.storage.enable_v3_text_reader` = true;}}

Query examples:

1. There is folder with files or just one file (ex: dfs.tmp.text_table) and 
user wants to apply schema to them:
 a. indicate schema inline:
{noformat}
select * from table(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`}) 
properties {`drill.strict` = `false`}'))
{noformat}
b. indicate schema using path:
 First schema was created in some location using CREATE SCHEMA command. For 
example:
{noformat}
create schema 
(col int)
path '/tmp/my_schema'
{noformat}
Now user wants to apply this schema in table function:
{noformat}
select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
{noformat}
2. User wants to apply schema along side with format plugin table function 
parameters.
 Assuming that user has CSV file with headers with extension that does not 
comply to default text file with headers extension (ex: cars.csvh-test):
{noformat}
select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
fieldDelimiter => ',', extractHeader => true,
schema => 'inline=(col1 date)'))
{noformat}
More details about syntax can be found in design document:
 
[https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]


> Adjust table function usage for all storage plugins and implement schema 
> parameter
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-6965
>                 URL: https://issues.apache.org/jira/browse/DRILL-6965
>             Project: Apache Drill
>          Issue Type: Sub-task
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.17.0
>
>
> A schema can be applied when reading a table in two ways:
>  a. the schema is created in the table root folder using the CREATE SCHEMA 
> command, and schema usage is enabled;
>  b. the schema is specified in a table function.
>  This Jira implements point b.
> Specifying the schema in a table function is useful when the user does not 
> want to persist the schema in the table root location, or when reading from 
> a single file rather than a folder.
> The schema parameter can be used on its own or together with format plugin 
> table properties.
> Usage examples:
> Prerequisites: 
>  The V3 text reader must be enabled: 
> {{set `exec.storage.enable_v3_text_reader` = true;}}
> Query examples:
> 1. There is a folder with files, or just one file (e.g. dfs.tmp.text_table), 
> and the user wants to apply a schema to it:
>  a. specify the schema inline:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`}) 
> properties {`drill.strict` = `false`}'))
> {noformat}
> To specify only table properties, use the following syntax:
> {noformat}
> select * from table(dfs.tmp.`text_table`(
> schema => 'inline=() 
> properties {`drill.strict` = `false`}'))
> {noformat}
> b. specify the schema using a path:
>  First, the schema is created in some location using the CREATE SCHEMA 
> command. For example:
> {noformat}
> create schema 
> (col int)
> path '/tmp/my_schema'
> {noformat}
> Now the user wants to apply this schema in a table function:
> {noformat}
> select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
> {noformat}
> 2. The user wants to apply a schema alongside format plugin table function 
> parameters.
>  Assume the user has a CSV file with headers whose extension does not match 
> the default extension for text files with headers (e.g. cars.csvh-test):
> {noformat}
> select * from table(dfs.tmp.`cars.csvh-test`(type => 'text', 
> fieldDelimiter => ',', extractHeader => true,
> schema => 'inline=(col1 date)'))
> {noformat}
> More details about the syntax can be found in the design document:
>  
> [https://docs.google.com/document/d/1mp4egSbNs8jFYRbPVbm_l0Y5GjH3HnoqCmOpMTR_g4w/edit]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
