[ https://issues.apache.org/jira/browse/DRILL-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038357#comment-17038357 ]

benj commented on DRILL-6096:
-----------------------------

I just tried this new functionality. Some points (tested on 1.17 and on the
latest 1.18 build as of 2020-02-17):
 * The description of allowed values for the _store.format_ option should at
least mention _"write_text"_.
 * Why doesn't _write_text_ appear in the default storage configuration?
 * I tried to create write_text (or an equivalent) in the storage
configuration, but using _"fieldDelimiter"_ produces _"Please retry: Error
(invalid JSON mapping)"_. Does this need a new ticket?

> Provide mechanisms to specify field delimiters and quoted text for 
> TextRecordWriter
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-6096
>                 URL: https://issues.apache.org/jira/browse/DRILL-6096
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.12.0
>            Reporter: Kunal Khatua
>            Assignee: Arina Ielchiieva
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.17.0
>
>
> Currently, there is no way for a user to specify the field delimiter when
> writing records as text output. Furthermore, if the fields contain the
> delimiter, we have no mechanism for specifying quotes.
> By default, quotes should be used to enclose non-numeric fields being written.
> *Description of the implemented changes:*
> Two options are added to control text writer output:
> {{store.text.writer.add_header}} - indicates whether a header should be added
> to the created text file. Default is true.
> {{store.text.writer.force_quotes}} - indicates whether all values should be
> quoted. Default is false, which means only values that contain special
> characters (line / field separators) will be quoted.
> Line / field separators and quote / escape characters can be configured in
> the text format configuration using the Web UI. A user can create a special
> format only for writing data and then use it when creating files. Such a
> format can also always be used to read the written data back.
> {noformat}
>   "formats": {
>     "write_text": {
>       "type": "text",
>       "extensions": [
>         "txt"
>       ],
>       "lineDelimiter": "\n",
>       "fieldDelimiter": "!",
>       "quote": "^",
>       "escape": "^"
>     }
>    },
> ...
> {noformat}
> Next, set the specified format and create a text file:
> {noformat}
> alter session set `store.format` = 'write_text';
> create table dfs.tmp.t as select 1 as id from (values(1));
> {noformat}
> Notes:
> 1. Data is written using univocity-parsers, which limits the line separator
> to at most 2 characters. Drill allows a line separator longer than 2
> characters in the format configuration, since it can read data split by a
> line separator of any length, but an exception will be thrown when data is
> written with such a separator.
> 2. {{extractHeader}} in the text format configuration does not affect whether
> a header is written to the text file; only {{store.text.writer.add_header}}
> controls this. {{extractHeader}} is used only when reading the data.
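
The quoting behaviour described for the two writer options can be illustrated
with a small sketch. It uses Python's standard csv module, not Drill or
univocity-parsers, so it only approximates the semantics under stated
assumptions: the function name write_text and its parameters add_header and
force_quotes are hypothetical stand-ins that mirror the option names, and the
'!' delimiter and '^' quote character are taken from the example format above.
How Drill treats numeric values or the header row when force_quotes is enabled
is not confirmed by this sketch.

```python
import csv
import io


def write_text(rows, header, add_header=True, force_quotes=False):
    """Render rows as text using '!' as field delimiter and '^' as quote.

    Analogy only: add_header / force_quotes imitate the Drill options
    store.text.writer.add_header / store.text.writer.force_quotes.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="!",
        quotechar="^",
        doublequote=True,  # a literal ^ inside a quoted value becomes ^^
        lineterminator="\n",
        # QUOTE_MINIMAL quotes only values containing special characters;
        # QUOTE_ALL quotes every value.
        quoting=csv.QUOTE_ALL if force_quotes else csv.QUOTE_MINIMAL,
    )
    if add_header:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = [(1, "plain"), (2, "a!b")]
# Defaults: header written, only the value containing '!' is quoted.
print(write_text(rows, header=("id", "name")))
# No header, every value quoted.
print(write_text(rows, header=("id", "name"), add_header=False, force_quotes=True))
```

With the defaults, only the value that contains the field delimiter is wrapped
in '^'; with force_quotes enabled, every value is wrapped.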



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
