Zbigniew Tomanek created DRILL-8457:
---------------------------------------

             Summary: Allow configuring csv parser in http storage plugin 
configuration
                 Key: DRILL-8457
                 URL: https://issues.apache.org/jira/browse/DRILL-8457
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - HTTP
    Affects Versions: Future
            Reporter: Zbigniew Tomanek
             Fix For: Future


Currently there is no way to configure the CSV parser when the HTTP plugin is
used. As a result, some files cannot be parsed (e.g. when any column has more
than 4096 chars or the file uses a delimiter other than `,`).

Since we use the HTTP plugin quite often at DataWalk, we've changed our internal
fork of Drill so that the following parser/format properties can be configured
using an additional `csvOptions` field:

```json
{
  "csvOptions": {
    "delimiter": "\t",
    "quote": "\"",
    "quote_escape": "\"",
    "line_separator": "\n",
    "header_extraction_enabled": null,
    "number_of_rows_to_skip": 0,
    "number_of_records_to_read": -1,
    "line_separator_detection_enabled": true,
    "max_columns": 512,
    "max_chars_per_column": 4096,
    "skip_empty_lines": true,
    "ignore_leading_whitespaces": true,
    "ignore_trailing_whitespaces": true,
    "null_value": null
  }
}
```
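
For illustration only, here is a rough sketch of how such a field might sit
inside an HTTP storage plugin configuration. The connection name `tsvExample`,
the URL, and the raised `max_chars_per_column` value are hypothetical, and
placing `csvOptions` at the connection level is an assumption; the exact
location would be settled in the PR. Only the options that differ from the
parser defaults would presumably need to be listed.

```json
{
  "type": "http",
  "connections": {
    "tsvExample": {
      "url": "https://example.com/data.tsv",
      "method": "GET",
      "inputType": "csv",
      "csvOptions": {
        "delimiter": "\t",
        "max_chars_per_column": 65536
      }
    }
  },
  "enabled": true
}
```

With a configuration along these lines, a tab-separated file with long column
values should become readable through the plugin, which covers both failure
cases mentioned above.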

I'd be glad to get feedback on whether creating a PR with these changes would
bring any value to Drill.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
