[ https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781003#comment-17781003 ]
ASF GitHub Bot commented on DRILL-8457:
---------------------------------------
ztomanek-dw commented on PR #2840:
URL: https://github.com/apache/drill/pull/2840#issuecomment-1785165480
@cgivre
Thanks for your feedback!
Following your comments, I have:
- written unit tests for `HttpCSVOptions`
- written unit tests for `HttpApiConfig`, fixing a small bug in `HttpMethod`
validation along the way
- added a TSV parsing test to `TestHttpPlugin`
- documented the `csvOptions` configuration in `CSV_Options.md`
Let me know if you see anything else to cover :)
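
For illustration, a TSV connection using `csvOptions` might look roughly like the sketch below. The connection name `example_tsv`, the URL, and the raised column limit are placeholders, and the placement of `csvOptions` inside the connection entry is an assumption; `CSV_Options.md` remains the authoritative description.

```json
{
  "type": "http",
  "connections": {
    "example_tsv": {
      "url": "https://example.com/data.tsv",
      "method": "GET",
      "inputType": "csv",
      "csvOptions": {
        "delimiter": "\t",
        "max_chars_per_column": 8192
      }
    }
  },
  "enabled": true
}
```

Options left out of `csvOptions` are expected to keep their defaults, so only the values that differ from a plain comma-separated file need to be listed.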
> Allow configuring csv parser in http storage plugin configuration
> -----------------------------------------------------------------
>
> Key: DRILL-8457
> URL: https://issues.apache.org/jira/browse/DRILL-8457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HTTP
> Affects Versions: Future
> Reporter: Zbigniew Tomanek
> Priority: Minor
> Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP plugin is used.
> Because of that, some files cannot be parsed (e.g. when any column has
> more than 4096 chars or the file uses a delimiter other than `,`).
> Since we use the HTTP plugin quite often at DataWalk, we've changed our
> internal fork of Drill so that the following parser/format properties can be
> configured using an additional `csvOptions` field:
>
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }
> {code}
> I'd be glad to get feedback on whether a PR with these changes would
> bring any value to Drill.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)