[
https://issues.apache.org/jira/browse/DRILL-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729312#comment-17729312
]
Diksha Chaturvedi edited comment on DRILL-8439 at 6/5/23 1:20 PM:
------------------------------------------------------------------
Hi [~cgivre],
1) Thanks for the hint. I've found the invisible Unicode character in the CSV
file:
*!image-2023-06-05-18-05-25-417.png!*
From what I've found, this character is added when a file is saved as UTF-8 with
a BOM (byte order mark). FYI, this CSV file is created by Apache MetaModel using
[this data context|https://metamodel.apache.org/apidocs/4.4.0/org/apache/metamodel/DataContextFactory.html#createCsvDataContext(java.io.File,%20char,%20char)],
which uses the default encoding (UTF-8). Any idea how we can handle this?
2) I tried using {{SELECT `col_PRODUCTID` as product_id ...}}
but Drill does not return any data; I can see the rows but not their content.
!image-2023-06-05-18-16-47-293.png|width=215,height=173!
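One possible workaround for point 1, assuming the CSV can be post-processed after MetaModel writes it and before Drill reads it, is to remove the BOM from the file. In UTF-8 the BOM is the three bytes {{EF BB BF}}, and Java's standard UTF-8 decoder does not drop them: they decode to the invisible character U+FEFF at the start of the first header name. The class below is a minimal sketch; {{StripUtf8Bom}} and its methods are hypothetical helpers, not part of Drill or MetaModel, and the sketch changes only the file, not either tool's configuration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: detect and remove a UTF-8 byte order mark from a file. */
public final class StripUtf8Bom {
  // The UTF-8 encoding of U+FEFF.
  private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  /** True if the data starts with the three UTF-8 BOM bytes. */
  static boolean hasUtf8Bom(byte[] data) {
    return data.length >= UTF8_BOM.length
        && data[0] == UTF8_BOM[0]
        && data[1] == UTF8_BOM[1]
        && data[2] == UTF8_BOM[2];
  }

  /** Returns the data without a leading BOM, or the same array if there is none. */
  static byte[] stripUtf8Bom(byte[] data) {
    return hasUtf8Bom(data) ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length) : data;
  }

  /** Rewrites the file without its BOM; returns true if a BOM was removed. */
  static boolean stripInPlace(Path file) throws IOException {
    byte[] data = Files.readAllBytes(file);
    if (!hasUtf8Bom(data)) {
      return false;
    }
    Files.write(file, stripUtf8Bom(data));
    return true;
  }

  public static void main(String[] args) throws IOException {
    // Decoding BOM-prefixed bytes as plain UTF-8 keeps the BOM as U+FEFF.
    byte[] sample = "\uFEFFPRODUCTID".getBytes(StandardCharsets.UTF_8);
    String header = new String(sample, StandardCharsets.UTF_8);
    System.out.printf("first char: U+%04X%n", (int) header.charAt(0)); // U+FEFF
    System.out.println(new String(stripUtf8Bom(sample), StandardCharsets.UTF_8)); // PRODUCTID

    // Usage: java StripUtf8Bom /var/lib/PRODUCT.csv
    for (String arg : args) {
      Path p = Paths.get(arg);
      System.out.println(p + ": " + (stripInPlace(p) ? "BOM removed" : "no BOM found"));
    }
  }
}
```

This may also bear on point 2: the report shows the column created from the BOM-prefixed header as {{col___PRODUCTID__}}, not {{col_PRODUCTID}}, so the query refers to a column name that does not match.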
> Getting col__ prefix for columns that are not special when extractHeader is
> enabled
> -----------------------------------------------------------------------------------
>
> Key: DRILL-8439
> URL: https://issues.apache.org/jira/browse/DRILL-8439
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata, SQL Parser
> Affects Versions: 1.21.0
> Environment: Enabled {{extractHeader}} in the CSV config of the dfs
> plugin.
> No. of drillbits: Single
> OS: Windows
> Reporter: Diksha Chaturvedi
> Priority: Major
> Labels: drill, extractHeader
> Attachments: image-2023-06-05-18-05-25-417.png,
> image-2023-06-05-18-16-47-293.png
>
>
> According to the documentation, Drill adds the prefix {{col_}} to column names
> that start with a number or a special character.
> {code:java}
> /**
> * Prefix used to replace non-alphabetic characters at the start of
> * a column name. For example, $foo becomes col_foo. Used
> * because SQL does not allow _foo.
> */
> public static final String COLUMN_PREFIX = "col_";
> {code}
> But in my case, I'm getting it even for column names that are entirely alphabetic.
> ----
> I have the following data in the CSV file:
> ||PRODUCTID||PRODUCTNAME||SUPPLIERID||CATEGORYID||UNIT||PRICE||
> |1|Chais|1|1|10 boxes x 20 bags|18|
> |2|Chang|1|1|24 - 12 oz bottles|19|
> |3|Aniseed Syrup|1|2|12 - 550 ml bottles|10|
> |4|Chef Anton's Cajun Seasoning|2|2|48 - 6 oz jars|22|
> |5|Chef Anton's Gumbo Mix|2|2|36 boxes|21.35|
>
> I query the CSV file with the following query:
> {code:sql}
> SELECT * FROM dfs.`/var/lib/PRODUCT.csv`{code}
> The output is:
> [!https://i.stack.imgur.com/FBNmn.png|width=611,height=130!|https://i.stack.imgur.com/FBNmn.png]
> ----
> I know about the other renaming rules, for example:
> * {{#UNITS}} is changed to {{col_UNITS}}
> * {{FINANCIAL$RECORD}} is changed to {{FINANCIAL_RECORD}}
>
> But what's going on with {{PRODUCTID}}? Why is it changed to
> {{col___PRODUCTID__}}? In this case, extra underscores have been added as well.
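To illustrate why an all-alphabetic header can still receive the prefix, here is a hypothetical approximation of the renaming rules, reconstructed only from the {{COLUMN_PREFIX}} Javadoc ({{$foo}} becomes {{col_foo}}) and the examples in this report ({{#UNITS}}, {{FINANCIAL$RECORD}}). It is not Drill's actual implementation; {{HeaderNameSketch}} and {{sanitize}} are made-up names, and the leading-digit branch is an assumption. Under these assumed rules, a header whose first character is the invisible U+FEFF is treated like one starting with a special character, because U+FEFF is neither a letter nor a digit.

```java
/** Hypothetical approximation of Drill's header renaming, for illustration only. */
public final class HeaderNameSketch {
  // Same value as the COLUMN_PREFIX constant quoted in the issue.
  static final String COLUMN_PREFIX = "col_";

  static String sanitize(String header) {
    StringBuilder sb = new StringBuilder(header.length() + COLUMN_PREFIX.length());
    // Assumed rule: every character other than a letter, digit or '_' becomes '_'
    // (FINANCIAL$RECORD -> FINANCIAL_RECORD).
    for (int i = 0; i < header.length(); i++) {
      char c = header.charAt(i);
      sb.append(Character.isLetterOrDigit(c) || c == '_' ? c : '_');
    }
    if (sb.length() > 0 && !Character.isLetter(sb.charAt(0))) {
      if (sb.charAt(0) == '_') {
        // Assumed rule: a leading special character is replaced by the prefix ($foo -> col_foo).
        sb.replace(0, 1, COLUMN_PREFIX);
      } else {
        // Assumed rule: a leading digit is kept and the prefix is prepended.
        sb.insert(0, COLUMN_PREFIX);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] headers = {"$foo", "#UNITS", "FINANCIAL$RECORD", "PRODUCTID", "\uFEFFPRODUCTID"};
    for (String h : headers) {
      // Make the invisible BOM visible in the printed output.
      System.out.println(h.replace("\uFEFF", "<U+FEFF>") + " -> " + sanitize(h));
    }
  }
}
```

The sketch only shows why a prefix appears at all: it maps the BOM-prefixed header to {{col_PRODUCTID}}, while Drill actually produced {{col___PRODUCTID__}}, so Drill's real rules differ in detail and the exact column name should be taken from Drill's own output.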