[ https://issues.apache.org/jira/browse/DRILL-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730083#comment-17730083 ]
Diksha Chaturvedi commented on DRILL-8439:
------------------------------------------

I've noticed different outcomes depending on where the BOM character is placed in the CSV file. The BOM is invisible, so the rows in Cases 2 to 5 look identical here; the attached screenshots show the output for each placement.

*Case 1:*
{code:java}
"PRODUCTID"^"PRODUCTNAME"^"SUPPLIERID"^"CATEGORYID"^"UNIT"^"PRICE"{code}
In this case the output is {{col_PRODUCTID}} (working as expected).

*Case 2:*
{code:java}
"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInColDataInBeginning.PNG|width=273,height=243!

*Case 3:*
{code:java}
"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInColData.PNG|width=271,height=244!

*Case 4:*
{code:java}
"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInEnd.PNG|width=157,height=211!

*Case 5:*
{code:java}
"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInMiddle.PNG|width=199,height=217!

Could someone explain these differences?

> Getting col__ prefix for columns that are not special when extractHeader is enabled
> -----------------------------------------------------------------------------------
>
> Key: DRILL-8439
> URL: https://issues.apache.org/jira/browse/DRILL-8439
> Project: Apache Drill
> Issue Type: Bug
> Components: Metadata, SQL Parser
> Affects Versions: 1.21.0
> Environment: Enabled {{extractHeader}} in the CSV format config of the dfs plugin.
> No. of drillbits: Single
> OS: Windows
> Reporter: Diksha Chaturvedi
> Priority: Major
> Labels: drill, extractHeader
> Attachments: bomInColData.PNG, bomInColDataInBeginning.PNG, bomInEnd.PNG, bomInMiddle.PNG, bomInsideColumnName-1.PNG, bomInsideColumnName.PNG, image-2023-06-05-18-05-25-417.png, image-2023-06-05-18-16-47-293.png
>
>
> As per the documentation, Drill adds the prefix {{col_}} to column names that start with a number or a special character.
> {code:java}
> /**
>  * Prefix used to replace non-alphabetic characters at the start of
>  * a column name. For example, $foo becomes col_foo. Used
>  * because SQL does not allow _foo.
>  */
> public static final String COLUMN_PREFIX = "col_";
> {code}
> But in my case I'm getting it even for a column name that is entirely alphabetic.
> ----
> I have the following data in the CSV file:
> ||PRODUCTID||PRODUCTNAME||SUPPLIERID||CATEGORYID||UNIT||PRICE||
> |1|Chais|1|1|10 boxes x 20 bags|18|
> |2|Chang|1|1|24 - 12 oz bottles|19|
> |3|Aniseed Syrup|1|2|12 - 550 ml bottles|10|
> |4|Chef Anton's Cajun Seasoning|2|2|48 - 6 oz jars|22|
> |5|Chef Anton's Gumbo Mix|2|2|36 boxes|21.35|
>
> When querying the CSV file with the following query:
> {code:sql}
> SELECT * FROM dfs.`/var/lib/PRODUCT.csv`{code}
> the output is:
> [!https://i.stack.imgur.com/FBNmn.png|width=611,height=130!|https://i.stack.imgur.com/FBNmn.png]
> ----
> I know about the other renaming rules, for example:
> {{#UNITS}} is changed to {{col_UNITS}}
> {{FINANCIAL$RECORD}} is changed to {{FINANCIAL_RECORD}}
> But what about {{PRODUCTID}}? Why is it changed to {{col___PRODUCTID__}}? In this case extra underscores are added as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
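The renaming rule quoted in the description can be illustrated with a small Java sketch. It is a hypothetical reconstruction, not Drill's actual implementation, and it assumes two things: leading non-alphabetic characters are replaced by {{col_}} (as the quoted Javadoc states), and any other non-alphanumeric character becomes {{_}} (consistent with the {{FINANCIAL$RECORD}} example). It shows that a UTF-8 byte order mark (U+FEFF), which is invisible and is not removed by {{String.trim()}}, is enough to make an all-alphabetic header such as {{PRODUCTID}} start with a non-letter and therefore receive the prefix. The names {{HeaderNameSketch}}, {{cleanHeader}}, {{startsWithUtf8Bom}} and {{stripUtf8Bom}} are made up for this example. The sketch does not model how Drill's CSV parser treats the quote characters next to a BOM, so it does not reproduce the exact {{col___PRODUCTID__}} name.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (NOT Drill's code) of the header-renaming rule:
 * leading non-alphabetic characters are replaced by COLUMN_PREFIX, and any
 * other non-alphanumeric character is replaced by '_'.
 */
public class HeaderNameSketch {

  public static final String COLUMN_PREFIX = "col_";

  // UTF-8 encoding of the byte order mark U+FEFF.
  private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

  /** Applies the assumed renaming rule to a single header. */
  public static String cleanHeader(String header) {
    String h = header.trim(); // trim() only removes chars <= U+0020, so a BOM survives
    int start = 0;
    while (start < h.length() && !Character.isLetter(h.charAt(start))) {
      start++; // skip leading non-letters such as '#', '$' or a BOM
    }
    StringBuilder out = new StringBuilder();
    if (start > 0) {
      out.append(COLUMN_PREFIX); // e.g. "#UNITS" -> "col_UNITS"
    }
    for (int i = start; i < h.length(); i++) {
      char ch = h.charAt(i);
      out.append(Character.isLetterOrDigit(ch) ? ch : '_'); // e.g. '$' -> '_'
    }
    return out.toString();
  }

  /** Returns true if the raw file bytes begin with a UTF-8 BOM. */
  public static boolean startsWithUtf8Bom(byte[] data) {
    return data.length >= UTF8_BOM.length
        && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
  }

  /** Returns the bytes with a leading UTF-8 BOM removed, if one is present. */
  public static byte[] stripUtf8Bom(byte[] data) {
    return startsWithUtf8Bom(data)
        ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length)
        : data;
  }

  public static void main(String[] args) {
    System.out.println(cleanHeader("PRODUCTID"));        // PRODUCTID
    System.out.println(cleanHeader("#UNITS"));           // col_UNITS
    System.out.println(cleanHeader("FINANCIAL$RECORD")); // FINANCIAL_RECORD
    // A BOM glued to the first header makes its first character non-alphabetic.
    System.out.println(cleanHeader("\uFEFFPRODUCTID"));  // col_PRODUCTID

    byte[] raw = "\uFEFFPRODUCTID^PRODUCTNAME".getBytes(StandardCharsets.UTF_8);
    System.out.println(startsWithUtf8Bom(raw));          // true
    System.out.println(new String(stripUtf8Bom(raw), StandardCharsets.UTF_8)); // PRODUCTID^PRODUCTNAME
  }
}
{code}

If the file does start with a BOM, re-saving it as UTF-8 without a BOM (or dropping the first three bytes, as {{stripUtf8Bom}} does) would presumably let the first header come through as plain {{PRODUCTID}}; this has not been verified against Drill itself.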