[jira] [Commented] (DRILL-8439) Getting col__ prefix for columns that are not special when extractHeader is enabled

Diksha Chaturvedi (Jira) Wed, 07 Jun 2023 04:48:11 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730083#comment-17730083
 ]


Diksha Chaturvedi commented on DRILL-8439:
------------------------------------------

I've noticed different outcomes depending on where the BOM character is placed 
in the CSV file.

 

*Case 1:*
{code:java}
"ï»¿PRODUCTID"^"PRODUCTNAME"^"SUPPLIERID"^"CATEGORYID"^"UNIT"^"PRICE"{code}
In this case the output is {{col_PRODUCTID}} (working as expected).

 

*Case 2:*
{code:java}
ï»¿"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInColDataInBeginning.PNG|width=273,height=243!

 

*Case 3:*
{code:java}
"ï»¿1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"{code}
!bomInColData.PNG|width=271,height=244!

 

*Case 4:*
{code:java}
"1"^"Chais"^"1"^"1"^"10 boxes x 20 bags"^"18"ï»¿{code}
!bomInEnd.PNG|width=157,height=211!

 

*Case 5:*
{code:java}
"1"^"Chais"^"1"^"1"^ï»¿"10 boxes x 20 bags"^"18"{code}
!bomInMiddle.PNG|width=199,height=217!

Could someone explain why the outcome differs between these cases?
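
To make the five cases easier to compare, here is a minimal sketch in plain Java (not Drill code) that reports where the BOM sits in a line. It relies on two things I am sure of: the UTF-8 BOM is the byte sequence EF BB BF, which shows up as ï»¿ when the file is viewed in a single-byte encoding, and Java's UTF-8 decoder keeps it as the character U+FEFF instead of dropping it. The class name {{BomSketch}} and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper only; this is not how Drill's text reader is implemented.
final class BomSketch {
    // U+FEFF is what the bytes EF BB BF become after UTF-8 decoding.
    static final char BOM = '\uFEFF';

    /** Returns the index of the first U+FEFF in the line, or -1 if there is none. */
    static int bomIndex(String line) {
        return line.indexOf(BOM);
    }

    /** Returns the line with every U+FEFF removed, wherever it occurs. */
    static String stripBom(String line) {
        return line.replace(String.valueOf(BOM), "");
    }

    public static void main(String[] args) {
        // Case 1 from above: the BOM sits just inside the opening quote of the header.
        byte[] raw = {'"', (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'P', 'R', 'O', 'D', '"'};

        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println("BOM index after UTF-8 decoding: " + bomIndex(asUtf8));
        System.out.println("Same bytes viewed as ISO-8859-1: " + asLatin1);
        System.out.println("After stripping: " + stripBom(asUtf8));
    }
}
```

Running {{bomIndex}} over the first line of each test file would show whether the BOM is at index 0 (Case 2), inside the first field (Cases 1 and 3), or further along the line (Cases 4 and 5), which may help narrow down which positions the reader handles.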

 

> Getting col__ prefix for columns that are not special when extractHeader is 
> enabled
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-8439
>                 URL: https://issues.apache.org/jira/browse/DRILL-8439
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata, SQL Parser
>    Affects Versions: 1.21.0
>         Environment: Enabled {{extractHeader}} in the csv config of dfs 
> plugin.
> No. of drillbits: Single
> OS: Windows
>            Reporter: Diksha Chaturvedi
>            Priority: Major
>              Labels: drill, extractHeader
>         Attachments: bomInColData.PNG, bomInColDataInBeginning.PNG, 
> bomInEnd.PNG, bomInMiddle.PNG, bomInsideColumnName-1.PNG, 
> bomInsideColumnName.PNG, image-2023-06-05-18-05-25-417.png, 
> image-2023-06-05-18-16-47-293.png
>
>
> As per the documentation, Drill adds the col_ prefix to column names that 
> start with a number or a special character.
> {code:java}
> /**
>  * Prefix used to replace non-alphabetic characters at the start of
>  * a column name. For example, $foo becomes col_foo. Used
>  * because SQL does not allow _foo.
>  */
> public static final String COLUMN_PREFIX = "col_";
> {code}
> But in my case I'm getting it even for an all-alphabetic column name.
> ----
> I have the following data in the CSV file,
> ||PRODUCTID||PRODUCTNAME||SUPPLIERID||CATEGORYID||UNIT||PRICE||
> |1|Chais|1|1|10 boxes x 20 bags|18|
> |2|Chang|1|1|24 - 12 oz bottles|19|
> |3|Aniseed Syrup|1|2|12 - 550 ml bottles|10|
> |4|Chef Anton's Cajun Seasoning|2|2|48 - 6 oz jars|22|
> |5|Chef Anton's Gumbo Mix|2|2|36 boxes|21.35|
>  
> When I query the CSV file with the following query:
> {code:sql}
> SELECT * FROM dfs.`/var/lib/PRODUCT.csv`{code}
> The output is:
> [!https://i.stack.imgur.com/FBNmn.png|width=611,height=130!|https://i.stack.imgur.com/FBNmn.png]
> ----
> I know about the other criteria, such as:
> {{#UNITS}} is changed to {{col_UNITS}}
> {{FINANCIAL$RECORD}} is changed to {{FINANCIAL_RECORD}}
> But what about {{PRODUCTID}}? Why is it changed to 
> {{col___PRODUCTID__}}? In this case extra underscores have also been added.
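
The renaming rules quoted in the description can be sketched as below. This is reconstructed only from the three examples given ({{$foo}}, {{#UNITS}}, {{FINANCIAL$RECORD}}) and is not copied from Drill's source; the class name {{HeaderNameSketch}} and the method {{toColumnName}} are hypothetical, and keeping a leading digit after the prefix is an assumption, since the description gives no example for that case.

```java
// A sketch of the renaming rules as described in this issue; NOT Drill's actual code.
final class HeaderNameSketch {
    static final String COLUMN_PREFIX = "col_";

    /** Maps a raw header field to a column name using the rules inferred above. */
    static String toColumnName(String header) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < header.length(); i++) {
            char ch = header.charAt(i);
            if (Character.isLetter(ch)) {
                buf.append(ch);                 // letters pass through unchanged
            } else if (Character.isDigit(ch)) {
                if (i == 0) {
                    buf.append(COLUMN_PREFIX);  // assumption: a leading digit is kept after the prefix
                }
                buf.append(ch);
            } else if (i == 0) {
                buf.append(COLUMN_PREFIX);      // leading special character is replaced by the prefix
            } else {
                buf.append('_');                // any other special character becomes an underscore
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(toColumnName("#UNITS"));
        System.out.println(toColumnName("FINANCIAL$RECORD"));
        // U+FEFF is neither a letter nor a digit, so a leading BOM is treated as a special character.
        System.out.println(toColumnName("\uFEFFPRODUCTID"));
    }
}
```

Under these inferred rules an invisible U+FEFF at the start of a header is enough to trigger the prefix, which is consistent with Case 1 in the comment above. The sketch does not attempt to reproduce the exact {{col___PRODUCTID__}} output.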



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
