[ https://issues.apache.org/jira/browse/DRILL-7020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756537#comment-16756537 ]
Paul Rogers commented on DRILL-7020:
------------------------------------

The size limitation is hard-coded into the "compliant" text reader, as you noted. I'm not sure the limit is necessary. Drill uses a 4-byte offset vector to track VARCHAR values within a VARCHAR vector. Might be as easy as removing the size check.

> big varchar doesn't work with extractHeader=true
> ------------------------------------------------
>
>                 Key: DRILL-7020
>                 URL: https://issues.apache.org/jira/browse/DRILL-7020
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.15.0
>            Reporter: benj
>            Priority: Major
>
> With a TEST file of CSV type like
> {code:java}
> col1,col2
> w,x
> ...y...,z
> {code}
> where ...y... is a string longer than 65536 characters (say, 66000 for example),
> a SELECT with +*extractHeader=false*+ is OK:
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', extractHeader => false));
> col1 | col2
> +---------+------
> | w | x
> | ...y... | z
> {code}
> But a SELECT with +*extractHeader=true*+ gives an error:
> {code:java}
> SELECT * FROM TABLE(tmp.`TEST`(type => 'text', fieldDelimiter => ',', extractHeader => true));
> Error: UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column
> columnIndex 1
> Limit 65536
> Fragment 0:0
> {code}
> Note that it is possible to use extractHeader=false with skipFirstLine=true, but
> in this case it's not possible to automatically get the column names.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
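
As an illustration of the point in the comment above about 4-byte offsets: the sketch below is a toy variable-width column that records value boundaries in an `int` (4-byte) offset array. It is not Drill's actual VARCHAR vector code; the class name `OffsetVectorSketch` and all of its methods are hypothetical and exist only for this example. It suggests that a 4-byte offset by itself can describe a 66000-byte value, so the 65536 limit would have to come from an explicit check rather than from the offset width. Whether Drill has other constraints is not something this sketch addresses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical toy model of a variable-width column; NOT Drill's implementation. */
public class OffsetVectorSketch {
    // All values are stored back to back in one byte buffer.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // offsets[i] is the start of value i; offsets[i + 1] is its end (4-byte ints).
    private int[] offsets = new int[] {0};
    private int count = 0;

    /** Appends a value and records its end position as a 4-byte offset. */
    public void add(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        data.write(bytes, 0, bytes.length);
        offsets = Arrays.copyOf(offsets, count + 2);
        offsets[++count] = data.size();
    }

    /** Length in bytes of the value at the given index. */
    public int length(int index) {
        return offsets[index + 1] - offsets[index];
    }

    /** Reads the value at the given index back out of the shared buffer. */
    public String get(int index) {
        byte[] all = data.toByteArray();
        return new String(all, offsets[index], length(index), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] big = new char[66000];   // mirrors the "...y..." value in the report
        Arrays.fill(big, 'y');

        OffsetVectorSketch column = new OffsetVectorSketch();
        column.add("w");
        column.add(new String(big));
        column.add("z");

        System.out.println(column.length(1)); // prints 66000
    }
}
```

In this toy model the only inherent ceiling on the buffer is `Integer.MAX_VALUE` bytes, far above 65536.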