(Reposting an answer sent earlier for those whose e-mail filters discard
GitHub-related e-mails…)

Drill is an analytic engine optimized for numbers and short strings. At
present, Drill’s practical limit on string (i.e., VarChar) length is 256
characters.

Drill is vectorized: it tries to create “batches” of data with up to 64K
records. When individual columns are wider than 256 bytes (on average), our
vectors grow larger than 16 MB and we run into memory issues.
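The 16 MB figure follows from the batch size: 64K records times 256 bytes per
value is exactly 16 MB. The Java sketch below checks that arithmetic. It is a
rough estimate under stated assumptions: it counts only the value bytes
(ignoring offset vectors and other overhead), takes 16 MB as 16 × 1024 × 1024
bytes, and the class and method names are hypothetical, not part of Drill.

```java
public class VectorSizeEstimate {
    // Maximum number of records Drill tries to put in one batch (64K).
    static final int MAX_BATCH_RECORDS = 64 * 1024;
    // Vector size beyond which memory problems start (16 MB, assumed 16 MiB).
    static final long VECTOR_LIMIT_BYTES = 16L * 1024 * 1024;

    // Rough estimate: value bytes only, ignoring offset vectors and overhead.
    static long estimateVectorBytes(int records, int avgValueBytes) {
        return (long) records * avgValueBytes;
    }

    public static void main(String[] args) {
        for (int width : new int[] {100, 256, 257, 1024}) {
            long bytes = estimateVectorBytes(MAX_BATCH_RECORDS, width);
            System.out.printf("avg width %4d bytes -> %,d bytes (%s)%n",
                width, bytes,
                bytes > VECTOR_LIMIT_BYTES ? "over 16 MB" : "within 16 MB");
        }
    }
}
```

At an average width of 256 bytes a full batch lands exactly on the 16 MB
limit; one more byte per value pushes it over.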

If VarChar columns are 64 KB in size (the current maximum), we hit the vector
limit with only 256 records. Unfortunately, at present, our readers and
operators don’t know how to limit their batch sizes to such a low number
(though we are actively working on a fix).
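Limiting batch size by width amounts to dividing the 16 MB vector budget by
the widest expected value and capping the result at 64K records. The Java
sketch below illustrates that idea. It is a hypothetical helper, not Drill’s
actual reader code or the planned fix, and it again counts only value bytes.

```java
public class BatchSizer {
    static final int MAX_BATCH_RECORDS = 64 * 1024;           // 64K records
    static final long VECTOR_LIMIT_BYTES = 16L * 1024 * 1024; // 16 MB

    // Hypothetical helper: the largest record count whose value bytes stay
    // within the vector limit, never more than 64K and never less than 1.
    static int recordsPerBatch(int maxValueBytes) {
        if (maxValueBytes <= 0) {
            return MAX_BATCH_RECORDS;
        }
        long fit = VECTOR_LIMIT_BYTES / maxValueBytes;
        return (int) Math.min(MAX_BATCH_RECORDS, Math.max(1L, fit));
    }

    public static void main(String[] args) {
        System.out.println(recordsPerBatch(64 * 1024)); // prints 256
        System.out.println(recordsPerBatch(256));       // prints 65536
        System.out.println(recordsPerBatch(10));        // prints 65536
    }
}
```

With 64 KB values (the current maximum) only 256 records fit, which matches
the figure above; narrow columns still get full 64K-record batches.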

Thanks,

- Paul
> On Aug 29, 2017, at 6:27 PM, César Tenganán wrote:
> 
> Hi,
> 
> 
> We are working with a dataset that has columns containing large strings,
> and we are getting the error "UNSUPPORTED_OPERATION ERROR: Trying to write
> something big in a column columnIndex 0 Limit 65536 Failure while reading
> file ..."
> 
> 
> There is a validation in FieldVarCharOutput.java
> <https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/compliant/FieldVarCharOutput.java>
> that uses MAX_FIELD_LENGTH.
> 
> Why does this validation use the specific value MAX_FIELD_LENGTH = 1024 * 64?
> Is it possible to handle large strings, such as those used to represent WKT
> geometries?
> 
> Regards!
> 
> -- 
> Julio César Tenganán Daza
> Ingeniero En Sistemas
> Universidad Del Valle
