[ https://issues.apache.org/jira/browse/IMPALA-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Rodoni updated IMPALA-5740: -------------------------------- Issue Type: Bug (was: Documentation) > Clarify STRING size limit > ------------------------- > > Key: IMPALA-5740 > URL: https://issues.apache.org/jira/browse/IMPALA-5740 > Project: IMPALA > Issue Type: Bug > Components: Docs > Affects Versions: Impala 2.10.0 > Reporter: Tim Armstrong > Assignee: Alex Rodoni > Priority: Critical > Fix For: Impala 2.13.0, Impala 3.1.0 > > > The Impala docs currently state that strings have a maximum size of 32kb. > http://impala.apache.org/docs/build/html/topics/impala_string.html > This is misleading and causes confusion. We should clarify this in a way that > makes it clear that 32kb is not the actual limit, but also makes it clear > that performance and memory consumption will degrade with larger strings. > Here's what the behaviour is. Unsure the best way to succinctly characterise > it. > * We expect that queries operating on 32KB strings will work reliably and not > hit significant performance or memory problems (unless you have very complex > queries, very many columns, etc). > * There is an absolute hard limit of ~ 2GB on strings. > * Memory consumption of queries will grow as string sizes increase and the > probability of hitting "memory limit exceeded" will increase. > * Performance of queries will decrease as strings get larger. > * The row size (i.e. total size of all string and other columns) is limited > in various places by the implementation of various operators. > ** Rows coming from the right side of any hash join > ** Rows coming from either side of a spilling hash join > ** Rows being sorted > ** Rows in a grouping aggregation. > In Impala <= 2.9 the default limit in those places is 8MB. > With the IMPALA-3200 changes that default number may decrease significantly, > to 2MB or less, but there will be a new query option, something like > MAX_ROW_SIZE, that will make this row size limit configurable on a per-query > basis. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org