Csaba Ringhofer created IMPALA-13052:
----------------------------------------

             Summary: Sampling aggregate result sizes are underestimated
                 Key: IMPALA-13052
                 URL: https://issues.apache.org/jira/browse/IMPALA-13052
             Project: IMPALA
          Issue Type: Bug
            Reporter: Csaba Ringhofer


Sampling aggregates (sample, appx_median, histogram) return a string that can be quite large, but the planner assumes it has a fixed small size.

Examples:

select sample(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.45 KB (this is a single row sent by a host)

select appx_median(l_orderkey) from tpch.lineitem;
according to plan: row-size=8B
in reality: TotalBytesSent: 254.68 KB (this is a single row sent by a host)

select histogram(l_orderkey) from tpch.lineitem;
according to plan: row-size=12B
in reality: TotalBytesSent: 254.35 KB (this is a single row sent by a host)

This may also be relevant for datasketches functions.
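
To put the gap in numbers, here is a minimal sketch (not part of the original report) that divides the observed TotalBytesSent by the planner's row-size estimate for each query. It assumes that "KB" means 1024 bytes and that TotalBytesSent is dominated by the single row each host sends; the names REPORTED and underestimation_factor are hypothetical and exist only for this illustration.

```python
# Illustrative arithmetic only; the figures are copied from the report.
# Assumptions: "KB" means 1024 bytes, and TotalBytesSent is dominated
# by the single intermediate row sent by a host.

KIB = 1024

# function name -> (planner row-size in bytes, observed TotalBytesSent in KB)
REPORTED = {
    "sample": (12, 254.45),
    "appx_median": (8, 254.68),
    "histogram": (12, 254.35),
}


def underestimation_factor(planned_bytes, observed_kb):
    """Return how many times larger the observed size is than the estimate."""
    return observed_kb * KIB / planned_bytes


if __name__ == "__main__":
    for name, (planned, observed) in REPORTED.items():
        factor = underestimation_factor(planned, observed)
        print(f"{name}: planned {planned} B, observed {observed} KB, "
              f"about {factor:,.0f}x larger")
```

Under those assumptions, the observed size exceeds the planned row size by a factor of more than 20,000 in all three cases.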