Holden Karau created SPARK-48362:
------------------------------------

             Summary: Add CollectSetWIthLimit
                 Key: SPARK-48362
                 URL: https://issues.apache.org/jira/browse/SPARK-48362
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Holden Karau
See [https://stackoverflow.com/questions/38730912/how-to-limit-functions-collect-set-in-spark-sql]

Some users want to collect a set, but when the number of distinct elements is too large they can hit a "Cannot grow BufferHolder" error, because the full set is collected first and only trimmed afterwards. We should offer a collect_set variant that pre-emptively stops adding elements once the limit is reached, so the amount of memory used stays bounded.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
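The proposed behaviour can be sketched outside Spark as a bounded aggregation buffer. This is a minimal illustration, not Spark code: `BoundedCollectSet` and `collect_set_with_limit` are hypothetical names, and the update/merge split only mirrors how a distributed aggregate builds one buffer per partition and then combines them. Because no buffer ever holds more than `limit` distinct elements, peak memory depends on the limit rather than on the number of distinct values in the column.

```python
from typing import Hashable, Iterable, List, Set


class BoundedCollectSet:
    """Hypothetical collect_set-style buffer capped at `limit` distinct elements."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.items: Set[Hashable] = set()

    def update(self, value: Hashable) -> None:
        # Skip nulls (collect_set ignores them) and stop growing once full,
        # instead of collecting everything and trimming afterwards.
        if value is None or len(self.items) >= self.limit:
            return
        self.items.add(value)

    def merge(self, other: "BoundedCollectSet") -> None:
        # Combine another partial buffer without ever exceeding the limit.
        for value in other.items:
            if len(self.items) >= self.limit:
                break
            self.items.add(value)

    def result(self) -> List[Hashable]:
        return list(self.items)


def collect_set_with_limit(
    partitions: Iterable[Iterable[Hashable]], limit: int
) -> List[Hashable]:
    """Build one bounded buffer per partition, then merge them into a final result."""
    total = BoundedCollectSet(limit)
    for partition in partitions:
        partial = BoundedCollectSet(limit)
        for value in partition:
            partial.update(value)
        total.merge(partial)
    return total.result()


if __name__ == "__main__":
    # Two "partitions" with five distinct non-null values, capped at three.
    print(sorted(collect_set_with_limit([[1, 2, 2, None], [3, 4, 5]], limit=3)))
```

Which elements survive once the cap is hit is arbitrary, much as the output order of collect_set is not deterministic; callers who need specific elements (for example the smallest N) would need a different buffer, such as a bounded heap.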