[ https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748396#comment-17748396 ]
Snoot.io commented on SPARK-44239:
----------------------------------

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/41782

> Free memory allocated by large vectors when vectors are reset
> -------------------------------------------------------------
>
> Key: SPARK-44239
> URL: https://issues.apache.org/jira/browse/SPARK-44239
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Wan Kun
> Priority: Major
> Attachments: image-2023-06-29-12-58-12-256.png,
> image-2023-06-29-13-03-15-470.png
>
>
> When Spark reads a data file into a WritableColumnVector, the memory
> allocated by the WritableColumnVector is not freed until the
> VectorizedColumnReader completes.
> Reusing the allocated array objects saves memory allocation time, but it
> also keeps too much unused memory allocated after a large vector batch
> has been read.
> Add a memory reserve policy for this scenario that reuses the allocated
> array object for small column vectors and frees the memory for huge
> column vectors.
> !image-2023-06-29-12-58-12-256.png!
> !image-2023-06-29-13-03-15-470.png!
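
To make the reserve policy described in the issue concrete, here is a minimal Java sketch. It is not the code from the pull request and not Spark's WritableColumnVector API: the class ReusableIntBuffer, its methods, and the hugeThreshold and defaultCapacity parameters are hypothetical names chosen for illustration. It only shows the idea of keeping a small backing array across resets while releasing a huge one.

```java
import java.util.Arrays;

// Hypothetical illustration of a "reserve on reset" policy.
// None of these names come from Spark; they are assumptions for this sketch.
final class ReusableIntBuffer {
    private final int defaultCapacity; // capacity restored after a huge batch (assumed)
    private final int hugeThreshold;   // element count above which the array is released (assumed)
    private int[] data;

    ReusableIntBuffer(int defaultCapacity, int hugeThreshold) {
        if (defaultCapacity <= 0 || hugeThreshold < defaultCapacity) {
            throw new IllegalArgumentException("need 0 < defaultCapacity <= hugeThreshold");
        }
        this.defaultCapacity = defaultCapacity;
        this.hugeThreshold = hugeThreshold;
        this.data = new int[defaultCapacity];
    }

    // Grow the backing array so it can hold at least `required` elements.
    void reserve(int required) {
        if (required > data.length) {
            // Double the capacity when that is enough, guarding against int overflow.
            int doubled = (int) Math.min(2L * data.length, Integer.MAX_VALUE - 8);
            data = Arrays.copyOf(data, Math.max(required, doubled));
        }
    }

    // Called between batches: keep a small array for reuse, drop a huge one.
    void reset() {
        if (data.length > hugeThreshold) {
            // The old array becomes unreachable and can be garbage collected.
            data = new int[defaultCapacity];
        }
        // Otherwise the existing array is kept, which avoids a reallocation.
    }

    int capacity() {
        return data.length;
    }
}
```

With a default capacity of 4 and a threshold of 16, a batch that grows the buffer to 10 elements keeps its array after reset(), while a batch that grows it to 100 elements falls back to a fresh 4-element array. The trade-off is the one named in the issue: small vectors keep the allocation-time saving, and huge vectors stop holding unused memory until the reader completes.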