[ 
https://issues.apache.org/jira/browse/ARROW-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-14790.
----------------------------------
    Fix Version/s: 9.0.0
       Resolution: Fixed

Issue resolved by pull request 13228
[https://github.com/apache/arrow/pull/13228]

> [GLib] Memory leak on creating GArrowData
> -----------------------------------------
>
>                 Key: ARROW-14790
>                 URL: https://issues.apache.org/jira/browse/ARROW-14790
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python, Ruby
>            Reporter: Sten Larsson
>            Assignee: Kouhei Sutou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We're having a problem with a memory leak in a Ruby script that processes many 
> CSV files. I have written some short scripts to demonstrate the problem: 
> [https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214]
> The first script, 
> [arrow_test_csv.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_test_csv-rb],
>  creates a 184 MB CSV file for testing.
> The second script, 
> [arrow_memory_leak.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-rb],
>  then loads the CSV file 10 times using Arrow. It uses the 
> [get_process_mem|https://rubygems.org/gems/get_process_mem] gem to print the 
> memory usage both before and after each iteration. It also invokes the 
> garbage collector on each iteration to ensure the problem is not that Ruby 
> holds on to any objects. This is what it prints on my MacBook Pro using Arrow 
> 6.0.0:
> {noformat}
> 127577 objects, 34.234375 MB
> 127577 objects, 347.625 MB
> 127577 objects, 438.7890625 MB
> 127577 objects, 457.6953125 MB
> 127577 objects, 469.8046875 MB
> 127577 objects, 480.88671875 MB
> 127577 objects, 487.96484375 MB
> 127577 objects, 493.8359375 MB
> 127577 objects, 497.671875 MB
> 127577 objects, 498.55859375 MB
> 127577 objects, 501.42578125 MB
> {noformat}
> The third script, 
> [arrow_memory_leak.py|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-py],
>  is a Python implementation of the same script. This shows that the problem 
> is not in the Ruby bindings:
> {noformat}
> 2106 objects, 31.75390625 MB
> 2106 objects, 382.28515625 MB
> 2106 objects, 549.41796875 MB
> 2106 objects, 656.78125 MB
> 2106 objects, 679.6875 MB
> 2106 objects, 691.9921875 MB
> 2106 objects, 708.73828125 MB
> 2106 objects, 717.296875 MB
> 2106 objects, 724.390625 MB
> 2106 objects, 729.19921875 MB
> 2106 objects, 734.47265625 MB
> {noformat}
> I have also tested Arrow 5.0.0 and it has the same problem.
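For readers without access to the gist, the measurement loop described above can be sketched roughly as follows. This is not the reporter's script. It is a minimal illustration that assumes the object count comes from `gc.get_objects()` and that resident memory is read with the third-party `psutil` package. The names `rss_mb` and `measure` are hypothetical.

```python
import gc


def rss_mb():
    """Current resident set size in MB (assumes psutil is installed)."""
    import psutil  # imported lazily so the rest of the sketch has no hard dependency
    return psutil.Process().memory_info().rss / 2**20


def measure(load, iterations=10, read_memory_mb=rss_mb):
    """Yield one report line before the first load and one after each load.

    `load` is any zero-argument callable; its return value is discarded
    immediately so nothing keeps the loaded data alive on the Python side.
    """
    def report():
        gc.collect()  # rule out garbage that simply has not been collected yet
        return f"{len(gc.get_objects())} objects, {read_memory_mb()} MB"

    yield report()
    for _ in range(iterations):
        load()
        yield report()
```

With pyarrow installed, something like `for line in measure(lambda: pyarrow.csv.read_csv("test.csv")): print(line)` would print eleven lines in the same shape as the output above. The file name `test.csv` is a placeholder. A memory figure that keeps rising while the object count stays flat suggests the growth is happening outside the interpreter's object heap.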



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
