[ https://issues.apache.org/jira/browse/ARROW-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kouhei Sutou resolved ARROW-14790.
----------------------------------
Fix Version/s: 9.0.0
Resolution: Fixed

Issue resolved by pull request 13228
[https://github.com/apache/arrow/pull/13228]

> [GLib] Memory leak on creating GArrowData
> -----------------------------------------
>
> Key: ARROW-14790
> URL: https://issues.apache.org/jira/browse/ARROW-14790
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python, Ruby
> Reporter: Sten Larsson
> Assignee: Kouhei Sutou
> Priority: Major
> Labels: pull-request-available
> Fix For: 9.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We're having a problem with a memory leak in a Ruby script that processes many
> CSV files. I have written some short scripts to demonstrate the problem:
> [https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214]
>
> The first script,
> [arrow_test_csv.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_test_csv-rb],
> creates a 184 MB CSV file for testing.
>
> The second script,
> [arrow_memory_leak.rb|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-rb],
> then loads the CSV file 10 times using Arrow. It uses the
> [get_process_mem|https://rubygems.org/gems/get_process_mem] gem to print the
> memory usage both before and after each iteration. It also invokes the
> garbage collector on each iteration to ensure the problem is not that Ruby
> holds on to any objects.
> This is what it prints on my MacBook Pro using Arrow 6.0.0:
> {noformat}
> 127577 objects, 34.234375 MB
> 127577 objects, 347.625 MB
> 127577 objects, 438.7890625 MB
> 127577 objects, 457.6953125 MB
> 127577 objects, 469.8046875 MB
> 127577 objects, 480.88671875 MB
> 127577 objects, 487.96484375 MB
> 127577 objects, 493.8359375 MB
> 127577 objects, 497.671875 MB
> 127577 objects, 498.55859375 MB
> 127577 objects, 501.42578125 MB
> {noformat}
>
> The third script,
> [arrow_memory_leak.py|https://gist.github.com/stenlarsson/60b1e4e99416738b41ee30e7ba294214#file-arrow_memory_leak-py]
> is a Python implementation of the same script. This shows that the problem
> is not in the Ruby bindings:
> {noformat}
> 2106 objects, 31.75390625 MB
> 2106 objects, 382.28515625 MB
> 2106 objects, 549.41796875 MB
> 2106 objects, 656.78125 MB
> 2106 objects, 679.6875 MB
> 2106 objects, 691.9921875 MB
> 2106 objects, 708.73828125 MB
> 2106 objects, 717.296875 MB
> 2106 objects, 724.390625 MB
> 2106 objects, 729.19921875 MB
> 2106 objects, 734.47265625 MB
> {noformat}
>
> I have also tested Arrow 5.0.0 and it has the same problem.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
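
For readers who want to reproduce the kind of measurement described in the report, the loop (load, drop the result, force a garbage collection, print object count and memory) could be sketched roughly as below. This is not the reporter's gist: the helper names `rss_mb`, `report`, and `run` are hypothetical, and it assumes a Unix-like system where the standard-library `resource` module is available. Note that it reports peak resident set size rather than current usage, so the figure can only stay flat or grow.

```python
import gc
import resource
import sys


def rss_mb():
    """Return this process's peak resident set size in MB.

    Assumption: ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    This is a peak value, not the current usage the original scripts print.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report():
    """Format one line in the same shape as the output quoted in the report."""
    return f"{len(gc.get_objects())} objects, {rss_mb()} MB"


def run(load, iterations=10):
    """Call load() repeatedly, discarding the result and collecting garbage.

    `load` is any zero-argument callable (hypothetical parameter); one report
    line is recorded before the first iteration and one after each iteration.
    """
    lines = [report()]
    for _ in range(iterations):
        result = load()
        del result      # drop the only Python reference to the loaded data
        gc.collect()    # rule out uncollected Python objects as the cause
        lines.append(report())
    return lines
```

With pyarrow installed, one might call it as `run(lambda: pyarrow.csv.read_csv("test.csv"))` after `import pyarrow.csv`, where `test.csv` is a placeholder file name. A stable object count alongside steadily rising memory would point at native allocations rather than Python-level references.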