stenlarsson commented on issue #48880: URL: https://github.com/apache/arrow/issues/48880#issuecomment-3776529312
From what I can gather, the problem is that an `Arrow::Buffer` points to data owned by a Ruby string, but the string is collected by the garbage collector. A long chain of references must be maintained for the string to survive. In the example below, it looks something like this:

`ExecutePlan` => `ExecuteNode` => `ProjectNodeOptions` => `CallExpression` => `LiteralExpression` => `ScalarDatum` => `StringScalar` => `Buffer` => string

I tried adding all of these classes [here](https://github.com/apache/arrow/blob/e78abb9cc3bf07f077b05d1acd97f95045e6d246/ruby/red-arrow/lib/arrow/loader.rb#L45-L64) to keep the references, but `ExecutePlan` is not so easy. Have a look at the following example:

```ruby
require 'bundler/setup'
require 'arrow'

table = Arrow::Table.new(
  'foo' => [1, 2],
  'bar' => [%w[a b], %w[c d]],
)

plan = Arrow::ExecutePlan.new
node = plan.build_source_node(table)
node = plan.build_project_node(
  node,
  Arrow::ProjectNodeOptions.new(
    [
      :foo,
      Arrow::CallExpression.new('binary_join', [:bar, ',']),
    ],
    %w[foo bar],
  ),
)

# Print the object IDs of the plan's nodes twice before and twice after GC.
puts plan.nodes.map(&:object_id).join(', ')
puts plan.nodes.map(&:object_id).join(', ')
GC.start
puts plan.nodes.map(&:object_id).join(', ')
puts plan.nodes.map(&:object_id).join(', ')
```

When I run this example, I get the following output:

```
6128, 6136
6128, 6136
6128, 6144
6128, 6144
```

It looks like the project node is represented by a new Ruby object after the garbage collector runs. @kou, what could be the cause of this?
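The "keep the references" idea can be modeled in plain Ruby. This is a hypothetical sketch, not red-arrow's implementation: `SketchBuffer`, `SketchNode` and `SketchPlan` are invented stand-ins. It assumes the fix has each Ruby object hold a strong reference (an instance variable) to whatever it depends on, including the plan holding every node it builds, so the chain down to the string stays reachable and node wrappers keep the same identity across `GC.start`.

```ruby
# Hypothetical pure-Ruby model (NOT the red-arrow API): every wrapper keeps a
# strong Ruby reference to what it borrows from, so GC cannot free it early.

# Stand-in for a buffer that borrows the bytes of a Ruby String.
class SketchBuffer
  attr_reader :source

  def initialize(string)
    @source = string # strong reference keeps the String alive
  end
end

# Stand-in for a node whose options (indirectly) own a SketchBuffer.
class SketchNode
  attr_reader :buffer

  def initialize(buffer)
    @buffer = buffer # strong reference keeps the buffer (and its String) alive
  end
end

# Stand-in for a plan that records every node it builds, so the same Ruby
# objects are returned before and after garbage collection.
class SketchPlan
  def initialize
    @nodes = []
  end

  def build_node(string)
    node = SketchNode.new(SketchBuffer.new(string))
    @nodes << node # the plan owns the node wrapper
    node
  end

  def nodes
    @nodes.dup
  end
end

plan = SketchPlan.new
plan.build_node(String.new(','))
plan.build_node(String.new('a,b'))

before = plan.nodes.map(&:object_id)
GC.start
after = plan.nodes.map(&:object_id)
puts before == after
```

In the real bindings the references presumably live partly inside C++ objects, which Ruby's GC cannot trace, so the Ruby side has to hold them explicitly in a similar way.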
