[ https://issues.apache.org/jira/browse/ARROW-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624887#comment-17624887 ]
Kouhei Sutou commented on ARROW-18161:
--------------------------------------

Thanks.

{{Arrow::Table.new(**table)}} is the cause of this problem.

To avoid a memory copy, we need to keep the {{Arrow::Buffer}} for each {{arrow_frame[key].value}} alive while the object returned by {{Arrow::Table.new(**table)}} is alive. {{Arrow::Table.load(Arrow::Buffer.new)}} keeps a reference to the given {{Arrow::Buffer}}, but {{Arrow::Table.new(**table)}} doesn't, so the {{Arrow::Buffer}}s are GC-ed.

How about the following for now?

{code:ruby}
  def _get_arrow_frame_from_proto_arrow_frame(arrow_frame)
    columns = {}
    buffers = []
    arrow_frame.keys.each do |key|
      buffer = Arrow::Buffer.new(arrow_frame[key].value)
      # Collect every buffer so that it can be kept alive with the table.
      buffers << buffer
      tmp = Arrow::Table.load(buffer)
      col_name = create_friendly_name_for_key(key)
      columns[col_name] = tmp[0].data
    end
    table = Arrow::Table.new(**columns)
    # Keep references to the buffers on the table so that they are not GC-ed
    # while the table is alive.
    table.instance_variable_set(:@buffers, buffers)
    table
  end
{code}

We can avoid the {{instance_variable_set}} in the future by referencing the associated buffer from all related objects, such as {{Arrow::ChunkedArray}}, in {{Arrow::Table}}.

> Reading error table causes mutations
> ------------------------------------
>
>                 Key: ARROW-18161
>                 URL: https://issues.apache.org/jira/browse/ARROW-18161
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Ruby
>    Affects Versions: 9.0.0
>         Environment: Ruby 3.1.2
>            Reporter: Noah Horton
>            Assignee: Kouhei Sutou
>            Priority: Major
>
> Given an Arrow::Table with several columns "X"
>
> {code:ruby}
> # Rails console outputs
> 3.1.2 :107 > x.schema
>  =>
> #<Arrow::Schema:0x7ff2fbc426d8 ptr=0x55851587bc20 actual_values: int64
> dates: date32[day]
> expected_values: double>
> 3.1.2 :108 > x.schema
>  =>
> #<Arrow::Schema:0x7ff2fbbcda68 ptr=0x55851a541020 actual_values: int64
> dates: date32[day]
> expected_values: double>
> 3.1.2 :109 >
> {code}
> Note that the object and pointer have both changed values.
> But the far bigger issue is that repeated reads from it will cause different
> results:
> {code:ruby}
> 3.1.2 :097 > x[1][0]
>  => Sun, 22 Aug 2021
> 3.1.2 :098 > x[1][1]
>  => nil
> 3.1.2 :099 > x[1][0]
>  => nil
> {code}
> I have a lot of issues like this - when I have done these types of read
> operations, I get the original table with the data in the columns all
> shuffled around or deleted.
> I do ingest the data slightly oddly in the first place, as it comes in over
> gRPC and I am using Arrow::Buffer to read it from gRPC and then passing
> that into Arrow::Table.load. But I would not expect that, once it was in an
> Arrow::Table, I could do anything to permute it unintentionally.
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)