[ https://issues.apache.org/jira/browse/ARROW-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-13253:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++][FlightRPC] Segfault when sending record batch >2GB
> --------------------------------------------------------
>
>                 Key: ARROW-13253
>                 URL: https://issues.apache.org/jira/browse/ARROW-13253
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, FlightRPC
>    Affects Versions: 4.0.1
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When sending a record batch > 2GiB, the server will segfault. Although Flight
> checks for this case and returns an error, it turns out that gRPC always
> tries to increment the refcount of the result buffer whether the
> serialization handler returned successfully or not:
> {code:cpp}
> // From gRPC 1.36
> Status CallOpSendMessage::SendMessagePtr(const M* message,
>                                          WriteOptions options) {
>   msg_ = message;
>   write_options_ = options;
>   // Store the serializer for later since we have access to the message
>   serializer_ = [this](const void* message) {
>     bool own_buf;
>     // TODO(vjpai): Remove the void below when possible
>     // The void in the template parameter below should not be needed
>     // (since it should be implicit) but is needed due to an observed
>     // difference in behavior between clang and gcc for certain internal users
>     Status result = SerializationTraits<M, void>::Serialize(
>         *static_cast<const M*>(message), send_buf_.bbuf_ptr(), &own_buf);
>     if (!own_buf) {
>       // XXX(lidavidm): This should perhaps check result.ok(), or Serialize should
>       // unconditionally initialize send_buf_
>       send_buf_.Duplicate();
>     }
>     return result;
>   };
>   return Status();
> }
> {code}
> Hence when Flight returns an error without initializing the buffer, we get a
> segfault.
> Originally reported on StackOverflow:
> [https://stackoverflow.com/questions/68230146/pyarrow-flight-do-get-segfault-when-pandas-dataframe-over-3gb]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
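
The failure mode described in the issue can be modelled without gRPC. The sketch below is illustrative only: Status, ByteBuffer, Serialize, SendUnguarded and SendGuarded are hypothetical stand-ins, not the gRPC or Arrow types, and the 2 GiB limit is assumed to be INT32_MAX. The stand-in Duplicate() throws where the real code would dereference an uninitialized buffer. SendGuarded applies one of the two options mentioned in the XXX comment (checking result.ok() before duplicating); it is not necessarily the fix that was merged.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not grpc::Status.
struct Status {
  bool ok_;
  std::string message;
  bool ok() const { return ok_; }
  static Status OK() { return {true, ""}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical stand-in for a refcounted buffer; not grpc::ByteBuffer.
struct ByteBuffer {
  std::shared_ptr<std::string> data;  // null until a serializer fills it in
  int refs = 0;
  bool Valid() const { return data != nullptr; }
  // Models the refcount increment. The real code would crash on an
  // uninitialized buffer; here we throw so the problem is observable.
  void Duplicate() {
    if (!Valid()) throw std::logic_error("Duplicate() on uninitialized buffer");
    ++refs;
  }
};

// Assumed size limit for a single message (2 GiB modelled as INT32_MAX).
constexpr std::int64_t kMaxMessageSize =
    std::numeric_limits<std::int32_t>::max();

// Models the serialization handler: on oversized input it returns an error
// and leaves *out uninitialized, as described in the issue.
Status Serialize(std::int64_t payload_size, ByteBuffer* out, bool* own_buf) {
  assert(out != nullptr && own_buf != nullptr);
  *own_buf = false;
  if (payload_size > kMaxMessageSize) {
    return Status::Error("payload exceeds 2 GiB");
  }
  out->data = std::make_shared<std::string>("payload");
  return Status::OK();
}

// Mirrors the quoted logic: duplicates regardless of the serializer result.
Status SendUnguarded(std::int64_t payload_size, ByteBuffer* send_buf) {
  bool own_buf = false;
  Status result = Serialize(payload_size, send_buf, &own_buf);
  if (!own_buf) send_buf->Duplicate();
  return result;
}

// Guarded variant: only touches the buffer when serialization succeeded.
Status SendGuarded(std::int64_t payload_size, ByteBuffer* send_buf) {
  bool own_buf = false;
  Status result = Serialize(payload_size, send_buf, &own_buf);
  if (result.ok() && !own_buf) send_buf->Duplicate();
  return result;
}
```

With these stand-ins, SendGuarded returns the serializer's error for an oversized payload and leaves the buffer untouched, while SendUnguarded reaches Duplicate() on a buffer that was never initialized.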