ava6969 opened a new issue, #45888:
URL: https://github.com/apache/arrow/issues/45888
### Describe the bug, including details regarding any error messages, version, and platform.
Description
Table::Make does not validate that the data types in the schema match the
actual data types of the provided columns. This allows tables to be created
with mismatched types, leading to potential crashes, data corruption, or
undefined behavior when later accessing the data.
Using Acero with such a table results in a SIGABRT.
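The missing check can be sketched without Arrow. It compares each schema field's declared type with the actual type of the column at the same position and rejects the pair on the first mismatch. `FieldSpec`, `ColumnSpec` and `CheckColumnsAgainstSchema` are hypothetical stand-ins that model types as plain strings. They are not Arrow APIs. As far as we understand, Arrow's own `Table::Validate()` performs a comparable per-column type check, but the reproduction never calls it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema field: only name and declared type.
struct FieldSpec {
  std::string name;
  std::string type;  // declared type, e.g. "double"
};

// Hypothetical stand-in for a column: only the type of its actual data.
struct ColumnSpec {
  std::string type;  // actual type, e.g. "string"
};

// Returns an empty string if every column matches its schema field,
// otherwise a message describing the first inconsistency found.
std::string CheckColumnsAgainstSchema(const std::vector<FieldSpec>& schema,
                                      const std::vector<ColumnSpec>& columns) {
  if (schema.size() != columns.size()) {
    return "number of columns does not match schema";
  }
  for (std::size_t i = 0; i < schema.size(); ++i) {
    if (schema[i].type != columns[i].type) {
      return "column " + std::to_string(i) + " (" + schema[i].name +
             ") has type " + columns[i].type + " but schema declares " +
             schema[i].type;
    }
  }
  return "";  // consistent
}
```

Applied to the report's example, declaring `column1` as double while passing string data yields a message naming column 0. A real implementation would compare `arrow::DataType` objects (for example with `DataType::Equals`) rather than strings.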
Steps to reproduce
```cpp
#include <arrow/api.h>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// ARROW_RETURN_NOT_OK / ARROW_ASSIGN_OR_RAISE need a Status-returning
// function, so the reproduction lives here instead of in main().
arrow::Status RunExample() {
  // Create string data that will be mismatched with the schema
  std::vector<std::string> string_data = {"one", "two", "three"};
  arrow::StringBuilder string_builder;
  ARROW_RETURN_NOT_OK(string_builder.AppendValues(string_data));
  ARROW_ASSIGN_OR_RAISE(auto string_array, string_builder.Finish());
  auto string_chunked = std::make_shared<arrow::ChunkedArray>(string_array);

  // Numeric data (correctly typed in the schema)
  std::vector<double> numeric_data = {1.0, 2.0, 3.0};
  arrow::DoubleBuilder double_builder;
  ARROW_RETURN_NOT_OK(double_builder.AppendValues(numeric_data));
  ARROW_ASSIGN_OR_RAISE(auto double_array, double_builder.Finish());
  auto double_chunked = std::make_shared<arrow::ChunkedArray>(double_array);

  // Column vector with mismatched types
  std::vector<std::shared_ptr<arrow::ChunkedArray>> columns = {string_chunked,
                                                               double_chunked};

  // Schema incorrectly claims the first column is double
  auto incorrect_schema = arrow::schema({
      arrow::field("column1", arrow::float64()),  // Wrong! This is actually string data
      arrow::field("column2", arrow::float64())   // This is correct
  });

  // No validation error is raised!
  auto table = arrow::Table::Make(incorrect_schema, columns);

  // Accessing the data will likely produce corrupt output or a crash
  std::cout << "Table created with incorrect types: " << table->ToString()
            << std::endl;
  std::cout << "First column data: " << table->column(0)->ToString()
            << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = RunExample();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```
### Component(s)
C++