sjperkins commented on a change in pull request #8510:
URL: https://github.com/apache/arrow/pull/8510#discussion_r722148585
##########
File path: cpp/src/arrow/extension_type_test.cc
##########
@@ -333,4 +334,144 @@ TEST_F(TestExtensionType, ValidateExtensionArray) {
ASSERT_OK(ext_arr4->ValidateFull());
}
+class TensorArray : public ExtensionArray {
+ public:
+ using ExtensionArray::ExtensionArray;
+};
+
+class TensorArrayType : public ExtensionType {
+ public:
+ explicit TensorArrayType(const std::shared_ptr<DataType>& type,
+ const std::vector<int64_t>& shape,
+ const std::vector<int64_t>& strides)
+ : ExtensionType(type), type_(type), shape_(shape), strides_(strides) {}
+
+ std::shared_ptr<DataType> type() const { return type_; }
+ std::vector<int64_t> shape() const { return shape_; }
+ std::vector<int64_t> strides() const { return strides_; }
+
+ std::string extension_name() const override {
+ std::stringstream s;
+ s << "ext-array-tensor-type<type=" << *storage_type() << ", shape=(";
+ for (uint64_t i = 0; i < shape_.size(); i++) {
+ s << shape_[i];
+ if (i < shape_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << "), strides=(";
+ for (uint64_t i = 0; i < strides_.size(); i++) {
+ s << strides_[i];
+ if (i < strides_.size() - 1) {
+ s << ", ";
+ }
+ }
+ s << ")>";
+ return s.str();
+ }
+
+ bool ExtensionEquals(const ExtensionType& other) const override {
+ return this->shape() == static_cast<const TensorArrayType&>(other).shape();
Review comment:
> Ah, I misunderstood your suggestion. Why would you need it so loose?
Practically speaking, this means that one cannot have, e.g., a `(10, 5, 4)`
shape Tensor and a `(10, 6, 2)` shape Tensor in separate parquet files in the
same dataset. IIRC the Dataset API will complain that their metadata doesn't
agree (due to the strict equality comparison). I do run into these sorts of
cases in the datasets that I deal with.
More formally, I would argue that parameterising Tensor types on `shape`
and `strides` introduces an infinite number of parameterisations, and I'm not
sure that this class of parameterisations is useful. This doesn't imply that
`shape` and `strides` should not be attributes on a Tensor type!
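
To make the distinction concrete, here is a minimal sketch. It deliberately avoids the Arrow API: `TensorTypeSketch`, `StrictEquals` and `LooseEquals` are hypothetical names used only for illustration, and the stride values assume row-major `float64` data. It contrasts equality that is parameterised on `shape` and `strides` with equality that treats them as plain attributes.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor extension type; NOT an Arrow class.
struct TensorTypeSketch {
  std::string value_type;        // element type name, e.g. "float64"
  std::vector<int64_t> shape;    // carried as an attribute
  std::vector<int64_t> strides;  // carried as an attribute (bytes)
};

// Strict: shape and strides are part of the type's identity, so every
// distinct (shape, strides) pair yields a distinct type.
inline bool StrictEquals(const TensorTypeSketch& a, const TensorTypeSketch& b) {
  return a.value_type == b.value_type && a.shape == b.shape &&
         a.strides == b.strides;
}

// Loose: only the element type identifies the type; shape and strides
// remain readable attributes but do not take part in the comparison.
inline bool LooseEquals(const TensorTypeSketch& a, const TensorTypeSketch& b) {
  return a.value_type == b.value_type;
}
```

With a `(10, 5, 4)` tensor type and a `(10, 6, 2)` tensor type over the same element type, `StrictEquals` reports a mismatch while `LooseEquals` treats them as the same type, which is the behaviour I'm after for multi-file datasets.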