westonpace commented on code in PR #13567: URL: https://github.com/apache/arrow/pull/13567#discussion_r917974623
########## cpp/src/arrow/dataset/partition.h: ########## @@ -63,13 +64,18 @@ struct ARROW_DS_EXPORT PartitionPathFormat { /// Paths are consumed from left to right. Paths must be relative to /// the root of a partition; path prefixes must be removed before passing /// the path to a partitioning for parsing. -class ARROW_DS_EXPORT Partitioning { +class ARROW_DS_EXPORT Partitioning : public util::EqualityComparable<Partitioning> { public: virtual ~Partitioning() = default; /// \brief The name identifying the kind of partitioning virtual std::string type_name() const = 0; + /// \brief determines if partiioning is exactly equal + virtual bool Equals(const Partitioning& other, bool check_metadata = false) const { Review Comment: We shouldn't need a `check_metadata` flag here. This method here should pass `check_metadata=false` in always. If, by some chance, a future partitioning decides to depend on the schema metadata (which should be possible) then it should override this method and pass `check_metadata=true`. ########## cpp/src/arrow/dataset/partition.cc: ########## @@ -82,6 +82,14 @@ std::shared_ptr<Partitioning> Partitioning::Default() { std::string type_name() const override { return "default"; } + bool Equals(const Partitioning& other, bool check_metadata) const override { + if (this == &other) { + return true; + } + return checked_cast<const DefaultPartitioning&>(other).type_name() == type_name() && Review Comment: I agree. If you received a pointer type `Partitioning*` you could do a `dynamic_cast` instead of the `type_name` comparison. However, a failed `dynamic_cast` is an exception when you have a reference type and it's probably better to just compare `type_name`. The comparison to `type_name` is basically doing the same thing as a cast check anyways. The reason a `checked_cast` is a bad idea is that it should be ok for a user to do: ``` DefaultPartitioning one; KeyValuePartitioning two(schema); if (one.Equals(two)) { std::cout << "equal" << std::endl; } ``` This `Equals` method should always return false so this toy example is somewhat meaningless. However, it could be useful if the actual partitions were based on file metadata or something dynamically calculated. If there is a `checked_cast` then, instead of false, we will get an abort in debug mode and potentially a segmentation fault in release mode. ########## cpp/src/arrow/dataset/partition.h: ########## @@ -310,6 +322,11 @@ class ARROW_DS_EXPORT FunctionPartitioning : public Partitioning { std::string type_name() const override { return name_; } + bool Equals(const Partitioning& other, bool check_metadata) const override { + return ::arrow::internal::checked_cast<const FunctionPartitioning&>(other) + .type_name() == type_name(); Review Comment: I was going to suggest we at least compare `parse_impl_` and `format_impl_` but it appears [that is somewhat meaningless](https://en.cppreference.com/w/cpp/utility/functional/function/operator_cmp). I think we should always return false here and, if a user wants to allow comparisons, they can override this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org