westonpace commented on code in PR #13567:
URL: https://github.com/apache/arrow/pull/13567#discussion_r917974623


##########
cpp/src/arrow/dataset/partition.h:
##########
@@ -63,13 +64,18 @@ struct ARROW_DS_EXPORT PartitionPathFormat {
 /// Paths are consumed from left to right. Paths must be relative to
 /// the root of a partition; path prefixes must be removed before passing
 /// the path to a partitioning for parsing.
-class ARROW_DS_EXPORT Partitioning {
+class ARROW_DS_EXPORT Partitioning : public 
util::EqualityComparable<Partitioning> {
  public:
   virtual ~Partitioning() = default;
 
   /// \brief The name identifying the kind of partitioning
   virtual std::string type_name() const = 0;
 
+  /// \brief determines if partiioning is exactly equal
+  virtual bool Equals(const Partitioning& other, bool check_metadata = false) 
const {

Review Comment:
   We shouldn't need a `check_metadata` flag here.  This method here should 
pass `check_metadata=false` in always.
   
   If, by some chance, a future partitioning decides to depend on the schema 
metadata (which should be possible) then it should override this method and 
pass `check_metadata=true`.



##########
cpp/src/arrow/dataset/partition.cc:
##########
@@ -82,6 +82,14 @@ std::shared_ptr<Partitioning> Partitioning::Default() {
 
     std::string type_name() const override { return "default"; }
 
+    bool Equals(const Partitioning& other, bool check_metadata) const override 
{
+      if (this == &other) {
+        return true;
+      }
+      return checked_cast<const DefaultPartitioning&>(other).type_name() == 
type_name() &&

Review Comment:
   I agree.  If you received a pointer type `Partitioning*` you could do a 
`dynamic_cast` instead of the `type_name` comparison.  However, a failed 
`dynamic_cast` is an exception when you have a reference type and it's probably 
better to just compare `type_name`.
   
   The comparison to `type_name` is basically doing the same thing as a cast 
check anyways.
   
   The reason a `checked_cast` is a bad idea is that it should be ok for a user 
to do:
   
   ```
   DefaultPartitioning one;
   KeyValuePartitioning two(schema);
   if (one.Equals(two)) {
     std::cout << "equal" << std::endl;
   }
   ```
   
   This `Equals` method should always return false so this toy example is 
somewhat meaningless.  However, it could be useful if the actual partitions 
were based on file metadata or something dynamically calculated.  If there is a 
`checked_cast` then, instead of false, we will get an abort in debug mode and 
potentially a segmentation fault in release mode.



##########
cpp/src/arrow/dataset/partition.h:
##########
@@ -310,6 +322,11 @@ class ARROW_DS_EXPORT FunctionPartitioning : public 
Partitioning {
 
   std::string type_name() const override { return name_; }
 
+  bool Equals(const Partitioning& other, bool check_metadata) const override {
+    return ::arrow::internal::checked_cast<const FunctionPartitioning&>(other)
+               .type_name() == type_name();

Review Comment:
   I was going to suggest we at least compare `parse_impl_` and `format_impl_` 
but it appears [that is somewhat 
meaningless](https://en.cppreference.com/w/cpp/utility/functional/function/operator_cmp).
   
   I think we should always return false here and, if a user wants to allow 
comparisons, they can override this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to