edponce commented on a change in pull request #11019:
URL: https://github.com/apache/arrow/pull/11019#discussion_r700603134



##########
File path: cpp/src/arrow/compute/api_vector.cc
##########
@@ -140,6 +144,15 @@ PartitionNthOptions::PartitionNthOptions(int64_t pivot)
     : FunctionOptions(internal::kPartitionNthOptionsType), pivot(pivot) {}
 constexpr char PartitionNthOptions::kTypeName[];
 
+SelectKOptions::SelectKOptions(int64_t k, std::vector<std::string> keys, std::string keep,
+                               SortOrder order)
+    : FunctionOptions(internal::kSelectKOptionsType),
+      k(k),
+      keys(std::move(keys)),
+      keep(keep),
+      order(order) {}
+constexpr char SelectKOptions::kTypeName[];
+

Review comment:
       The select K algorithm is a general approach to get the topK, bottomK,
or median statistic. It seems that the SortOrder option is always `Descending`
for topK and `Ascending` for bottomK, so I recommend using an enum for the type
of statistic desired instead of specifying an ordering. As a user, if I specify
`Ascending`, it is not intuitive that it corresponds to topK, because that
depends on which side of the sorted data is searched.
   ```
   enum class SelectKOperator {
     TOP,
     BOTTOM,
     MEDIAN,  // possibly for another PR
   };
   ```
   Also, I am not sure that the `keys` and `keep` options are part of common
selectK APIs. I think it would be better structured to have a sorter data
member that represents the options for the sorting algorithm.
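
To illustrate how the enum could drive the selection, here is a minimal,
self-contained sketch. It is not Arrow code: the `SelectKOptions` struct below
is a simplified stand-in, `SelectK` is a hypothetical free function over a plain
`std::vector<int>`, and `MEDIAN` is left out. The only point is that the caller
names the statistic and the comparison direction is derived from it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical names for illustration only; not Arrow's actual API.
enum class SelectKOperator { TOP, BOTTOM };

// Simplified stand-in for the options struct: the statistic replaces SortOrder.
struct SelectKOptions {
  int64_t k;
  SelectKOperator op;
};

// Returns the k largest (TOP) or k smallest (BOTTOM) values, best match first.
// k is clamped to the range [0, values.size()].
std::vector<int> SelectK(std::vector<int> values, const SelectKOptions& options) {
  const int64_t size = static_cast<int64_t>(values.size());
  const int64_t k = std::min<int64_t>(std::max<int64_t>(options.k, 0), size);
  auto middle = values.begin() + static_cast<std::ptrdiff_t>(k);
  // The ordering is an implementation detail chosen from the requested statistic.
  if (options.op == SelectKOperator::TOP) {
    std::partial_sort(values.begin(), middle, values.end(), std::greater<int>());
  } else {
    std::partial_sort(values.begin(), middle, values.end(), std::less<int>());
  }
  values.resize(static_cast<std::size_t>(k));
  return values;
}
```

With this shape, a call such as `SelectK(data, {2, SelectKOperator::TOP})`
reads as "the two largest values" without the caller having to reason about
which end of a sorted sequence is scanned.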




