edponce commented on a change in pull request #11019: URL: https://github.com/apache/arrow/pull/11019#discussion_r700603134
##########
File path: cpp/src/arrow/compute/api_vector.cc
##########
@@ -140,6 +144,15 @@ PartitionNthOptions::PartitionNthOptions(int64_t pivot)
     : FunctionOptions(internal::kPartitionNthOptionsType), pivot(pivot) {}
 constexpr char PartitionNthOptions::kTypeName[];
 
+SelectKOptions::SelectKOptions(int64_t k, std::vector<std::string> keys, std::string keep,
+                               SortOrder order)
+    : FunctionOptions(internal::kSelectKOptionsType),
+      k(k),
+      keys(std::move(keys)),
+      keep(keep),
+      order(order) {}
+constexpr char SelectKOptions::kTypeName[];
+

Review comment:
The select-K algorithm is a general approach for getting the topK, bottomK, or median statistic. It seems that the `SortOrder` option is always `Descending` for topK and `Ascending` for bottomK, so I recommend using an enum for the type of statistic desired instead of specifying an ordering. As a user, if I specify `Ascending`, it is not intuitive which of topK or bottomK I will get, because that depends on which side the sorted data is searched from.
```
enum class SelectKOperator {
  TOP,
  BOTTOM,
  MEDIAN,  // possibly for another PR
};
```

Also, I am not sure that the `keys` and `keep` options are part of common selectK APIs. I think the options would be better structured with a sorter data member that represents the options for the sorting algorithm.
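To make the enum suggestion concrete, here is a minimal standalone sketch of how a selection operator could replace an explicit sort order. It uses only the C++ standard library and is not Arrow's API: `SelectK` is a hypothetical helper, `SelectKOperator` mirrors the enum proposed above (without `MEDIAN`), and operating on a plain `std::vector<int64_t>` is an assumption made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical enum mirroring the one proposed in the review comment.
enum class SelectKOperator { TOP, BOTTOM };

// Hypothetical helper, not part of Arrow: returns the k largest (TOP) or
// k smallest (BOTTOM) values, ordered best-first. The caller names the
// statistic; the comparison direction is derived internally.
std::vector<int64_t> SelectK(std::vector<int64_t> values, int64_t k,
                             SelectKOperator op) {
  const auto n = static_cast<int64_t>(values.size());
  k = std::max<int64_t>(0, std::min(k, n));  // clamp k to [0, n]
  auto middle = values.begin() + k;
  if (op == SelectKOperator::TOP) {
    // Largest values first.
    std::partial_sort(values.begin(), middle, values.end(),
                      std::greater<int64_t>());
  } else {
    // Smallest values first.
    std::partial_sort(values.begin(), middle, values.end(),
                      std::less<int64_t>());
  }
  values.resize(static_cast<std::size_t>(k));
  return values;
}
```

With this shape, a call such as `SelectK(values, 2, SelectKOperator::TOP)` states the intent directly, and the user never has to reason about whether `Ascending` or `Descending` corresponds to the top of the data.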