Hi all,

We have min/max/sort APIs in Crunch. The min and max APIs rely on S (the user type) being Comparable, while the Sort API relies on the corresponding Writable type being comparable, i.e. a WritableComparable. To me, min and max are special cases of Sort, and the three should be in sync with each other. If they are not, then at least theoretically we could have cases where sorting produces results that differ from what the min/max functions return.

We could adopt the Sort approach for all three, but that API has some issues of its own. If the Writable is not comparable, the resulting error will not be very clear. Also, S could have a comparator that orders differently from the Writable, in which case the results are not what the user expects. Or maybe we could use the Comparable S in the Sort API instead. I am not sure, but I think we would then not be able to use the Hadoop shuffle and sort.

I do not have a complete idea of how we could bring the three in sync. Any thoughts on this? But I would like to ask first: should we even try to do that, or am I just cooking up a theory that has no practical use case? There has been some discussion of this in the CRUNCH-57 <https://issues.apache.org/jira/browse/CRUNCH-57> issue. Let me know what you think.
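
To make the concern concrete, here is a minimal plain-Java sketch of how the two orderings could disagree. It does not use the Crunch or Hadoop APIs. The class OrderingMismatch and its methods are hypothetical names made up for illustration, and the "serialized form" is assumed to be the decimal string of the value compared byte by byte, standing in for a Writable whose ordering differs from the user type's compareTo.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Crunch or Hadoop.
public class OrderingMismatch {

    // Assumed stand-in for a serialized (Writable-like) form: the decimal string as UTF-8 bytes.
    public static byte[] serialize(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison of unsigned bytes; a shorter array that is a prefix sorts first.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // "min" in the style that relies on the user type being Comparable (numeric order here).
    public static int minByComparable(List<Integer> values) {
        return Collections.min(values);
    }

    // "sort" in the style that relies on the ordering of the serialized form.
    public static List<Integer> sortBySerializedForm(List<Integer> values) {
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareBytes(serialize(x), serialize(y)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(9, 10, 100);
        System.out.println(minByComparable(values));      // prints 9
        System.out.println(sortBySerializedForm(values)); // prints [10, 100, 9]
    }
}
```

In this sketch the first element of the sorted output is 10 while min returns 9, which is the kind of inconsistency described above. Whether a real PType in Crunch can end up with such a mismatch depends on how its Writable is defined.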

regards,
Rahul

