Hi all,
We have min/max/sort APIs in Crunch. The min and max APIs rely on S (the
user type) being comparable, while the Sort API relies on the
corresponding Writable type being comparable, i.e. WritableComparable.
To me, min and max are special cases of sort, and the three should be in
sync with each other. If they are not, then at least theoretically we
could have cases where sorting produces results that differ from what
the min/max functions return.

We could adopt the Sort approach
for all three, but that API has some issues of its own. If the Writable
is not comparable, the resulting error will not be very clear. S could
also have a comparator that differs from the Writable's, in which case
the results are not what the user expects. Or maybe we could use a
comparable S in the Sort API. I am not sure, but I think we would then
not be able to use the Hadoop shuffle and sort. I do not have a complete
idea of how we could bring the three in sync. Any thoughts on this? But
first I would like to ask: should we
even try to do that? Or am I just cooking up a theory that has no
practical use case? There has been some discussion on this in
CRUNCH-57 <https://issues.apache.org/jira/browse/CRUNCH-57>. Let
me know what you think.
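
To make the concern concrete, here is a minimal sketch in plain Java. It
does not use Crunch or Hadoop at all: Score is a hypothetical user type S,
and a plain String stands in for its serialized, Writable-like form. The
class and method names are made up for illustration. The only point is
that an ordering defined on S and an ordering defined on the serialized
form can disagree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MinVsSortSketch {

    // Hypothetical user type S: ordered numerically through Comparable.
    static final class Score implements Comparable<Score> {
        final int value;

        Score(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(Score other) {
            return Integer.compare(value, other.value);
        }

        // Stand-in for converting S to its serialized form (assumption:
        // a decimal string, which compares lexicographically).
        String toSerialized() {
            return Integer.toString(value);
        }
    }

    // "min" computed from the user type's own ordering.
    static int minByUserOrdering(List<Score> scores) {
        return Collections.min(scores).value;
    }

    // First element after sorting by the serialized form's ordering.
    static int firstBySerializedOrdering(List<Score> scores) {
        List<Score> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.comparing(Score::toSerialized));
        return sorted.get(0).value;
    }

    static List<Score> sample() {
        List<Score> scores = new ArrayList<>();
        scores.add(new Score(9));
        scores.add(new Score(10));
        scores.add(new Score(100));
        return scores;
    }

    public static void main(String[] args) {
        List<Score> scores = sample();
        System.out.println("min via comparable S: "
                + minByUserOrdering(scores));
        System.out.println("first via serialized ordering: "
                + firstBySerializedOrdering(scores));
    }
}
```

With the sample values 9, 10 and 100, the first line reports 9 and the
second reports 10, because "10" sorts before "9" as a string. Whether a
real PType/Writable pair in Crunch can diverge like this depends on the
types involved, so please read it as an illustration of the question
rather than a reproduction of a bug.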
Regards,
Rahul