While fixing the bug where the IFn version of mapValues on PGroupedTable was missing, I got to thinking that this is quite an inefficient way to add support for lambdas and method references. It also still doesn't support quite a few of the features that would make it easy to code against.
The existing lambda implementation has three drawbacks:

1) It bloats the already-crowded PCollection, PTable and PGroupedTable interfaces, and every implementation of those interfaces has to implement the new methods.
2) It doesn't support flatMap to Optional or Stream types.
3) It doesn't expose convenient types for reduce-type operations (for example, a Stream instead of an Iterable).

One way to solve all three is to build lambda support as a separate artifact, so we can use all the Java 8 types. Instead of putting the API directly on the PSomething interfaces, we would provide statically imported methods that wrap lambdas into DoFns or MapFns. Usage then becomes:

    import static org.apache.crunch.Lambda.*;
    ...
    someCollection.parallelDo(flatMap(d -> someFnOf(d)), pt)
    ...
    otherGroupedTable.mapValues(reduce(seq -> seq.mapToInt(i -> i).sum()), ints())

Here flatMap and reduce are static methods on Lambda, and Lambda lives in its own artifact so the rest of Crunch stays compatible with Java 6 and 7.

I've attached a basic proof-of-concept implementation that I've tested with a few things. I'm happy to sketch out a more substantial implementation if people here think it's a good idea in general.

Thoughts? Ideas? Suggestions? Please tell me if this is crazy.
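To make the idea more concrete, here is a minimal sketch of what the static helpers might look like. It is not the attached proof of concept, and it does not depend on Crunch. DoFn, MapFn and Emitter below are simplified stand-ins that only mirror the shape of Crunch's classes; for example, the real Emitter also has flush(). The names SFunction and flatMapOptional are hypothetical. Crunch serializes DoFns to ship them to tasks, so the sketch assumes lambdas are accepted through a Serializable functional interface. That makes the lambdas, and the wrapping DoFns, serializable.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch of a standalone "Lambda" helper class. DoFn, MapFn and Emitter are
// simplified stand-ins, NOT the real org.apache.crunch types.
public class Lambda {

    // Stand-in for Crunch's Emitter<T> (the real one also has flush()).
    public interface Emitter<T> {
        void emit(T value);
    }

    // Stand-in for Crunch's DoFn<S, T>: zero or more outputs per input.
    public abstract static class DoFn<S, T> implements Serializable {
        public abstract void process(S input, Emitter<T> emitter);
    }

    // Stand-in for Crunch's MapFn<S, T>: exactly one output per input.
    public abstract static class MapFn<S, T> extends DoFn<S, T> {
        public abstract T map(S input);

        @Override
        public void process(S input, Emitter<T> emitter) {
            emitter.emit(map(input));
        }
    }

    // Hypothetical serializable function type, so lambdas passed in can be shipped to tasks.
    public interface SFunction<S, T> extends Function<S, T>, Serializable {}

    // flatMap: wraps a lambda returning a Stream into a DoFn that emits every element.
    public static <S, T> DoFn<S, T> flatMap(SFunction<S, Stream<T>> fn) {
        return new DoFn<S, T>() {
            @Override
            public void process(S input, Emitter<T> emitter) {
                fn.apply(input).forEach(emitter::emit);
            }
        };
    }

    // Hypothetical flatMapOptional: emits the value if present, nothing otherwise.
    public static <S, T> DoFn<S, T> flatMapOptional(SFunction<S, Optional<T>> fn) {
        return new DoFn<S, T>() {
            @Override
            public void process(S input, Emitter<T> emitter) {
                fn.apply(input).ifPresent(emitter::emit);
            }
        };
    }

    // reduce: wraps a lambda over a Stream into a MapFn over the grouped Iterable,
    // which is the shape PGroupedTable.mapValues expects.
    public static <V, U> MapFn<Iterable<V>, U> reduce(SFunction<Stream<V>, U> fn) {
        return new MapFn<Iterable<V>, U>() {
            @Override
            public U map(Iterable<V> values) {
                // Sequential stream over the values; the Iterable is consumed once.
                return fn.apply(StreamSupport.stream(values.spliterator(), false));
            }
        };
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        DoFn<String, String> split = flatMap(line -> Arrays.stream(line.split(" ")));
        split.process("hello lambda world", words::add);
        System.out.println(words); // prints [hello, lambda, world]

        MapFn<Iterable<Integer>, Integer> sum = reduce(seq -> seq.mapToInt(i -> i).sum());
        System.out.println(sum.map(Arrays.asList(1, 2, 3))); // prints 6

        List<Integer> lengths = new ArrayList<>();
        DoFn<String, Integer> len =
            flatMapOptional(s -> Optional.of(s).filter(x -> !x.isEmpty()).map(String::length));
        len.process("abc", lengths::add);
        len.process("", lengths::add);
        System.out.println(lengths); // prints [3]
    }
}
```

The Optional variant gets its own name in this sketch because overloading flatMap on SFunction<S, Stream<T>> and SFunction<S, Optional<T>> would give two methods with the same erasure, which Java rejects. A real implementation might choose a different naming scheme.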
