On 04/27/2015 05:23 PM, Paul Sandoz wrote:
On Apr 27, 2015, at 4:56 PM, Stephen Colebourne <scolebou...@joda.org> wrote:
Obviously, this is yet another possible workaround. But it is a
workaround.
I don't consider it "just a workaround" :-)


There really aren't that many rough edges with the set of
methods added with lambdas, but this is definitely one. That Guava
handled it specially is another good indication.

Tis conjecture, but perhaps it might have been different in post-lambda world?

One issue is there are zillions of possible more specific convenience 
operations we could add. Everyone has their own favourite. Some static methods 
were recently added to Stream and Optional in preference to such operations.

There has to be a really good reason to add new operations. I realize this 
use-case might be more common than others but i am still yet to be convinced 
that it has sufficient weight given flatMap + lambda + static method.

One reason might be that the workaround creates at least two new objects per included element of the stream and the overhead involved for executing the flat-map logic. A more general operation might be something like the following:

    /**
* Returns a stream consisting of the non-null results of applying the given
     * function to the elements of this stream.
     */
    <R> Stream<R> filterMap(Function<? super T, ? extends R> mapper);


Stephen's example would then read:

    return input.stream()
        .filterMap(t -> t instanceof Foo ? (Foo) t : null)
        .someTerminalOperation();


Combining filtering and mapping in one operation might often be desirable to avoid duplicate work (for example when filtering and mapping needs to compute some common but costly intermediate result for each element). flatMap is admittedly suitable for that too, but has it's overhead. At what per-operation cost this overhead pays-off can be seen at the end...

I know that null values were a controversial topic when this API was being designed and that the decision was made to basically "ignore" their presence in stream elements. So making null part of the API contract might be out of the question right? So what about Optional? Could it be used to make flatMap a little more efficient for the combined filter/map case?

For example, could the following composition be written in a more concise way?

input.stream()
    .map(t -> t instanceof Foo ? Optional.of((Foo) t) : Optional.empty())
    .filter(Optional::isPresent)
    .map(Optional::get)

Maybe with operation like:

    /**
* Returns a stream consisting of the "present" unwrapped results of applying the given
     * function to the elements of this stream.
     */
<R> Stream<R> mapOptionally(Function<? super T, Optional<? extends R>> mapper);


But that's not what Stephen would like to see, and I personally don't mind being a little more verbose if it makes code execute faster. I would be pretty confident writing the following:

input.stream()
    .map(t -> t instanceof Foo ? (Foo)t : null)
    .filter(f -> f != null)


To quantify the overheads involved with various approaches, I created a little benchmark that shows the following results:


Benchmark (opCost) Mode Samples Score Score error Units

j.t.StreamBench.filterThenMap 0 avgt 10 1.186 0.010 ms/op j.t.StreamBench.filterThenMap 10 avgt 10 2.642 0.205 ms/op j.t.StreamBench.filterThenMap 20 avgt 10 5.254 0.011 ms/op j.t.StreamBench.filterThenMap 30 avgt 10 8.187 0.165 ms/op j.t.StreamBench.filterThenMap 40 avgt 10 11.525 0.295 ms/op

j.t.StreamBench.flatMap 0 avgt 10 2.015 0.188 ms/op j.t.StreamBench.flatMap 10 avgt 10 3.287 0.224 ms/op j.t.StreamBench.flatMap 20 avgt 10 5.275 0.638 ms/op j.t.StreamBench.flatMap 30 avgt 10 7.033 0.209 ms/op j.t.StreamBench.flatMap 40 avgt 10 9.146 0.281 ms/op

j.t.StreamBench.mapToNullable 0 avgt 10 1.185 0.006 ms/op j.t.StreamBench.mapToNullable 10 avgt 10 2.120 0.392 ms/op j.t.StreamBench.mapToNullable 20 avgt 10 3.677 0.210 ms/op j.t.StreamBench.mapToNullable 30 avgt 10 5.526 0.126 ms/op j.t.StreamBench.mapToNullable 40 avgt 10 7.884 0.202 ms/op

j.t.StreamBench.mapToOptional 0 avgt 10 1.144 0.121 ms/op j.t.StreamBench.mapToOptional 10 avgt 10 2.322 0.146 ms/op j.t.StreamBench.mapToOptional 20 avgt 10 4.371 0.270 ms/op j.t.StreamBench.mapToOptional 30 avgt 10 6.215 0.536 ms/op j.t.StreamBench.mapToOptional 40 avgt 10 8.471 0.554 ms/op


Comparing .filter(op).map(op) with .flatMap(op) where each operation has it's cost, we see there is a tripping point at opCost=20 where flatMap() starts to pay off if we can merge the two ops into one with equal cost. But we can also see that flatMap has it's cost too, compared to other two approaches (mapToNullable/mapToOptional) which is most obvious when the operation cost is low.

So the conclusion? No, I don't think we need a new Stream method. I just wanted to show that flatMap() is maybe the most universal but not always the best (fastest) answer for each problem.

Regards, Peter

P.S. The benchmark source:

package jdk.test;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

/**
 * Created by peter on 4/27/15.
 */
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, warmups = 0)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StreamBench {

    @Param({"0", "10", "20", "30", "40"})
    public long opCost;

    List<Object> objects;

    @Setup
    public void setup() {
        objects = new ArrayList<>(100000);
        ThreadLocalRandom tlr = ThreadLocalRandom.current();
        for (int i = 0; i < 100000; i++) {
            objects.add(tlr.nextBoolean() ? "123" : 123);
        }
    }

    <F, T> Function<F, T> withCost(Function<F, T> function) {
        return f -> {
            Blackhole.consumeCPU(opCost);
            return function.apply(f);
        };
    }

    <T> Predicate<T> withCost(Predicate<T> predicate) {
        return t -> {
            Blackhole.consumeCPU(opCost);
            return predicate.test(t);
        };
    }

    @Benchmark
    public long filterThenMap() {
        return objects.stream()
            .filter(withCost((Object o) -> o instanceof String))
            .map(withCost((Object o) -> (String) o))
            .count();
    }

    @Benchmark
    public long flatMap() {
        return objects.stream()
            .flatMap(withCost((Object o) -> o instanceof String
                ? Stream.of((String) o) : Stream.empty()))
            .count();
    }

    @Benchmark
    public long mapToOptional() {
        return objects.stream()
            .map(withCost((Object o) -> o instanceof String
                ? Optional.of((String) o) : Optional.empty()))
            .filter(Optional::isPresent)
            .map(Optional::get)
            .count();
    }

    @Benchmark
    public long mapToNullable() {
        return objects.stream()
            .map(withCost((Object o) -> o instanceof String
                ? (String) o : null))
            .filter(s -> s != null)
            .count();
    }
}



BTW, I wait months before making this request to see if it really was
common enough a pattern, but I'm confident that it is now.

Were you aware of the pattern using flatMap during those months?

Paul.


Reply via email to