Re: Additional method on Stream

Peter Levart Mon, 27 Apr 2015 12:06:17 -0700


On 04/27/2015 05:23 PM, Paul Sandoz wrote:

On Apr 27, 2015, at 4:56 PM, Stephen Colebourne <scolebou...@joda.org> wrote:

Obviously, this is yet another possible workaround. But it is a
workaround.

I don't consider it "just a workaround" :-)

There really aren't that many rough edges with the set of
methods added with lambdas, but this is definitely one. That Guava
handled it specially is another good indication.

Tis conjecture, but perhaps it might have been different in post-lambda world?

One issue is there are zillions of possible more specific convenience 
operations we could add. Everyone has their own favourite. Some static methods 
were recently added to Stream and Optional in preference to such operations.

There has to be a really good reason to add new operations. I realize this 
use-case might be more common than others but I am still yet to be convinced 
that it has sufficient weight given flatMap + lambda + static method.
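
For reference, the "flatMap + lambda + static method" combination mentioned above might look roughly like the sketch below. The helper name instancesOf and the class name FlatMapWorkaround are assumptions made for illustration; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMapWorkaround {

    // Hypothetical helper (name assumed, not a JDK method): maps an element
    // to a one-element stream if it is an instance of the given type,
    // otherwise to an empty stream.
    static <T> Function<Object, Stream<T>> instancesOf(Class<T> type) {
        return o -> type.isInstance(o) ? Stream.of(type.cast(o)) : Stream.empty();
    }

    // Keeps only the String elements, cast to String.
    static List<String> stringsIn(List<Object> input) {
        return input.stream()
            .flatMap(instancesOf(String.class))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.asList("a", 1, "b", 2.0);
        System.out.println(stringsIn(input)); // prints [a, b]
    }
}
```

Note that every retained element allocates a Stream here, which is the per-element overhead discussed in the reply that follows.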

One reason might be that the workaround creates at least two new objects per included element of the stream, plus the overhead involved in executing the flat-map logic. A more general operation might be something like the following:

    /**
     * Returns a stream consisting of the non-null results of applying the given
     * function to the elements of this stream.
     */
    <R> Stream<R> filterMap(Function<? super T, ? extends R> mapper);


Stephen's example would then read:

    return input.stream()
        .filterMap(t -> t instanceof Foo ? (Foo) t : null)
        .someTerminalOperation();
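
Since filterMap does not exist on Stream, its behaviour can only be approximated today with a static helper. The following is a minimal sketch under that assumption; the class name FilterMapSketch and the helper's signature (taking the stream as its first argument) are illustrative, not an existing API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterMapSketch {

    // Hypothetical stand-in for the proposed Stream.filterMap: applies the
    // mapper and drops every null result.
    static <T, R> Stream<R> filterMap(Stream<T> stream,
                                      Function<? super T, ? extends R> mapper) {
        return stream
            .<R>map(mapper)
            .filter(Objects::nonNull);
    }

    // Usage: keep only the String elements, cast to String.
    static List<String> stringsIn(List<Object> input) {
        return filterMap(input.stream(), o -> o instanceof String ? (String) o : null)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.asList("a", 1, "b", 2.0);
        System.out.println(stringsIn(input)); // prints [a, b]
    }
}
```

This helper still runs two pipeline stages internally; a real filterMap operation could presumably do the work in one.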

Combining filtering and mapping in one operation might often be desirable to avoid duplicate work (for example, when filtering and mapping both need to compute some common but costly intermediate result for each element). flatMap is admittedly suitable for that too, but it has its overhead. At what per-operation cost this overhead pays off can be seen at the end...

I know that null values were a controversial topic when this API was being designed and that the decision was made to basically "ignore" their presence in stream elements. So making null part of the API contract might be out of the question, right? So what about Optional? Could it be used to make flatMap a little more efficient for the combined filter/map case?

For example, could the following composition be written in a more concise way?


input.stream()
    .map(t -> t instanceof Foo ? Optional.of((Foo) t) : Optional.empty())
    .filter(Optional::isPresent)
    .map(Optional::get)

Maybe with an operation like:

    /**
     * Returns a stream consisting of the "present" unwrapped results of applying the given
     * function to the elements of this stream.
     */
    <R> Stream<R> mapOptionally(Function<? super T, Optional<? extends R>> mapper);
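
As with filterMap, mapOptionally is only a proposal, so the sketch below emulates it with a static helper built from existing Stream operations. The class name MapOptionallySketch, the helper asString, and the stream-as-first-argument signature are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MapOptionallySketch {

    // Hypothetical stand-in for the proposed Stream.mapOptionally: applies
    // the mapper, keeps only present Optionals and unwraps them.
    static <T, R> Stream<R> mapOptionally(Stream<T> stream,
            Function<? super T, Optional<? extends R>> mapper) {
        return stream
            .map(mapper)
            .filter(Optional::isPresent)
            .<R>map(opt -> opt.get());
    }

    // Illustrative mapper: present only for String elements.
    static Optional<String> asString(Object o) {
        return o instanceof String ? Optional.of((String) o) : Optional.empty();
    }

    static List<String> stringsIn(List<Object> input) {
        return mapOptionally(input.stream(), MapOptionallySketch::asString)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.asList("a", 1, "b", 2.0);
        System.out.println(stringsIn(input)); // prints [a, b]
    }
}
```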

But that's not what Stephen would like to see, and I personally don't mind being a little more verbose if it makes code execute faster. I would be pretty confident writing the following:


input.stream()
    .map(t -> t instanceof Foo ? (Foo)t : null)
    .filter(f -> f != null)

To quantify the overheads involved with the various approaches, I created a little benchmark that shows the following results:

Benchmark                      (opCost)  Mode  Samples   Score  Score error  Units

j.t.StreamBench.filterThenMap         0  avgt       10   1.186        0.010  ms/op
j.t.StreamBench.filterThenMap        10  avgt       10   2.642        0.205  ms/op
j.t.StreamBench.filterThenMap        20  avgt       10   5.254        0.011  ms/op
j.t.StreamBench.filterThenMap        30  avgt       10   8.187        0.165  ms/op
j.t.StreamBench.filterThenMap        40  avgt       10  11.525        0.295  ms/op

j.t.StreamBench.flatMap               0  avgt       10   2.015        0.188  ms/op
j.t.StreamBench.flatMap              10  avgt       10   3.287        0.224  ms/op
j.t.StreamBench.flatMap              20  avgt       10   5.275        0.638  ms/op
j.t.StreamBench.flatMap              30  avgt       10   7.033        0.209  ms/op
j.t.StreamBench.flatMap              40  avgt       10   9.146        0.281  ms/op

j.t.StreamBench.mapToNullable         0  avgt       10   1.185        0.006  ms/op
j.t.StreamBench.mapToNullable        10  avgt       10   2.120        0.392  ms/op
j.t.StreamBench.mapToNullable        20  avgt       10   3.677        0.210  ms/op
j.t.StreamBench.mapToNullable        30  avgt       10   5.526        0.126  ms/op
j.t.StreamBench.mapToNullable        40  avgt       10   7.884        0.202  ms/op

j.t.StreamBench.mapToOptional         0  avgt       10   1.144        0.121  ms/op
j.t.StreamBench.mapToOptional        10  avgt       10   2.322        0.146  ms/op
j.t.StreamBench.mapToOptional        20  avgt       10   4.371        0.270  ms/op
j.t.StreamBench.mapToOptional        30  avgt       10   6.215        0.536  ms/op
j.t.StreamBench.mapToOptional        40  avgt       10   8.471        0.554  ms/op

Comparing .filter(op).map(op) with .flatMap(op), where each operation has its cost, we see there is a tipping point at opCost=20 where flatMap() starts to pay off if we can merge the two ops into one with equal cost. But we can also see that flatMap has its own cost compared to the other two approaches (mapToNullable/mapToOptional), which is most obvious when the operation cost is low.

So the conclusion? No, I don't think we need a new Stream method. I just wanted to show that flatMap() is maybe the most universal but not always the best (fastest) answer for each problem.


Regards, Peter

P.S. The benchmark source:

package jdk.test;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

/**
 * Created by peter on 4/27/15.
 */
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, warmups = 0)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class StreamBench {

    @Param({"0", "10", "20", "30", "40"})
    public long opCost;

    List<Object> objects;

    @Setup
    public void setup() {
        objects = new ArrayList<>(100000);
        ThreadLocalRandom tlr = ThreadLocalRandom.current();
        for (int i = 0; i < 100000; i++) {
            objects.add(tlr.nextBoolean() ? "123" : 123);
        }
    }

    <F, T> Function<F, T> withCost(Function<F, T> function) {
        return f -> {
            Blackhole.consumeCPU(opCost);
            return function.apply(f);
        };
    }

    <T> Predicate<T> withCost(Predicate<T> predicate) {
        return t -> {
            Blackhole.consumeCPU(opCost);
            return predicate.test(t);
        };
    }

    @Benchmark
    public long filterThenMap() {
        return objects.stream()
            .filter(withCost((Object o) -> o instanceof String))
            .map(withCost((Object o) -> (String) o))
            .count();
    }

    @Benchmark
    public long flatMap() {
        return objects.stream()
            .flatMap(withCost((Object o) -> o instanceof String
                ? Stream.of((String) o) : Stream.empty()))
            .count();
    }

    @Benchmark
    public long mapToOptional() {
        return objects.stream()
            .map(withCost((Object o) -> o instanceof String
                ? Optional.of((String) o) : Optional.empty()))
            .filter(Optional::isPresent)
            .map(Optional::get)
            .count();
    }

    @Benchmark
    public long mapToNullable() {
        return objects.stream()
            .map(withCost((Object o) -> o instanceof String
                ? (String) o : null))
            .filter(s -> s != null)
            .count();
    }
}

BTW, I waited months before making this request to see if it really was
common enough a pattern, but I'm confident that it is now.

Were you aware of the pattern using flatMap during those months?

Paul.
