Re: DISCUSS: Sorted MapState API

Jan Lukavský Thu, 18 Jun 2020 13:11:34 -0700

My questions were just an example. I fully agree there is a fundamentalneed for a sorted state (of some form, and I also think this links toefficient implementation of retrations) - I was reacting to Kenn'squestion about BIP. This one would be pretty nice example why it wouldbe good to have such a "process" - not everything can be solved on MLand there are fundamental decisions that might need a closer attention.


On 6/18/20 5:28 PM, Reuven Lax wrote:

Jan - my proposal is exactly TimeSortedBagState (more accurately -TimeSortedListState), though I went a bit further and also proposed away to have a dynamic number of tagged TimeSortedBagStates.

You are correct that the runner doesn't really have to store the datatime sorted - what's actually needed is the ability to fetch andremove timestamp ranges of data (though that does include fetching theentire list); TimeOrderedState is probably a more accurate name thenTimeSortedState. I don't think we could get away with operations thatonly act on the smallest timestamp, however we could limit the API toonly being able to fetch and remove prefixes of data (ordered bytimestamp). However if we support prefixes, we might as well supportarbitrary subranges.

On Thu, Jun 18, 2020 at 7:26 AM Jan Lukavský <[email protected]<mailto:[email protected]>> wrote:


    Big +1 for a BIP, as this might really help clarify all the pros
    and cons of all possibilities. There seem to be questions that
    need answering and motivating use cases - do we need sorted map
    state or can we solve our use cases by something simpler - e.g.
    the mentioned TimeSortedBagState? Does that really have to be
    time-sorted structure, or does it "only" have to have operations
    that can efficiently find and remove element with smallest
    timestamp (like a PriorityQueue)?

    Jan

    On 6/18/20 5:32 AM, Kenneth Knowles wrote:

    Zooming in from generic philosophy to be clear: adding time
    ordered buffer to the Fn state API is *not* a shortcut.It has
    benefits that will not be achieved by SDK-side implementation on
    top of either ordered or unordered multimap. Are those benefits
    worth expanding the API? I don't know.

    A change to allow a runner to have a specialized implementation
    for time-buffered state would be one or more StateKey types,
    right? Reuven, maybe put this and your Java API in a doc? A BIP?
    Seems like there's at least the following to explore:

     - how that Java API would map to an SDK-side implementation on
    top of multimap state key
     - how that Java API would map to a new StateKey
     - whether there's actually more than one relevant implementation
    of that StateKey
     - whether SDK-side implementation on some other state key would
    be performant enough in all SDK languages (present and future)

    Zooming back out to generic philosophy: Proliferation of StateKey
    types tuned by runners (which can very easily still share
    implementation) is probably better than proliferation of complex
    SDK-side implementations with varying completeness and performance.

    Kenn

    On Wed, Jun 17, 2020 at 3:24 PM Reuven Lax <[email protected]
    <mailto:[email protected]>> wrote:

        It might help for me to describe what I have in mind. I'm
        still proposing that we build multimap, just not a
        globally-sorted multimap.

        My previous proposal was that we provide a Multimap<Key,
        Value> state type, sorted by key. this would have two
        additional operations - multimap.getRange(startKey, endKey)
        and multimap.deleteRange(startKey, endKey). The primary use
        case was timestamp sorting, but I felt that a sorted multimap
        provided a nice generalization - after all, you can simply
        key the multimap by timestamp to get timestamp sorting.

        This approach had some issues immediately that would take
        some work to solve. Since a multimap key can have any type
        and a runner will only be able to sort by encoded type, we
        would need to introduce a concept of order-preserving coders
        into Beam and plumb that through. Robert pointed out that
        even our existing standard coders for simple integral types
        don't preserve order, so there will likely be surprises here.

        My current proposal is for a multimap that is not sorted by
        key, but that can support.ordered values for a single key.
        Remember that a multimap maps K -> Iterable<V>, so this means
        that each individual Iterable<V> is ordered, but the keys
        have no specific order relative to each other. This is not
        too different from many multimap implementations where the
        keys are unordered, but the list of values for a single key
        at least has a stable order.

        The interface would look like this:

        public interface MultimapState<K, V> extends State {
          // Add a value with a default timestamp.
          void put(K key, V value);

          // Add a timestamped value.
          void put(K, key, TimestampedValue<V> value);

          // Remove all values for a key.
          void remove (K key);

          // Remove all values for a key with timestamps within the
        specified range.
          void removeRange(K key, Instant startTs, Instant endTs);

          // Get an Iterable of values for V. The Iterable will be
        returned sorted by timestamp.
        ReadableState<Iterable<TimestampedValue<V>>> get(K key);

          // Get an Iterable of values for V in the specified range.
        The Iterable will be returned sorted by timestamp.
        ReadableState<Iterable<TimestampedValue<V>>> getRange(K key,
        Instant startTs, Instant endTs);

          ReadableState<Iterable<K>> keys();
        ReadableState<Iterable<TimestampedValue<V>>> values();
          ReadableState<Iterable<Map.Entry<K, TimestampedValue<V>>
        entries;
        }

        We can of course provide helper functions that allow using
        MultimapState without deailing with TimestampValue for users
        who only want a multimap and don't want sorting.

        I think many users will only need a single sorted list - not
        a full multimap. It's worth offering this as well, and we can
        simply build it on top of MultimapState. It will look like an
        extension of BagState

        public interface TimestampSortedListState<T> extends State {
          void add(TimestampedValue<T> value);
          Iterable<TimestampedValue<T>> read();
          Iterable<TimestampedValue<T>> readRange(Instant startTs,
        Instant endTs);
          void clearRange(Instant startTs, Instant endTs);
        }


        On Wed, Jun 17, 2020 at 2:47 PM Luke Cwik <[email protected]
        <mailto:[email protected]>> wrote:

            The portability layer is meant to live across multiple
            versions of Beam and I don't think it should be treated
            by doing the simple and useful thing now since I believe
            it will lead to a proliferation of the API.

            On Wed, Jun 17, 2020 at 2:30 PM Kenneth Knowles
            <[email protected] <mailto:[email protected]>> wrote:

                I have thoughts on the subject of whether to have
                APIs just for the lowest-level building blocks versus
                having APIs for higher-level constructs. Specifically
                this applies to providing only unsorted multimap vs
                what I will call "time-ordered buffer". TL;DR: I'd
                vote to focus on time-ordered buffer; if it turns out
                to be easy to go all the way to sorted multimap
                that's nice-to-have; if it turns out to be easy to
                implement on top of unsorted map state that should
                probably be under the hood

                Reasons to build low-level multimap in the runner &
                fn api and layer higher-level things in the SDK:

                 - It is less implementation for runners if they have
                to only provide fewer lower-level building blocks
                like multimap state.
                 - There are many more runners than SDKs (and will be
                even more and more) so this saves overall.

                Reasons to build higher-level constructs directly in
                the runner and fn api:

                 - Having multiple higher-level state types may
                actually be less implementation than one complex
                state type, especially if they map to runner primitives.
                 - The runner may have better specialized
                implementations, especially for something like a
                time-ordered buffer.
                 - The particular access patterns in an SDK-based
                implementation may not be ideal for each runner's
                underlying implementation of the low-level building
                block.
                 - There may be excessive gRPC overhead even for
                optimal access patterns.

                There are ways to have best of both worlds, like:

                1. Define multiple state types according to
                fundamental access patterns, like we did this before
                portability.
                2. If it is easy to layer one on top of the other, do
                that inside the runner. Provide shared code so for
                runners providing the lowest-level primitive they get
                all the types for free.

                I understand that this is an oversimplification. It
                still creates some more work. And APIs are a burden
                so it is good to introduce as few as possible for
                maintenance. But it has performance benefits and also
                unblocks "just doing the simple and useful thing now"
                which I always like to do as long as it is compatible
                with future changes. If the APIs are fundamental,
                like sets, maps, timestamp ordering, then it is safe
                to guess that they will change rarely and be useful
                forever.

                Kenn

                On Tue, Jun 16, 2020 at 2:54 PM Luke Cwik
                <[email protected] <mailto:[email protected]>> wrote:

                    I would be glad to take a stab at how to provide
                    sorting on top of unsorted multimap state.
                    Based upon your description, you want integer
                    keys representing timestamps and arbitrary user
                    value for the values, is that correct?
                    What kinds of operations do you need on the
                    sorted map state in order of efficiency requirements?
                    (e.g. Next(x), Previous(x), GetAll(Range[x, y)),
                    ClearAll(Range[x, y))
                    What kinds of operations do we expect the
                    underlying unsorted map state to be able to provide?
                    (at a minimum Get(K), Append(K), Clear(K) but
                    what else e.g. enumerate(K)?)

                    I went through a similar exercise of how to
                    provide a list like side input view over a
                    multimap[1] side input which efficiently allowed
                    computation of size and provided random access
                    while only having access to get(K) and enumerate K's.

                    1:
                    
https://github.com/lukecwik/incubator-beam/blob/ec8769f6163ca8a4daecc2fb29708bc1da430917/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollectionViews.java#L568

                    On Tue, Jun 16, 2020 at 8:47 AM Reuven Lax
                    <[email protected] <mailto:[email protected]>> wrote:

                        Bringing this subject up again,

                        I've spent some time looking into
                        implementing this for the Dataflow runner.
                        I'm unable to find a way to implement the
                        arbitrary sorted multimap efficiently for the
                        case where there are large numbers of unique
                        keys. Since the primary driving use case is
                        timestamp ordering (i.e. key is event
                        timestamp), you would expect to have nearly a
                        new key per element. I considered Luke's
                        suggestion above, but unfortunately it
                        doesn't really solve this issue.

                        The primary use case for sorting always seems
                        to be sorting by timestamp. I want to propose
                        that instead of building the fully-general
                        sorted multimap, we instead focus on a state
                        type where the sort key is an integral type
                        (like a timestamp or an integer). There is
                        still a valid use case for multimap, but we
                        can provide that as an unordered state. At
                        least for Dataflow, it will be much easier

                        While my difficulties here may be specific to
                        the Dataflow runner, any such support would
                        have to be built into other runners as well,
                        and limiting to integral sorting likely makes
                        it easier for other runners to implement
                        this. Also, if you look at this
                        
<https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/runtime/join/TemporalRowtimeJoin.scala#L95>
 Flink
                        comment pointed out by Aljoscha, for Flink
                        the main use case identified was also
                        timestamp sorting. This will also simplify
                        the API design for this feature: Sorted
                        multimap with arbitrary keys would require us
                        to introduce a way of mapping natural
                        ordering to encoded ordering (i.e. a new
                        OrderPreservingCoder), but if we limit sort
                        keys to integral types, the API design is
                        simpler as integral types can be represented
                        directly.

                        Reuven

                        On Sun, Jun 2, 2019 at 7:04 AM Reuven Lax
                        <[email protected] <mailto:[email protected]>>
                        wrote:

                            This sounds to me like a potential runner
                            strategy. However if a runner can
                            natively support sorted maps (e.g. we
                            expect the Dataflow runner to be able to
                            do so, and I think it would be useful for
                            other runners as well), then it's
                            probably preferable to allow the runner
                            to use its native capabilities.

                            On Fri, May 24, 2019 at 11:05 AM Lukasz
                            Cwik <[email protected]
                            <mailto:[email protected]>> wrote:

                                For the API that you proposed, the
                                map key is always "void" and the sort
                                key == user key. So in my example of
                                key: dummy value
                                key.000: token, (0001, value4)
                                key.001: token, (0010, value1),
                                (0011, value2)
                                key.01: token
                                key.1: token, (1011, value3)
                                you would have:
                                "void": dummy value
                                "void".000: token, (0001, value4)
                                "void".001: token, (0010, value1),
                                (0011, value2)
                                "void".01: token
                                "void".1: token, (1011, value3)

                                Iterable<KV<K, V>> entriesUntil(K
                                limit) translates into walking the
                                the prefixes until you find a common
                                prefix for K and then filter for
                                values where they have a sort key <=
                                K. Using the example above, to find
                                entriesUntil(0010) you would:
                                look for key."", miss
                                look for key.0, miss
                                look for key.00, miss
                                look for key.000, hit, sort all
                                contained values using secondary key,
                                provide value4 to user
                                look for key.001, hit, notice that
                                001 is a prefix of 0010 so we sort
                                all contained values using secondary
                                key, filter out value2 and provide value1

                                void removeUntil(K limit) also
                                translates into walking the prefixes
                                but instead we will clear them when
                                we have a "hit" with some special
                                logic for when the sort key is a
                                prefix of the key. Used the example,
                                to removeUntil(0010) you would:
                                look for key."", miss
                                look for key.0, miss
                                look for key.00, miss
                                look for key.000, hit, clear
                                look for key.001, hit, notice that
                                001 is a prefix of 0010 so we sort
                                all contained values using secondary
                                key, store in memory all values that
                                > 0010, clear and append values
                                stored in memory.

                                On Fri, May 24, 2019 at 10:36 AM
                                Reuven Lax <[email protected]
                                <mailto:[email protected]>> wrote:

                                    Can you explain how fetching and
                                    deleting ranges of keys would
                                    work with this data structure?

                                    On Fri, May 24, 2019 at 9:50 AM
                                    Lukasz Cwik <[email protected]
                                    <mailto:[email protected]>> wrote:

                                        Reuven, for the example, I
                                        assume that we never want to
                                        store more then 2 values at a
                                        given sort key prefix, and if
                                        we do then we will create a
                                        new longer prefix splitting
                                        up the values based upon the
                                        sort key.

                                        Tuple representation in
                                        examples below is (key, sort
                                        key, value) and . is a
                                        character outside of the
                                        alphabet which can be
                                        represented by using an
                                        escaping encoding that wraps
                                        the key + sort key encoding.

                                        To insert (key, 0010,
                                        value1), we lookup "key" +
                                        all the prefixes of 0010
                                        finding one that is not
                                        empty. In this case its 0, so
                                        we append value to the map at
                                        key.0 ending up with (we also
                                        set the key to any dummy
                                        value to know that it it
                                        contains values):
                                        key: dummy value
                                        key."": token, (0010, value1)
                                        Now we insert (key, 0011,
                                        value2), we again lookup
                                        "key" + all the prefixes of
                                        0010, finding "", so we
                                        append value2 to key.""
                                        ending up with:
                                        key: dummy value
                                        key."": token, (0010,
                                        value1), (0011, value2)
                                        Now we insert (key, 1011,
                                        value3), we again lookup
                                        "key" + all the prefixes of
                                        1011 finding "" but notice
                                        that it is full, so we
                                        partition all the values into
                                        two prefixes 0 and 1. We also
                                        clear the "" prefix ending up
                                        with:
                                        key: dummy value
                                        key.0: token, (0010, value1),
                                        (0011, value2)
                                        key.1: token, (1011, value3)
                                        Now we insert (key, 0001,
                                        value4), we again lookup
                                        "key" + all the prefixes of
                                        the value finding 0 but
                                        notice that it is full, so we
                                        partition all the values into
                                        two prefixes 00 and 01 but
                                        notice this doesn't help us
                                        since 00 will be too full so
                                        we split 00 again to 000,
                                        001. We also clear the 0
                                        prefix ending up with:
                                        key: dummy value
                                        key.000: token, (0001, value4)
                                        key.001: token, (0010,
                                        value1), (0011, value2)
                                        key.01: token
                                        key.1: token, (1011, value3)

                                        We are effectively building a
                                        trie[1] where we only have
                                        values at the leaves and
                                        control how full each leaf
                                        can be. There are other trie
                                        representations like a radix
                                        tree that may be better.

                                        Looking up the values in
                                        sorted order for "key" would
                                        go like this:
                                        Is key set, yes
                                        look for key."", miss
                                        look for key.0, miss
                                        look for key.00, miss
                                        look for key.000, hit, sort
                                        all contained values using
                                        secondary key, provide value4
                                        to user
                                        look for key.001, hit, sort
                                        all contained values using
                                        secondary key, provide value1
                                        followed by value2 to user
                                        look for key.01, hit, empty,
                                        return no values to user
                                        look for key.1, hit, sort all
                                        contained values using
                                        secondary key, provide value3
                                        to user
                                        we have walked the entire
                                        prefix space, signal end of
                                        iterable

                                        Some notes for the above:
                                        * The dummy value is used to
                                        know that the key contains
                                        values and the token is to
                                        know whether there are any
                                        values deeper in the trie so
                                        when we know when to stop
                                        searching.
                                        * If we can recalculate the
                                        sort key from the combination
                                        of the key and value, then we
                                        don't need to store it.
                                        * Keys with lots of values
                                        will perform worse then keys
                                        with less values since we
                                        have to look up more keys but
                                        they will be empty reads. The
                                        number of misses can be
                                        controlled by how many
                                        elements we are willing to
                                        store at a given node before
                                        we subdivide.

                                        In reality you could build a
                                        lot of structures (e.g. red
                                        black tree, binary tree)
                                        using the sort key, the issue
                                        is the cost of
                                        rebalancing/re-organizing the
                                        structure in map form and
                                        whether it has a convenient
                                        pre-order traversal for lookups.



                                        On Fri, May 24, 2019 at 8:14
                                        AM Reuven Lax
                                        <[email protected]
                                        <mailto:[email protected]>> wrote:

                                            Some great comments!

                                            *Aljoscha*: absolutely
                                            this would have to be
                                            implemented by runners to
                                            be efficient. We can of
                                            course provide a default
                                            (inefficient)
                                            implementation, but
                                            ideally runners would
                                            provide better ones.

                                            *Jan* Exactly. I think
                                            MapState can be dropped
                                            or backed by this. E.g.

                                            *Robert* Great point
                                            about standard coders not
                                            satisfying this. That's
                                            why I suggested that we
                                            provide a way to tag the
                                            coders that do preserve
                                            order, and only accept
                                            those as key coders
                                            Alternatively we could
                                            present a more limited
                                            API - e.g. only allowing
                                            a hard-coded set of types
                                            to be used as keys - but
                                            that seems counter to the
                                            direction Beam usually
                                            goes. So users will have
                                            two ways .of creating
                                            multimap state specs:

                                               private final
                                            StateSpec<MultimapState<Long,
                                            String>> state =
                                            
StateSpecs.multimap(VarLongCoder.of(),
                                            StringUtf8Coder.of());

                                            or
                                             private final
                                            StateSpec<MultimapState<Long,
                                            String>> state =
                                            
StateSpecs.orderedMultimap(VarLongCoder.of(),
                                            StringUtf8Coder.of());

                                            The second one will
                                            validate that the key
                                            coder preserves order,
                                            and fails otherwise
                                            (similar to coder
                                            determinism checking in
                                            GroupByKey). (BTW we
                                            would also have versions
                                            of these functions that
                                            use coder inference to
                                            "guess" the coder, but
                                            those will do the same
                                            checking)

                                            Also the API I proposed
                                            did support
                                            random access! We could
                                            separate out
                                            OrderedBagState again if
                                            we think the use cases
                                            are fundamentally
                                            different. I merged the
                                            proposal into that of
                                            MultimapState because
                                            there seemed be 99% overlap.

                                            Reuven

                                            On Fri, May 24, 2019 at
                                            6:19 AM Robert Bradshaw
                                            <[email protected]
                                            <mailto:[email protected]>>
                                            wrote:

                                                On Fri, May 24, 2019
                                                at 5:32 AM Reuven Lax
                                                <[email protected]
                                                <mailto:[email protected]>>
                                                wrote:
                                                >
                                                > On Thu, May 23,
                                                2019 at 1:53 PM Ahmet
                                                Altay
                                                <[email protected]
                                                <mailto:[email protected]>>
                                                wrote:
                                                >>
                                                >>
                                                >>
                                                >> On Thu, May 23,
                                                2019 at 1:38 PM
                                                Lukasz Cwik
                                                <[email protected]
                                                <mailto:[email protected]>>
                                                wrote:
                                                >>>
                                                >>>
                                                >>>
                                                >>> On Thu, May 23,
                                                2019 at 11:37 AM Rui
                                                Wang
                                                <[email protected]
                                                <mailto:[email protected]>>
                                                wrote:
                                                >>>>>
                                                >>>>> A few obvious
                                                problems with this code:
                                                >>>>>   1. Removing
                                                the elements already
                                                processed from the
                                                bag requires clearing
                                                and rewriting the
                                                entire bag. This is
                                                O(n^2) in the number
                                                of input trades.
                                                >>>>
                                                >>>> why it's not O(2
                                                * n) to clearing and
                                                rewriting trade state?
                                                >>>>
                                                >>>>>
                                                >>>>> public
                                                interface
                                                SortedMultimapState<K,
                                                V> extends State {
                                                >>>>>   // Add a
                                                value to the map.
                                                >>>>>   void put(K
                                                key, V value);
                                                >>>>>   // Get all
                                                values for a given key.
                                                >>>>>
                                                 ReadableState<Iterable<V>>
                                                get(K key);
                                                >>>>>  // Return all
                                                entries in the map.
                                                >>>>>
                                                 ReadableState<Iterable<KV<K,
                                                V>>> allEntries();
                                                >>>>>   // Return all
                                                entries in the map
                                                with keys <= limit.
                                                returned elements are
                                                sorted by the key.
                                                >>>>>
                                                 ReadableState<Iterable<KV<K,
                                                V>>> entriesUntil(K
                                                limit);
                                                >>>>>
                                                >>>>>  // Remove all
                                                values with the given
                                                key;
                                                >>>>>   void remove(K
                                                key);
                                                >>>>>  // Remove all
                                                entries in the map
                                                with keys <= limit.
                                                >>>>>   void
                                                removeUntil(K limit);
                                                >>>>
                                                >>>> Will
                                                removeUntilExcl(K
                                                limit) also useful?
                                                It will remove all
                                                entries in the map
                                                with keys < limit.
                                                >>>>
                                                >>>>>
                                                >>>>> Runners will
                                                sort based on the
                                                encoded value of the
                                                key. In order to make
                                                this easier for
                                                users, I propose that
                                                we introduce a new
                                                tag on Coders
                                                PreservesOrder. A
                                                Coder that contains
                                                this tag guarantees
                                                that the encoded
                                                value preserves the
                                                same ordering as the
                                                base Java type.
                                                >>>>
                                                >>>>
                                                >>>> Could you
                                                clarify what is 
                                                "encoded value
                                                preserves the same
                                                ordering as the base
                                                Java type"?
                                                >>>
                                                >>>
                                                >>> Lets say A and B
                                                represent two
                                                different instances
                                                of the same Java type
                                                like a double, then A
                                                < B (using the
                                                languages comparison
                                                operator) iff
                                                encode(A) < encode(B)
                                                (note the encoded
                                                versions are compared
                                                lexicographically)
                                                >>
                                                >>
                                                >> Since coders are
                                                shared across SDKs,
                                                do we expect A < B
                                                iff e(A) < e(P)
                                                property to hold for
                                                all languages we
                                                support? What happens
                                                A, B sort differently
                                                in different languages?
                                                >
                                                >
                                                > That would have to
                                                be the property of
                                                the coder (which
                                                means that this
                                                property probably
                                                needs to be
                                                represented in the
                                                portability
                                                representation of the
                                                coder). I imagine the
                                                common use cases will
                                                be for simple coders
                                                like int, long,
                                                string, etc., which
                                                are likely to sort
                                                the same in most
                                                languages.

                                                The standard coders
                                                for both double and
                                                integral types do not
                                                respect
                                                the natural ordering
                                                (consider negative
                                                values). KV coders
                                                violate the
                                                "natural"
                                                lexicographic
                                                ordering on
                                                components as well. I
                                                think
                                                implicitly sorting on
                                                encoded value would
                                                yield many surprises.
                                                (The
                                                state, of course,
                                                could take a
                                                order-preserving, bytes
                                                (string?)-producing
                                                callable as a
                                                parameter of course).
                                                (As for
                                                naming, I'd probably
                                                call this
                                                OrderedBagState or
                                                something like
                                                that...rather than
                                                Map which tends to
                                                imply random access.)

Re: DISCUSS: Sorted MapState API

Reply via email to