Github user facaiy commented on the issue:

    https://github.com/apache/spark/pull/19666
  
    Hi, I write a demo with python. I'll be happy if it could be useful.
    
    For N bins, say `[x_1, x_2, ..., x_N]`, since all its splits contain either 
`x_1` or not, so we can choose the half splits which doesn't contain x_1 as 
left splits.
    If I understand it correctly, the left splits are indeed all combinations 
of the left bins, `[x_2, x_3, ... x_N]`. The problem can be solved by the 
[backtracking algorithm](https://en.wikipedia.org/wiki/Backtracking).
    
    Please correct me if I'm wrong. Thanks very much.
    
    ```python
    #!/usr/bin/env python
    
    def gen_splits(bins):
        if len(bins) == 1:
            return bins
        results = []
        partial_res = []
        gen_splits_iter(1, bins, partial_res, results)
        return results
    
    
    def gen_splits_iter(dep, bins, partial_res, results):
        if partial_res:
            left_splits = partial_res[:]
            right_splits = [x for x in bins if x not in left_splits]
            results.append("left: {:20}, right: {}".format(str(left_splits), 
right_splits))
    
        for m in range(dep, len(bins)):
            partial_res.append(bins[m])
            gen_splits_iter(m+1, bins, partial_res, results)
            partial_res.pop()
    
    
    if __name__ == "__main__":
        print("first example:")
        bins = ["a", "b", "c"]
        print("bins: {}\n-----".format(bins))
        splits = gen_splits(bins)
        for s in splits:
            print(s)
    
        print("\n\n=============")
        print("second example:")
        bins = ["a", "b", "c", "d", "e"]
        print("bins: {}\n-----".format(bins))
        splits = gen_splits(bins)
        for s in splits:
            print(s)
    ```
    
    logs:
    ```bash
    ~/Downloads ❯❯❯ python test.py
    first example:
    bins: ['a', 'b', 'c']
    -----
    left: ['b']               , right: ['a', 'c']
    left: ['b', 'c']          , right: ['a']
    left: ['c']               , right: ['a', 'b']
    
    
    =============
    second example:
    bins: ['a', 'b', 'c', 'd', 'e']
    -----
    left: ['b']               , right: ['a', 'c', 'd', 'e']
    left: ['b', 'c']          , right: ['a', 'd', 'e']
    left: ['b', 'c', 'd']     , right: ['a', 'e']
    left: ['b', 'c', 'd', 'e'], right: ['a']
    left: ['b', 'c', 'e']     , right: ['a', 'd']
    left: ['b', 'd']          , right: ['a', 'c', 'e']
    left: ['b', 'd', 'e']     , right: ['a', 'c']
    left: ['b', 'e']          , right: ['a', 'c', 'd']
    left: ['c']               , right: ['a', 'b', 'd', 'e']
    left: ['c', 'd']          , right: ['a', 'b', 'e']
    left: ['c', 'd', 'e']     , right: ['a', 'b']
    left: ['c', 'e']          , right: ['a', 'b', 'd']
    left: ['d']               , right: ['a', 'b', 'c', 'e']
    left: ['d', 'e']          , right: ['a', 'b', 'c']
    left: ['e']               , right: ['a', 'b', 'c', 'd']
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to