Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/13680
  
    I think it is not easy to support ```[not written, use offset3]``` with 
good performance. I am considering **two cases**.
    
    In **case 1**, my assumptions are:
    * ```[offset area]``` is not initialized before offsets are written, for 
performance reasons
    * Elements may not be written in ascending order
    
    Here, the following two steps are executed.
    1. write ```offset0```
    2. write ```offset3```
    At step 2, it is not easy to determine which ```[not written]``` fields 
should be filled with ```[use offset3]```. This is because we cannot assume any 
particular value in a ```[not written]``` field, so it is hard to recognize 
that ```[offset0]``` has already been written.
    ```
    offset: 0             1             2             3             4
    init :  [not written] [not written] [not written] [not written] [not written]
    step1:  [offset0]     [not written] [not written] [not written] [not written]
    step2:  [offset0]     [not written] [not written] [offset3]     [not written]
    ```
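
To illustrate the problem in case 1, here is a minimal sketch. It assumes uninitialized memory can be modelled as arbitrary integers, and the `looks_written` heuristic and `data_size` bound are hypothetical names introduced only for this example. It shows that any value-based test can mistake garbage for a written offset.

```python
def looks_written(slot_value, data_size):
    """Hypothetical heuristic: treat a value as a real offset if it is in range."""
    return 0 <= slot_value < data_size


# Stand-in for an uninitialized offset area: the contents are arbitrary.
area = [17, 99999, 42, -5, 8]

# Step 1: write offset0 (16 is an illustrative value).
area[0] = 16

# Before step 2, try to classify every slot as written or not written.
verdicts = [looks_written(v, data_size=64) for v in area]
print(verdicts)
```

Slots 2 and 4 were never written, yet they pass the check because their garbage values happen to fall in the valid range, so step 2 cannot tell where to stop filling.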
    
    In **case 2**, my assumptions are:
    * ```[offset area]``` is initialized to `zero` before offsets are written. 
However, this may cause a performance issue.
    * Elements may not be written in ascending order
    
    Here, the following two steps are executed.
    1. write ```offset4``` and fill all predecessor fields that have not 
been written with `[use offset4]`
    2. write ```offset2``` and fill all predecessor fields that have not 
been written with `[use offset2]`
    This approach always checks and fills the predecessor fields until it 
finds a field that has been written.
    
    ```
    offset: 0             1             2             3             4
    init :  [zero       ] [zero       ] [zero       ] [zero       ] [zero       ]
    step1:  [use offset4] [use offset4] [use offset4] [use offset4] [offset4]
    step2:  [use offset2] [use offset2] [offset2]     [use offset4] [offset4]
    ```
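
The back-fill in case 2 can be sketched as follows. This is only an illustration under my own assumptions: slots are modelled as tagged tuples rather than raw words, the function names are hypothetical, and the offset values 400 and 200 are made up. A slot holding `[use offsetN]` counts as not written, so a later write may overwrite it.

```python
ZERO = ("zero", None)  # a slot as left by zero-initialization


def new_offset_area(n):
    """Create an offset area of n slots, all initialized to zero."""
    return [ZERO] * n


def write_offset(area, index, value):
    """Write a real offset at `index`, then back-fill unwritten predecessors."""
    area[index] = ("offset", value)
    i = index - 1
    # Walk toward slot 0 until a slot holding a real offset is found.
    while i >= 0 and area[i][0] != "offset":
        area[i] = ("use", index)  # corresponds to [use offset<index>]
        i -= 1


area = new_offset_area(5)
write_offset(area, 4, 400)  # step 1
write_offset(area, 2, 200)  # step 2
print(area)
```

The loop in `write_offset` is the extra cost of this approach: in the worst case every write scans back to slot 0, in addition to the cost of zeroing the area up front.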
    
    What do you think?

