jonded94 commented on PR #9374:
URL: https://github.com/apache/arrow-rs/pull/9374#issuecomment-3876339801

   Hey @etseidl :wave: 
   
   Thanks for your review! I'm not super well versed around data page decoding 
but I agree that after skipping 2 records, `[50, 60]` probably should be the 
next decoded record.. I'm unfortunately super busy with internal bugfixes right 
now, I just plugged that back into the AI slop machine and it came out with a 
fix (d738d35) slightly different from your suggestion that *does* correctly 
return `[50, 60]`:
   
   ```
     The corrected fix (addressing reviewer @etseidl's feedback):               
                                                                                
                                               
                     
     - Before: flush_partial() was called at the end of skip_records after all 
records were skipped — this only cleared the stale has_partial state but didn't 
fix the over-counting (3 records skipped instead
      of 2).         
     - After: flush_partial() is called before the num_rows whole-page-skip 
shortcut. This properly counts the pending partial record from the previous 
page, adjusts remaining_records (2→1), and prevents the
      v2 page from being entirely skipped (since rows=2 > remaining=1). The 
skip then falls through to level-based decoding, correctly skipping only 1 
record from page 2.
   
     The corrected test asserts that after skip_records(2) over 4 records 
[10,20], [30,40], [50,60], [70,80], read_records(1) returns [50, 60] (the 3rd 
record), not [70, 80] (the 4th).
   ```
   
   I'm really sorry I can't invest more time into that right now, but I hope 
what I committed is now finally correct!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to