RafaelHerrero commented on PR #21260:
URL: https://github.com/apache/datafusion/pull/21260#issuecomment-4163846798

   Hi @alamb, I dug into the sqllogictest version mismatch and found a way to update the extended tests without needing the Omega359 fork.
   
   The problem with the fork approach:
   The regenerate_sqlite_files.sh script swaps in a forked sqllogictest at v0.27.2, but main now uses v0.29.1. The sqllogictest APIs changed between those versions, so the fork doesn't compile.
   
   What I found:
   The standard --complete mode (v0.29.1) works and generates correct results, but it has two issues with the SQLite test files:
     1. It doesn't respect control resultmode valuewise: it writes space-separated rows instead of one value per line.
     2. For newly generated blocks, hash values are computed with the wrong sort order.
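   To make the two issues concrete, here is a small Python sketch. The helper names are hypothetical, and it is not the sqllogictest-rs API. It assumes the SQLite sqllogictest convention that a "N values hashing to H" line is an MD5 over every value followed by a newline, taken after the sort mode is applied. That is why the same rows hash differently under rowsort and valuesort.

```python
import hashlib


def ordered_values(rows, sort_mode):
    """Flatten result rows into the value order implied by a sort mode."""
    if sort_mode == "nosort":
        return [v for row in rows for v in row]
    if sort_mode == "rowsort":
        # Sort whole rows, then flatten.
        return [v for row in sorted(rows) for v in row]
    if sort_mode == "valuesort":
        # Flatten first, then sort every value independently.
        return sorted(v for row in rows for v in row)
    raise ValueError(f"unknown sort mode: {sort_mode}")


def rowwise(rows):
    """Rowwise layout: one space-separated row per line."""
    return "\n".join(" ".join(row) for row in rows)


def valuewise(rows, sort_mode):
    """Valuewise layout: one value per line."""
    return "\n".join(ordered_values(rows, sort_mode))


def hash_line(rows, sort_mode):
    """Assumed SQLite-style summary: MD5 over each value followed by a newline."""
    values = ordered_values(rows, sort_mode)
    md5 = hashlib.md5()
    for v in values:
        md5.update(v.encode("utf-8"))
        md5.update(b"\n")
    return f"{len(values)} values hashing to {md5.hexdigest()}"
```

   For example, with rows = [("2", "10"), ("1", "20")], rowwise gives "2 10\n1 20" and valuewise under valuesort gives "1\n10\n2\n20". Under rowsort the hashed sequence is 1, 20, 2, 10, but under valuesort it is 1, 10, 2, 20. A hash computed with one sort mode therefore fails to match a block that declares the other.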
   
   The fix (2-pass approach):
     1. Run --complete on all SQLite files to generate results
     2. Post-process with a script that touches only the blocks that were "query error ... Projections require unique expression names": it converts their results to valuewise format with valuesort and leaves all existing blocks untouched.
     3. Run --complete again so hash values are recomputed with the correct sort mode.
     4. Post-process again to re-apply the valuewise format.
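   The post-processing pass (steps 2 and 4) could look roughly like the Python sketch below. This is not the script from the PR, and all function names are hypothetical. The sketch makes several simplifying assumptions:
   - Records are split on blank lines.
   - The completed file keeps the same records in the same order as the original file.
   - Result values contain no whitespace.
   - Only the error substring quoted above is matched, because the full expected-error text is elided.
   It also normalizes runs of blank lines between records, which a real script would want to preserve.

```python
import re

# Substring taken from the PR comment; the full expected-error text is elided there.
TARGET_ERROR = "Projections require unique expression names"
SORT_MODES = ("nosort", "rowsort", "valuesort")


def split_records(text):
    """Split .slt text into blank-line-separated records (simplified parser)."""
    return [r for r in re.split(r"\n\s*\n", text.strip("\n")) if r.strip()]


def header_index(lines):
    """Index of the first non-comment line, i.e. the record's directive line."""
    for i, line in enumerate(lines):
        if not line.startswith("#"):
            return i
    return len(lines)


def header(record):
    lines = record.split("\n")
    i = header_index(lines)
    return lines[i] if i < len(lines) else ""


def to_valuewise(record):
    """Force valuesort on a query record and list its results one value per line."""
    lines = record.split("\n")
    i = header_index(lines)
    if i == len(lines) or "----" not in lines[i + 1:]:
        return record  # no result section to rewrite
    comments, head, rest = lines[:i], lines[i], lines[i + 1:]
    sep = rest.index("----")
    sql, results = rest[:sep], rest[sep + 1:]

    tokens = head.split()  # ["query", <types>, <sort mode>?, <label>?]
    if len(tokens) >= 3 and tokens[2] in SORT_MODES:
        tokens[2] = "valuesort"
    else:
        tokens.insert(2, "valuesort")

    if len(results) == 1 and "values hashing to" in results[0]:
        values = results  # hash line: left for the next --complete pass to recompute
    else:
        # Assumption: plain string order matches the runner's valuesort order.
        values = sorted(v for row in results for v in row.split())
    return "\n".join([*comments, " ".join(tokens), *sql, "----", *values])


def postprocess(original_text, completed_text):
    """Rewrite only records that used to fail with TARGET_ERROR and now succeed."""
    before_recs = split_records(original_text)
    after_recs = split_records(completed_text)
    if len(before_recs) != len(after_recs):
        raise ValueError("record count changed; cannot pair records safely")
    out = []
    for before, after in zip(before_recs, after_recs):
        was_target = header(before).startswith("query error") and TARGET_ERROR in before
        head = header(after)
        now_ok = head.startswith("query ") and not head.startswith("query error")
        out.append(to_valuewise(after) if was_target and now_ok else after)
    return "\n\n".join(out) + "\n"
```

   In this sketch, both post-processing steps compare against the original pre-`--complete` file. Records that still fail, including those that now fail with a different error, keep their `query error` header and pass through unchanged. A "values hashing to" line is carried over as-is, so step 4 re-applies only the valuewise layout and keeps the hash that step 3 recomputed.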
   
   Results:
     - All 595 SQLite test files pass locally
     - 279 files in datafusion-testing need updating (~39,706 query error blocks that now succeed or fail with a different error)
     - No changes are needed to datafusion beyond the code fix (the revert of the revert)
   
   Full disclosure: I used Claude Code to help debug this and write the post-processing script. It took quite a bit of iteration to figure out the valuewise format and hash issues.
   
   Proposed next steps:
     1. Open a PR to apache/datafusion-testing with the 279 updated .slt files
     2. Once merged, update the submodule reference in this PR
   
   Let me know if this approach looks good, and I'll open the datafusion-testing PR.
   

