Baunsgaard opened a new pull request, #2511:
URL: https://github.com/apache/systemds/pull/2511

   Introduce a DELTA file format that reads and writes Delta Lake tables 
natively through the Spark-free Delta Kernel library, for matrices on the 
single-node CP path. DML read/write with format="delta" now operates directly 
on Delta tables without a Spark DataFrame round-trip.
   
   - Add FileFormat.DELTA and exclude it from the text formats
   - Accept format="delta" with unknown dimensions in the parser (like CSV) and 
set blocksize -1 for the columnar format
   - Wire DELTA into the matrix reader and writer factories
   - Add DeltaKernelUtils plus ReaderDelta and WriterDelta with 
column-at-a-time, boxing-free data transfer
   - Refresh cached matrix metadata after a Delta read (discovered dimensions)
   - Pin parquet to 1.13.1 and jackson core/annotations to 2.15.2 to align with 
Delta Kernel's transitive requirements
   
   Append/overwrite table semantics, distributed execution, frames, and time 
travel are out of scope.
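
   To illustrate what "column-at-a-time, boxing-free data transfer" can mean 
in practice, here is a minimal, self-contained sketch. It is not the ReaderDelta 
code from this PR: the class ColumnTransferSketch and its methods are 
hypothetical names, and it assumes each column has already been decoded into a 
primitive double[] and that the target is a row-major dense array. The point is 
only that values move as primitives, one column per pass, with no 
java.lang.Double objects allocated.

```java
import java.util.Arrays;

// Hypothetical sketch; names and layout are assumptions, not the PR's code.
final class ColumnTransferSketch {

    // Scatter one primitive column into column `col` of a row-major dense array.
    // Working on double[] directly avoids allocating a Double per cell.
    static void copyColumn(double[] column, double[] dense, int ncol, int col) {
        for (int r = 0; r < column.length; r++) {
            dense[r * ncol + col] = column[r];
        }
    }

    // Assemble a row-major dense array from column vectors, one column per pass.
    static double[] toRowMajor(double[][] columns, int nrow) {
        int ncol = columns.length;
        double[] dense = new double[nrow * ncol];
        for (int c = 0; c < ncol; c++) {
            if (columns[c].length != nrow) {
                throw new IllegalArgumentException(
                    "column " + c + " has " + columns[c].length
                    + " rows, expected " + nrow);
            }
            copyColumn(columns[c], dense, ncol, c);
        }
        return dense;
    }

    public static void main(String[] args) {
        double[][] columns = { {1, 2, 3}, {4, 5, 6} }; // 2 columns, 3 rows
        // prints [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
        System.out.println(Arrays.toString(toRowMajor(columns, 3)));
    }
}
```

   A columnar source hands over data one column vector at a time, so scattering 
each vector into the row-major target in a single pass keeps the read side 
sequential; the strided writes are the price of converting between layouts.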

