Baunsgaard opened a new pull request, #2511: URL: https://github.com/apache/systemds/pull/2511
Introduce a DELTA file format that reads and writes Delta Lake tables natively through the Spark-free Delta Kernel library, for matrices on the single-node CP path. DML read/write with format="delta" now operates directly on Delta tables without a Spark DataFrame round-trip.

- Add FileFormat.DELTA and exclude it from the text formats
- Accept format="delta" with unknown dimensions in the parser (like CSV) and set blocksize -1 for the columnar format
- Wire DELTA into the matrix reader and writer factories
- Add DeltaKernelUtils plus ReaderDelta and WriterDelta with column-at-a-time, boxing-free data transfer
- Refresh cached matrix metadata after a Delta read (discovered dimensions)
- Pin parquet 1.13.1 and jackson core/annotations to 2.15.2 to align with Delta Kernel's transitive requirements

Append/overwrite table semantics, distributed execution, frames, and time travel are out of scope.
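
For readers unfamiliar with the phrase "column-at-a-time, boxing-free data transfer", the following is a minimal sketch of the general idea, not the code in this PR. It assumes each column is already available as a primitive `double[]` and that the target is a row-major dense array; the class `ColumnCopySketch` and the method `copyColumn` are hypothetical names chosen for illustration and do not correspond to ReaderDelta, WriterDelta, or any Delta Kernel API.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not part of SystemDS or Delta Kernel.
public final class ColumnCopySketch {
    private ColumnCopySketch() {}

    /**
     * Copies one column, held as a primitive double[], into a row-major
     * dense array with ncol columns. No Double objects are allocated,
     * so the transfer is boxing-free.
     */
    static void copyColumn(double[] column, double[] dense, int ncol, int colIndex) {
        for (int r = 0; r < column.length; r++) {
            dense[r * ncol + colIndex] = column[r];
        }
    }

    public static void main(String[] args) {
        // Assumed input: two columns of three rows each, in columnar layout.
        double[][] columns = { {1.0, 2.0, 3.0}, {4.0, 5.0, 6.0} };
        int nrow = 3;
        int ncol = columns.length;

        // Target: a 3 x 2 matrix stored row by row.
        double[] dense = new double[nrow * ncol];
        for (int c = 0; c < ncol; c++) {
            copyColumn(columns[c], dense, ncol, c);
        }

        System.out.println(Arrays.toString(dense));
        // prints [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```

Processing one column per pass keeps the inner loop on a single primitive array, which avoids the per-cell object allocation that a generic row-of-objects interface would incur.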
