Baunsgaard opened a new pull request, #2520:
URL: https://github.com/apache/systemds/pull/2520
Add scripts/databricks: a self-contained kit for deploying and running SystemDS on a Databricks cluster (tested on DBR 16.4 LTS, Spark 3.5.2, Scala 2.12, where the SystemDS jar runs unchanged).

- deploy.sh: create a UC volume, upload SystemDS.jar, create a single-user cluster with the required Vector API / --add-opens JVM flags, install the Delta Kernel Maven libraries, and import the demo notebooks. All settings come from a .env file (template in .env.example); local state (.env, .cluster_id) is git-ignored.
- SystemDS_MLContext_Demo.scala: Unity Catalog table round-trip via the MLContext (Scala) API with a configurable DML script and execution mode.
- SystemDS_vs_SparkML_LinReg.scala: linear regression with categorical encoding (transformencode + lm) vs a Spark ML OneHotEncoder + LinearRegression pipeline, timing encode + train.
- SystemDS_Delta_E2E.scala: end-to-end Delta -> transformencode -> lm, reading a Delta table natively as a frame, compared against the equivalent Spark ML pipeline; prints a per-instruction breakdown.
- SystemDS_Python_Demo.py and demo.dml: minimal Python API and DML smoke tests.
- README.md: setup, configuration, node-type guidance, Delta Kernel library requirement, and indicative benchmark numbers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
