yuqi1129 commented on code in PR #9623: URL: https://github.com/apache/gravitino/pull/9623#discussion_r2681094716
##########
docs/lance-rest-integration.md:
##########
@@ -0,0 +1,130 @@
+---
+title: "Lance REST integration with Spark and Ray"
+slug: /lance-rest-spark-ray-integration
+keywords:
+  - lance
+  - lance-rest
+  - spark
+  - ray
+  - integration
+license: "This software is licensed under the Apache License version 2."
+---
+
+## Overview
+
+This guide shows how to use the Lance REST service from Apache Gravitino with the [Lance Spark connector](https://lance.org/integrations/spark/) (`lance-spark`) and the [Lance Ray connector](https://lance.org/integrations/ray/) (`lance-ray`). It builds on the Lance REST service setup described in [Lance REST service](./lance-rest-service).
+
+## Compatibility matrix
+
+| Gravitino version (Lance REST) | Supported lance-spark versions | Supported lance-ray versions |
+|--------------------------------|--------------------------------|------------------------------|
+| 1.1.1                          | 0.0.10 – 0.0.15                | 0.0.6 – 0.0.8                |
+
+These version ranges represent combinations that are expected to be compatible. Only a subset of versions within each range may have been explicitly tested, so you should verify a specific connector version in your own environment.
+
+Why do we need to maintain a compatibility matrix? Because Lance and the Lance connectors are actively developed, some APIs and features may change over time. Gravitino's Lance REST service relies on specific versions of these connectors to ensure seamless integration and functionality. Using incompatible versions may lead to unexpected behavior or errors.
+
+
+## Prerequisites
+
+- Gravitino server running with Lance REST service enabled (default endpoint: `http://localhost:9101/lance`).
+- A Lance catalog created in Gravitino via the Lance REST namespace API (see `CreateNamespace` in [docs](./lance-rest-service.md)) or the Gravitino REST API, for example `lance_catalog`.
+- Downloaded `lance-spark` bundle JAR that matches your Spark version (set the absolute path in the examples below).
+- Python environments with required packages:
+  - Spark: `pyspark`
+  - Ray: `ray`, `lance-namespace`, `lance-ray`
+
+## Using Lance REST with Spark
+
+The example below starts a local PySpark session that talks to Lance REST and creates a table through Spark SQL.
+
+```python
+from pyspark.sql import SparkSession
+import os
+import logging
+logging.basicConfig(level=logging.INFO)
+
+# Point to your downloaded lance-spark bundle.
+os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars /path/to/lance-spark-bundle-3.5_2.12-0.0.15.jar --conf \"spark.driver.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED\" --conf \"spark.executor.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED\" --master local[1] pyspark-shell"
+
+# Create the Lance catalog named "lance_catalog" in Gravitino beforehand.
+spark = SparkSession.builder \
+    .appName("lance_rest_example") \
+    .config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
+    .config("spark.sql.catalog.lance.impl", "rest") \
+    .config("spark.sql.catalog.lance.uri", "http://localhost:9101/lance") \
+    .config("spark.sql.catalog.lance.parent", "lance_catalog") \
+    .config("spark.sql.defaultCatalog", "lance") \
+    .getOrCreate()
+
+spark.sparkContext.setLogLevel("DEBUG")
+
+# Create schema and table, write, then read data.
+spark.sql("create database schema")
+spark.sql("""
+create table schema.sample(id int, score float)
+USING lance
+LOCATION '/tmp/schema/sample.lance/'

Review Comment:
   Yes, it works with the latest release, not the main branch.
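
Regarding the compatibility matrix in the quoted doc, here is a minimal sketch of checking a connector version against the documented ranges before running the examples. It assumes only the ranges listed in the table (Gravitino 1.1.1 with lance-spark 0.0.10 – 0.0.15 and lance-ray 0.0.6 – 0.0.8). The names `COMPATIBILITY`, `parse_version`, and `is_supported` are hypothetical and not part of Gravitino or the connectors. A version inside a range is only expected to be compatible, not guaranteed to have been tested.

```python
# Hypothetical helper; the ranges are copied from the compatibility matrix
# in the quoted doc and are not read from Gravitino itself.
COMPATIBILITY = {
    "1.1.1": {
        "lance-spark": ("0.0.10", "0.0.15"),
        "lance-ray": ("0.0.6", "0.0.8"),
    },
}


def parse_version(version):
    """Turn '0.0.15' into (0, 0, 15) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_supported(gravitino_version, connector, connector_version):
    """Return True if connector_version falls inside the documented range."""
    ranges = COMPATIBILITY.get(gravitino_version, {})
    if connector not in ranges:
        return False
    low, high = ranges[connector]
    return parse_version(low) <= parse_version(connector_version) <= parse_version(high)


print(is_supported("1.1.1", "lance-spark", "0.0.15"))
print(is_supported("1.1.1", "lance-ray", "0.0.9"))
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.0.9"` sorts after `"0.0.10"`, which would wrongly place 0.0.9 inside the lance-spark range.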
