eric-maynard commented on code in PR #1503:
URL: https://github.com/apache/polaris/pull/1503#discussion_r2069767577
##########
site/content/in-dev/unreleased/polaris-spark-client.md:
##########
@@ -0,0 +1,155 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+Title: Polaris Spark Client
+type: docs
+weight: 400
+---
+
+Apache Polaris now provides Catalog support for Generic Tables (non-Iceberg tables); please check out
+the [Catalog API Spec]({{% ref "polaris-catalog-service" %}}) for the Generic Table API specs.
+
+Along with the Generic Table Catalog support, Polaris is also releasing a Spark Client, which helps
+provide an end-to-end solution for Apache Spark to manage Delta tables using Polaris.
+
+Note that the Polaris Spark Client is able to handle both Iceberg and Delta tables, not just Delta.
+
+This page documents how to build and use the Apache Polaris Spark Client before the formal release.
+
+## Prerequisites
+1. Check out the Polaris repo
+```shell
+cd ~
+git clone https://github.com/apache/polaris.git
+```
+2. Use Spark version >= 3.5.3 and <= 3.5.5; 3.5.5 is recommended.
+```shell
+cd ~
+wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
+mkdir spark-3.5
+tar xzvf spark-3.5.5-bin-hadoop3.tgz -C spark-3.5 --strip-components=1
+cd spark-3.5
+```
+
+All Spark Client code is available under `plugins/spark` of the Polaris repo.
+
+## Quick Start with a Local Polaris Service
+If you want to quickly try out the functionality with a local Polaris service, you can follow the instructions
+in `plugins/spark/v3.5/getting-started/README.md`.
+
+The getting-started setup starts two containers:
+1) The `polaris` service, which runs Apache Polaris using an in-memory metastore
+2) The `jupyter` service, which runs a Jupyter notebook with PySpark (Spark 3.5.5 is used)
+
+The notebook `SparkPolaris.ipynb` provided under `plugins/spark/v3.5/getting-started/notebooks` provides examples
+of basic commands, including:
+1) Connect to Polaris using Python client to create Catalog and Roles

Review Comment:
   ```suggestion
   1) Connect to Polaris using Python client to create a Catalog and Roles
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
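As a concrete complement to the getting-started instructions quoted above, here is a rough sketch of how `spark-shell` might be pointed at a local Polaris service with the Spark Client and Delta support enabled. The catalog name `polaris`, the `root:secret` credential, the `quickstart_catalog` warehouse, the jar path, and the `org.apache.polaris.spark.SparkCatalog` class are assumptions for illustration only; treat `plugins/spark/v3.5/getting-started/README.md` as the source of truth for the exact coordinates and settings of the in-development client.

```shell
# Rough sketch, not the documented invocation: the jar path, catalog name,
# credential, warehouse, and Polaris catalog class below are placeholders.
# Consult plugins/spark/v3.5/getting-started/README.md for the supported setup.
bin/spark-shell \
  --jars /path/to/polaris-spark-client.jar \
  --packages io.delta:delta-spark_2.12:3.2.1 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
  --conf spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog \
  --conf spark.sql.catalog.polaris.credential=root:secret \
  --conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
  --conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL
```

The Delta session extension and `DeltaCatalog` registration are what allow the same session to write Delta tables while the Polaris catalog handles metadata; the remaining `spark.sql.catalog.polaris.*` options follow the usual REST-catalog pattern (endpoint, OAuth2 client credentials, warehouse, scope).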