mchades commented on code in PR #9173:
URL: https://github.com/apache/gravitino/pull/9173#discussion_r2596939589
##########
docs/lakehouse-generic-catalog.md:
##########
@@ -0,0 +1,587 @@
+---
+title: "Generic Lakehouse Catalog"
+slug: /lakehouse-generic-catalog
+keywords:
+ - lakehouse
+ - lance
+ - metadata
+ - generic catalog
+ - file system
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Generic Lakehouse Catalog is a Gravitino catalog implementation designed
to seamlessly integrate with lakehouse storage systems built on file
system-based architectures. This catalog enables unified metadata management
for lakehouse tables stored on various storage backends, providing a consistent
interface for data discovery, governance, and access control.
+
+### What is a Lakehouse?
+
+A lakehouse combines the best features of data lakes and data warehouses:
+
+- **Data Lake Benefits**:
+ - Low-cost storage for massive volumes of raw data
+ - Support for diverse data formats (structured, semi-structured,
unstructured)
+ - Decoupled storage and compute for flexible scaling
+
+- **Data Warehouse Benefits**:
+ - ACID transactions for data consistency
+ - Schema enforcement and evolution
+ - High-performance analytical queries
+ - Time travel and versioning
+
+### Supported Storage Systems
+
+The catalog works with lakehouse systems built on top of:
+
+**Storage Backends:**
+- **Object Stores:** Amazon S3, Azure Blob Storage, Google Cloud Storage, MinIO
+- **Distributed File Systems:** HDFS, Apache Ozone
+- **Local File Systems:** For development and testing
+
+**Lakehouse Formats:**
+- **Lance** ✅ (We only support Lance format fully at present)
+
+:::info Current Support Status
+While the architecture is designed to support various lakehouse formats,
Gravitino currently provides **native production support only for Lance-based
lakehouse systems** with comprehensive testing and optimization.
+:::
+
+### Why Use Generic Lakehouse Catalog?
+
+1. **Unified Metadata Management**: Single source of truth for table metadata
across multiple storage backends
+2. **Multi-Format Support**: Extensible architecture to support various
lakehouse table formats
+3. **Storage Flexibility**: Work with any file system - local, HDFS, or cloud
object stores
+4. **Gravitino Integration**: Leverage Gravitino's access control, lineage
tracking, and data discovery
+5. **Easy Migration**: Register existing lakehouse tables without data movement
+
+### System Requirements
+
+**Storage Requirements:**
+- Lakehouse storage system must support standard file system operations:
+ - Directory listing and navigation
+ - File reading and writing with atomic operations
+ - File deletion and renaming
+ - Path-based access control (optional but recommended)
+
+**Gravitino Requirements:**
+- Gravitino server version 1.1.0 or later
+- Configured metalake for catalog creation
+- Appropriate permissions for catalog management
+
+**Network Requirements:**
+- Network connectivity between Gravitino server and storage backend
+- For cloud storage: Internet access and valid credentials
+- For HDFS: Proper Hadoop configuration and network access
+
+## Catalog Management
+
+### Capabilities
+
+The Generic Lakehouse Catalog provides comprehensive relational metadata
management capabilities equivalent to standard relational catalogs:
+
+**Supported Operations:**
+- ✅ Create, read, update, and delete catalogs
+- ✅ List all catalogs in a metalake
+- ✅ Manage catalog properties and metadata
+- ✅ Set and modify catalog locations
+- ✅ Configure storage backend credentials
+
+For detailed information on available operations, see [Manage Relational
Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md).
+
+### Properties
+
+#### Required Properties
+
+| Property | Description | Example | Required |
Review Comment:
Should we add a `Since Version` column?
##########
docs/lakehouse-generic-catalog.md:
##########
Review Comment:
You should also update the related doc:
https://github.com/apache/gravitino/blob/main/docs/manage-relational-metadata-using-gravitino.md
##########
docs/lance-rest-service.md:
##########
@@ -0,0 +1,397 @@
+---
+title: "Lance REST service"
+slug: /lance-rest-service
+keywords:
+ - Lance REST
+ - Lance datasets
+ - REST API
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Overview
+
+The Lance REST service provides a RESTful interface for managing Lance
datasets through HTTP endpoints. Introduced in Gravitino version 1.1.0, this
service enables seamless interaction with Lance datasets for data operations
and metadata management.
+
+The service implements the [Lance REST API
specification](https://editor-next.swagger.io/?url=https://raw.githubusercontent.com/lancedb/lance-namespace/refs/heads/main/docs/src/rest.yaml).
For detailed specification documentation, see the [official Lance REST
documentation](https://lance.org/format/namespace/impls/rest/).
+
+### What is Lance?
+
+[Lance](https://lancedb.github.io/lance/) is a modern columnar data format
designed for AI/ML workloads. It provides:
+
+- **High-performance vector search**: Native support for similarity search on
high-dimensional embeddings
+- **Columnar storage**: Optimized for analytical queries and machine learning
pipelines
+- **Fast random access**: Efficient row-level operations unlike traditional
columnar formats
+- **Version control**: Built-in dataset versioning and time-travel capabilities
+- **Incremental updates**: Append and update data without full rewrites
+
+### Architecture
+
+The Lance REST service acts as a bridge between Lance datasets and
applications:
+
+```
+┌─────────────────┐
+│ Applications │
+│ (Python/Java) │
+└────────┬────────┘
+ │ HTTP/REST
+ ▼
+┌─────────────────┐
+│ Lance REST │◄──── Gravitino Metalake
+│ Service │ (Metadata Backend)
+└────────┬────────┘
+ │ File System Operations
+ ▼
+┌─────────────────┐
+│ Lance Datasets │
+│ (S3/HDFS/Local) │
+└─────────────────┘
+```
+
+**Key Features:**
+- Full compliance with Lance REST API specification
+- Can run standalone or integrated with Gravitino server
+- Support for namespace and table management
+- Index creation and management capabilities
+- Metadata stored in Gravitino for unified governance
+
+## Supported Operations
+
+The Lance REST service provides comprehensive support for namespace
management, table management, and index operations. The table below lists all
supported operations:
+
+| Operation         | Description                                                       | HTTP Method | Endpoint Pattern                    | Since Version |
+|-------------------|-------------------------------------------------------------------|-------------|-------------------------------------|---------------|
+| CreateNamespace   | Create a new Lance namespace                                      | POST        | `/lance/v1/namespace/{id}/create`   | 1.1.0         |
+| ListNamespaces    | List all namespaces under a parent namespace                      | GET         | `/lance/v1/namespace/{parent}/list` | 1.1.0         |
+| DescribeNamespace | Retrieve detailed information about a specific namespace          | POST        | `/lance/v1/namespace/{id}/describe` | 1.1.0         |
+| DropNamespace     | Delete a namespace                                                | POST        | `/lance/v1/namespace/{id}/drop`     | 1.1.0         |
+| NamespaceExists   | Check whether a namespace exists                                  | POST        | `/lance/v1/namespace/{id}/exists`   | 1.1.0         |
+| ListTables        | List all tables in a namespace                                    | GET         | `/lance/v1/table/{namespace}/list`  | 1.1.0         |
+| CreateTable       | Create a new table in a namespace                                 | POST        | `/lance/v1/table/{id}/create`       | 1.1.0         |
+| DropTable         | Delete a table including both metadata and data                   | POST        | `/lance/v1/table/{id}/drop`         | 1.1.0         |
+| TableExists       | Check whether a table exists                                      | POST        | `/lance/v1/table/{id}/exists`       | 1.1.0         |
+| RegisterTable     | Register an existing Lance table to a namespace                   | POST        | `/lance/v1/table/{id}/register`     | 1.1.0         |
+| DeregisterTable   | Unregister a table from a namespace (metadata only, data remains) | POST        | `/lance/v1/table/{id}/deregister`   | 1.1.0         |
+
+### Operation Details
+
+#### Namespace Operations
+
+**CreateNamespace** supports three modes:
+- `create`: Fails if namespace already exists
+- `exist_ok`: Succeeds even if namespace exists
+- `overwrite`: Replaces existing namespace
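The three CreateNamespace modes listed above can be sketched with a plain dictionary standing in for the metadata store. This is only an illustration of the described semantics, not the service's implementation: `create_namespace`, `NamespaceExistsError`, and the dict-based store are hypothetical names, and the assumption that `exist_ok` leaves the existing namespace untouched is mine rather than something the doc states.

```python
class NamespaceExistsError(Exception):
    """Raised when `create` mode is used on a namespace that already exists."""


def create_namespace(store, namespace_id, properties, mode="create"):
    """Apply the documented mode semantics to a dict used as a stand-in store."""
    if mode not in ("create", "exist_ok", "overwrite"):
        raise ValueError(f"unsupported mode: {mode}")

    exists = namespace_id in store
    if exists and mode == "create":
        # `create`: fails if the namespace already exists
        raise NamespaceExistsError(namespace_id)
    if exists and mode == "exist_ok":
        # `exist_ok`: succeeds; assumed here to keep the existing entry as-is
        return store[namespace_id]

    # New namespace, or `overwrite` replacing the existing one
    store[namespace_id] = dict(properties)
    return store[namespace_id]
```

Calling it twice with the default mode on the same id raises `NamespaceExistsError`, while `mode="overwrite"` replaces the stored properties.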
+
+**DropNamespace** behavior:
+- Recursively deletes all child namespaces and tables
+- Deletes both metadata and Lance data files
+- Operation is irreversible
+
+#### Table Operations
+
+**RegisterTable vs CreateTable**:
+- **RegisterTable**: Links existing Lance datasets into Gravitino catalog
without data movement
+- **CreateTable**: Creates new Lance table with schema and writes data files
+
+**DropTable vs DeregisterTable**:
+- **DropTable**: Permanently deletes metadata and data files from storage
+- **DeregisterTable**: Removes metadata from Gravitino but preserves Lance
data files
+
+:::note
+Index deletion is not supported in version 1.1.0.
+:::
+
+## Deployment
+
+### Running with Gravitino Server
+
+To enable the Lance REST service within Gravitino server, configure the
following properties in your Gravitino configuration file:
+
+| Configuration Property                    | Description                                                                  | Default Value           | Required | Since Version |
+|-------------------------------------------|------------------------------------------------------------------------------|-------------------------|----------|---------------|
+| `gravitino.auxService.names`              | Auxiliary services to run. Include `lance-rest` to enable Lance REST service | iceberg-rest,lance-rest | Yes      | 0.2.0         |
+| `gravitino.lance-rest.classpath`          | Classpath for Lance REST service, relative to Gravitino home directory       | lance-rest-server/libs  | Yes      | 1.1.0         |
+| `gravitino.lance-rest.httpPort`           | Port number for Lance REST service                                           | 9101                    | Yes      | 1.1.0         |
+| `gravitino.lance-rest.host`               | Hostname for Lance REST service                                              | 0.0.0.0                 | Yes      | 1.1.0         |
+| `gravitino.lance-rest.namespace-backend`  | Namespace metadata backend (currently only `gravitino` is supported)         | gravitino               | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino-uri`      | Gravitino server URI (required when namespace-backend is `gravitino`)        | http://localhost:8090   | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino-metalake` | Gravitino metalake name (required when namespace-backend is `gravitino`)     | (none)                  | Yes      | 1.1.0         |
+
+**Example Configuration:**
+
+```properties
+gravitino.auxService.names = lance-rest
+gravitino.lance-rest.httpPort = 9101
+gravitino.lance-rest.host = 0.0.0.0
+gravitino.lance-rest.namespace-backend = gravitino
+gravitino.lance-rest.gravitino.uri = http://localhost:8090
+gravitino.lance-rest.gravitino.metalake-name = my_metalake
+```
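A quick way to sanity-check a configuration like the one above is to parse it and confirm that `lance-rest` appears in `gravitino.auxService.names`. The sketch below is a minimal standalone check, not part of Gravitino; `parse_properties` and `lance_rest_enabled` are hypothetical helper names, and the parser assumes simple `key = value` lines with `#` comments.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blanks and `#` comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def lance_rest_enabled(props):
    """Return True if `lance-rest` is listed among the auxiliary services."""
    names = props.get("gravitino.auxService.names", "")
    return "lance-rest" in [name.strip() for name in names.split(",")]


EXAMPLE = """
gravitino.auxService.names = iceberg-rest, lance-rest
gravitino.lance-rest.httpPort = 9101
"""

if __name__ == "__main__":
    print(lance_rest_enabled(parse_properties(EXAMPLE)))
```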
+
+### Running Standalone
+
+To run Lance REST service independently without Gravitino server:
+
+```shell
+{GRAVITINO_HOME}/bin/gravitino-lance-rest-server.sh start
+```
+
+Configure the service by editing `gravitino-lance-rest-server.conf` or passing
command-line arguments:
+
+| Configuration Property                         | Description                 | Default Value         | Required | Since Version |
+|------------------------------------------------|-----------------------------|-----------------------|----------|---------------|
+| `gravitino.lance-rest.namespace-backend`       | Namespace metadata backend  | gravitino             | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino.uri`           | Gravitino server URI        | http://localhost:8090 | Yes      | 1.1.0         |
+| `gravitino.lance-rest.gravitino.metalake-name` | Gravitino metalake name     | (none)                | Yes      | 1.1.0         |
+| `gravitino.lance-rest.httpPort`                | Service port number         | 9101                  | No       | 1.1.0         |
+| `gravitino.lance-rest.host`                    | Service hostname            | 0.0.0.0               | No       | 1.1.0         |
+
+:::tip
+In most cases, you only need to configure
`gravitino.lance-rest.gravitino.metalake-name`. Other properties can use their
default values.
+:::
+
+## Usage Guidelines
+
+When using Lance REST service with Gravitino backend, keep the following
considerations in mind:
+
+### Prerequisites
+- A running Gravitino server with a created metalake
+- A generic-lakehouse catalog created in Gravitino metalake
+
+### Namespace Hierarchy
+Gravitino follows a three-level hierarchy: **catalog → schema → table**. When
creating namespaces or tables:
+
+1. **Parent must exist:** Before creating `lance_catalog/schema`, ensure
`lance_catalog` catalog exists in Gravitino metalake
+2. **Two-level limit:** You can create `lance_catalog/schema`, but **not**
`lance_catalog/schema/sub_schema`
+3. **Table placement:** Tables can only be created under
`lance_catalog/schema`, not at catalog level
+
+**Example Hierarchy:**
+```
+metalake
+└── lance_catalog (catalog - must pre-exist in Gravitino)
+ └── schema (namespace - create via REST)
+ └── table01 (table - create via REST)
+```
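The hierarchy rules above amount to a depth check on the identifier. The sketch below classifies an identifier by its number of levels and rejects anything deeper than catalog, schema, table. It is an illustration only: `classify_identifier` is a hypothetical helper, and splitting on `$` assumes the default delimiter that the doc describes in its Delimiter Convention section.

```python
DELIMITER = "$"  # default Lance REST delimiter, per the doc

# Depth of an identifier mapped to the Gravitino level it would address
LEVELS = {1: "catalog", 2: "schema", 3: "table"}


def classify_identifier(identifier):
    """Return 'catalog', 'schema', or 'table' based on identifier depth."""
    parts = identifier.split(DELIMITER)
    if any(not part for part in parts):
        raise ValueError(f"empty level in identifier: {identifier!r}")
    if len(parts) not in LEVELS:
        # e.g. lance_catalog$schema$sub_schema$table is not allowed
        raise ValueError(f"too many levels ({len(parts)}): {identifier!r}")
    return LEVELS[len(parts)]
```

Under this check, `lance_catalog$schema` is a valid namespace and `lance_catalog$schema$table01` a valid table, while a four-level identifier is rejected before any request is made.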
+
+### Delimiter Convention
+
+The Lance REST API uses `$` as the default delimiter to separate namespace
levels in URIs. When making HTTP requests:
+
+- **URL Encoding Required**: `$` must be URL-encoded as `%24`
+- **Example**: `lance_catalog$schema$table01` becomes
`lance_catalog%24schema%24table01` in URLs
+
+**Common Delimiters:**
+```
+Namespace path: lance_catalog.schema.table01
+URI representation: lance_catalog$schema$table01
+URL encoded: lance_catalog%24schema%24table01
+```
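The encoding step described above can be done with the Python standard library. The sketch below joins the namespace levels with `$`, percent-encodes the result, and places it in the TableExists endpoint pattern from the operations table. The helper names and the `http://localhost:9101` base URL are assumptions for illustration (9101 is the documented default port); no request is actually sent.

```python
from urllib.parse import quote


def encode_id(levels, delimiter="$"):
    """Join namespace levels and percent-encode the delimiter for use in a URL."""
    # safe="" forces `$` (and `/`) to be encoded rather than passed through
    return quote(delimiter.join(levels), safe="")


def table_exists_url(levels, base="http://localhost:9101"):
    """Build the TableExists URL for a table identified by its levels."""
    return f"{base}/lance/v1/table/{encode_id(levels)}/exists"


if __name__ == "__main__":
    print(table_exists_url(["lance_catalog", "schema", "table01"]))
```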
+
+:::caution Important Limitations
+- Currently supports only **two levels of namespaces** before tables
+- Tables **cannot** be nested deeper than schema level
+- Parent catalog must be created in Gravitino before using Lance REST API
+- Metadata operations require Gravitino server to be available
+- Namespace deletion is recursive and irreversible
+:::
+- Currently supports only **two levels of namespaces** before tables
+- Tables **cannot** be nested deeper than schema level
+- Parent catalog must be created in Gravitino before using Lance REST API
+:::
Review Comment:
Duplicated content: the three list items after the closing `:::` repeat the caution block above.