Re: [PR] Add Polaris benchmarks [polaris-tools]

via GitHub Tue, 01 Apr 2025 10:10:11 -0700


pingtimeout commented on code in PR #2:
URL: https://github.com/apache/polaris-tools/pull/2#discussion_r2023142049



##########
benchmarks/README.md:
##########
@@ -0,0 +1,242 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+ 
+   http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Polaris Benchmarks
+
+This repository contains benchmarks for the Polaris service using Gatling.
+
+## Available Benchmarks
+
+### Dataset Creation Benchmark
+
+The CreateTreeDataset benchmark creates a test dataset with a specific 
structure. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetSequential`: 
Creates entities one at a time
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetConcurrent`: 
Creates up to 50 entities simultaneously
+
+These are write-only workloads designed to populate the system for subsequent 
benchmarks.
+
+### Read/Update Benchmark
+
+The ReadUpdateTreeDataset benchmark tests read and update operations on an 
existing dataset. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetSequential`: 
Performs read/update operations one at a time
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetConcurrent`: 
Performs up to 20 read/update operations simultaneously
+
+These benchmarks can only be run after using CreateTreeDataset to populate the 
system.
+
+## Parameters
+
+All parameters are configured through the `application.conf` file located in 
`src/gatling/resources/`. The configuration uses the [Typesafe 
Config](https://github.com/lightbend/config) format.
+
+### Dataset Structure Parameters
+
+These parameters must be consistent across all benchmarks and are configured 
under `dataset.tree`:
+
+```hocon
+dataset.tree {
+  num-catalogs = 1                               # Number of catalogs to create
+  namespace-width = 2                            # Width of the namespace tree
+  namespace-depth = 4                            # Depth of the namespace tree
+  tables-per-namespace = 5                       # Tables per namespace
+  views-per-namespace = 3                        # Views per namespace
+  columns-per-table = 10                         # Columns per table
+  columns-per-view = 10                          # Columns per view
+  default-base-location = "file:///tmp/polaris"  # Base location for datasets
+  namespace-properties = 10                      # Number of properties to add 
to each namespace
+  table-properties = 10                          # Number of properties to add 
to each table
+  view-properties = 10                           # Number of properties to add 
to each view
+  max-tables = -1                                # Maximum tables (-1 for 
unlimited)

Review Comment:
   Ha, good catch.  Those two properties cap the total number of tables that 
would otherwise be created/used.
   
   Example for the dataset creation benchmark:
   * with `width=2`, `depth=20`, `tables-per-namespace=4` and `max-tables=-1`
       * 1048575 namespaces are created
       * there are 524288 leaf namespaces
       * and 4 tables in each leaf namespaces are created/used, so a total of 
2097152 tables
   * with `width=2`, `depth=20`, `tables-per-namespace=4` and 
`max-tables=1000000`
       * 1048575 namespaces are created
       * there are 524288 leaf namespaces
       * and 4 tables in each leaf namespaces are created until 1 million 
tables have been created, then the benchmark stops
   
   I should probably add that to the README...



##########
benchmarks/build.gradle.kts:
##########
@@ -0,0 +1,29 @@
+plugins {
+    scala
+    id("io.gatling.gradle") version "3.13.5.2"
+    id("com.diffplug.spotless") version "7.0.2"
+}
+
+description = "Polaris Iceberg REST API performance tests"
+
+tasks.withType<ScalaCompile> {
+    scalaCompileOptions.forkOptions.apply {
+        jvmArgs = listOf("-Xss100m") // Scala compiler may require a larger 
stack size when compiling Gatling simulations

Review Comment:
   I don't think it is necessary, tbh, given that the initial PR did not have 
that and the benchmarks were compiling.  But this comes from the canonical 
example of gatling-gradle plugin 
https://github.com/gatling/gatling-gradle-plugin-demo-scala/blob/main/build.gradle.
  So I kept it.  



##########
benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/AuthenticationActions.scala:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.benchmarks.actions
+
+import io.gatling.core.Predef._
+import io.gatling.core.feeder.Feeder
+import io.gatling.core.structure.ChainBuilder
+import io.gatling.http.Predef._
+import org.apache.polaris.benchmarks.RetryOnHttpCodes.{
+  retryOnHttpStatus,
+  HttpRequestBuilderWithStatusSave
+}
+import org.apache.polaris.benchmarks.parameters.ConnectionParameters
+import org.slf4j.LoggerFactory
+
+import java.util.concurrent.atomic.AtomicReference
+
+/**
+ * Actions for performance testing authentication operations. This class 
provides methods to
+ * authenticate and manage access tokens for API requests.
+ *
+ * @param cp Connection parameters containing client credentials
+ * @param accessToken Reference to the authentication token shared across 
actions
+ * @param maxRetries Maximum number of retry attempts for failed operations
+ * @param retryableHttpCodes HTTP status codes that should trigger a retry
+ */
+case class AuthenticationActions(

Review Comment:
   I assume you are talking about the `*Actions.scala` classes.  These classes 
regroup the elements that are needed to perform certain actions based on either 
a feature (e.g. authenticating) or an entity (e.g. tables).
   
   They are not listed in the README, indeed, as they are an implementation 
detail.  This is also where the configuration is consumed, not defined.



##########
benchmarks/README.md:
##########
@@ -0,0 +1,242 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+ 
+   http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Polaris Benchmarks
+
+This repository contains benchmarks for the Polaris service using Gatling.
+
+## Available Benchmarks
+
+### Dataset Creation Benchmark
+
+The CreateTreeDataset benchmark creates a test dataset with a specific 
structure. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetSequential`: 
Creates entities one at a time
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetConcurrent`: 
Creates up to 50 entities simultaneously
+
+These are write-only workloads designed to populate the system for subsequent 
benchmarks.
+
+### Read/Update Benchmark
+
+The ReadUpdateTreeDataset benchmark tests read and update operations on an 
existing dataset. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetSequential`: 
Performs read/update operations one at a time
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetConcurrent`: 
Performs up to 20 read/update operations simultaneously
+
+These benchmarks can only be run after using CreateTreeDataset to populate the 
system.
+
+## Parameters
+
+All parameters are configured through the `application.conf` file located in 
`src/gatling/resources/`. The configuration uses the [Typesafe 
Config](https://github.com/lightbend/config) format.
+
+### Dataset Structure Parameters
+
+These parameters must be consistent across all benchmarks and are configured 
under `dataset.tree`:
+
+```hocon
+dataset.tree {
+  num-catalogs = 1                               # Number of catalogs to create
+  namespace-width = 2                            # Width of the namespace tree
+  namespace-depth = 4                            # Depth of the namespace tree
+  tables-per-namespace = 5                       # Tables per namespace
+  views-per-namespace = 3                        # Views per namespace
+  columns-per-table = 10                         # Columns per table
+  columns-per-view = 10                          # Columns per view
+  default-base-location = "file:///tmp/polaris"  # Base location for datasets
+  namespace-properties = 10                      # Number of properties to add 
to each namespace
+  table-properties = 10                          # Number of properties to add 
to each table
+  view-properties = 10                           # Number of properties to add 
to each view
+  max-tables = -1                                # Maximum tables (-1 for 
unlimited)
+  max-views = -1                                 # Maximum views (-1 for 
unlimited)
+}
+```
+
+### Connection Parameters
+
+Connection settings are configured under `http` and `auth`:
+
+```hocon
+http {
+  base-url = "http://localhost:8181";  # Service URL
+}
+
+auth {
+  client-id = null      # Required: OAuth2 client ID
+  client-secret = null  # Required: OAuth2 client secret
+}
+```
+
+### Workload Parameters
+
+Workload settings are configured under `workload`:
+
+```hocon
+workload {
+  read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+}
+```
+
+## Running the Benchmarks
+
+The benchmark uses [typesafe-config](https://github.com/lightbend/config) for 
configuration management. Default settings are in 
`src/gatling/resources/benchmark-defaults.conf`. This file should not be 
modified directly.
+
+To customize the benchmark settings, create your own `application.conf` file 
and specify it using the `-Dconfig.file` parameter. Your settings will override 
the default values.
+
+Example `application.conf`:
+```hocon
+auth {
+  client-id = "your-client-id"
+  client-secret = "your-client-secret"
+}
+
+http {
+  base-url = "http://your-polaris-instance:8181";
+}
+
+workload {
+  read-write-ratio = 0.8
+}
+```
+
+Run benchmarks with your configuration:
+
+```bash
+# Sequential dataset creation
+./gradlew gatlingRun --simulation 
org.apache.polaris.benchmarks.simulations.CreateTreeDatasetSequential \
+  -Dconfig.file=./application.conf
+
+# Concurrent dataset creation
+./gradlew gatlingRun --simulation 
org.apache.polaris.benchmarks.simulations.CreateTreeDatasetConcurrent \
+  -Dconfig.file=./application.conf
+```
+
+A message will show the location of the Gatling report:
+```
+Reports generated in: 
./benchmarks/build/reports/gatling/<simulation-name>/index.html
+```
+
+### Example Polaris server startup
+
+For repeated testing and benchmarking purposes it's convenient to have fixed 
client-ID + client-secret combinations. **The following example is ONLY for 
testing and benchmarking against an airgapped Polaris instance**
+
+```bash
+# Start Polaris with the fixed client-ID/secret admin/admin
+# DO NEVER EVER USE THE FOLLOWING FOR ANY NON-AIRGAPPED POLARIS INSTANCE !!
+./gradlew :polaris-quarkus-server:quarkusBuild &&  java \
+  -Dpolaris.bootstrap.credentials=POLARIS,admin,admin \
+  -Djava.security.manager=allow \
+  -jar quarkus/server/build/quarkus-app/quarkus-run.jar
+```
+
+With the above you can run the benchmarks using a configuration file with 
`client-id = "admin"` and `client-secret = "admin"` - meant only for 
convenience in a fully airgapped system.
+
+# Test Dataset
+
+The benchmarks use synthetic procedural datasets that are generated 
deterministically at runtime. This means that given the same input parameters, 
the exact same dataset structure will always be generated. This approach allows 
generating large volumes of test data without having to store it, while 
ensuring reproducible benchmark results across different runs.
+
+The diagrams below describe the data sets that are used in benchmarks. Note 
that the benchmark dataset may not cover all Polaris features.
+
+## Generation rules
+
+The dataset has a tree shape. At the root of the tree is a Polaris realm that 
must exist before the dataset is created.
+
+An arbitrary number of catalogs can be created under the realm. However, only 
the first catalog (`C_0`) is used for the rest of the dataset.

Review Comment:
   I am not sure what you mean here?



##########
benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/NAryTreeBuilder.scala:
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.benchmarks
+
+case class NAryTreeBuilder(nsWidth: Int, nsDepth: Int) {

Review Comment:
   You are absolutely correct.  I can add some javadoc, yes.  But it will be 
mostly a repeat of the readme without the charts that give visual examples.



##########
benchmarks/src/gatling/scala/org/apache/polaris/benchmarks/actions/CatalogActions.scala:
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.polaris.benchmarks.actions
+
+import io.gatling.core.Predef._
+import io.gatling.core.feeder.Feeder
+import io.gatling.core.structure.ChainBuilder
+import io.gatling.http.Predef._
+import org.apache.polaris.benchmarks.RetryOnHttpCodes.{
+  retryOnHttpStatus,
+  HttpRequestBuilderWithStatusSave
+}
+import org.apache.polaris.benchmarks.parameters.DatasetParameters
+import org.slf4j.LoggerFactory
+
+import java.util.concurrent.atomic.AtomicReference
+
+/**
+ * Actions for performance testing catalog operations in Apache Iceberg. This 
class provides methods
+ * to create and fetch catalogs.
+ *
+ * @param dp Dataset parameters controlling the dataset generation
+ * @param accessToken Reference to the authentication token for API requests
+ * @param maxRetries Maximum number of retry attempts for failed operations
+ * @param retryableHttpCodes HTTP status codes that should trigger a retry
+ */
+case class CatalogActions(
+    dp: DatasetParameters,
+    accessToken: AtomicReference[String],
+    maxRetries: Int = 10,
+    retryableHttpCodes: Set[Int] = Set(409, 500)
+) {
+  private val logger = LoggerFactory.getLogger(getClass)
+
+  /**
+   * Creates a Gatling Feeder that generates catalog names and their default 
storage locations. Each
+   * catalog will be named "C_n" where n is a sequential number, and will have 
a corresponding
+   * storage location under the configured base path.
+   *
+   * @return An iterator providing catalog names and their storage locations
+   */
+  def feeder(): Feeder[String] = Iterator
+    .from(0)
+    .map { i =>
+      val catalogName = s"C_$i"
+      Map(
+        "catalogName" -> catalogName,
+        "defaultBaseLocation" -> s"${dp.defaultBaseLocation}/$catalogName"
+      )
+    }
+    .take(dp.numCatalogs)
+
+  /**
+   * Creates a new Iceberg catalog with FILE storage type. The catalog is 
created as an INTERNAL
+   * type with a name and a default base location that are defined in the 
[[CatalogActions.feeder]].
+   * This represents the fundamental operation of establishing a new catalog 
in an Iceberg
+   * deployment.
+   *
+   * There is no limit to the number of users that can create catalogs 
concurrently.
+   */
+  val createCatalog: ChainBuilder =
+    retryOnHttpStatus(maxRetries, retryableHttpCodes, "Create catalog")(

Review Comment:
   It is a good question.  In my experience, Gatling benchmarks should have 
little, if any, external dependencies, so that they are easy to read.  The risk 
is for Gatling benchmarks to become only readable/editable with a fully fledged 
IDE, instead of a simple text editor or even Github UI.
   
   We _could_ extract those String constants into some sort of a Polaris 
client.  But that would be it.  The connection logic and all that would make a 
complete Polaris **client** would not be there, as they are Gatling specific.  
So I am not sure it would be net positive.



##########
benchmarks/README.md:
##########
@@ -0,0 +1,242 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+ 
+   http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Polaris Benchmarks
+
+This repository contains benchmarks for the Polaris service using Gatling.
+
+## Available Benchmarks
+
+### Dataset Creation Benchmark
+
+The CreateTreeDataset benchmark creates a test dataset with a specific 
structure. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetSequential`: 
Creates entities one at a time
+- `org.apache.polaris.benchmarks.simulations.CreateTreeDatasetConcurrent`: 
Creates up to 50 entities simultaneously
+
+These are write-only workloads designed to populate the system for subsequent 
benchmarks.
+
+### Read/Update Benchmark
+
+The ReadUpdateTreeDataset benchmark tests read and update operations on an 
existing dataset. It exists in two variants:
+
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetSequential`: 
Performs read/update operations one at a time
+- `org.apache.polaris.benchmarks.simulations.ReadUpdateTreeDatasetConcurrent`: 
Performs up to 20 read/update operations simultaneously
+
+These benchmarks can only be run after using CreateTreeDataset to populate the 
system.
+
+## Parameters
+
+All parameters are configured through the `application.conf` file located in 
`src/gatling/resources/`. The configuration uses the [Typesafe 
Config](https://github.com/lightbend/config) format.
+
+### Dataset Structure Parameters
+
+These parameters must be consistent across all benchmarks and are configured 
under `dataset.tree`:
+
+```hocon
+dataset.tree {
+  num-catalogs = 1                               # Number of catalogs to create
+  namespace-width = 2                            # Width of the namespace tree
+  namespace-depth = 4                            # Depth of the namespace tree
+  tables-per-namespace = 5                       # Tables per namespace
+  views-per-namespace = 3                        # Views per namespace
+  columns-per-table = 10                         # Columns per table
+  columns-per-view = 10                          # Columns per view
+  default-base-location = "file:///tmp/polaris"  # Base location for datasets
+  namespace-properties = 10                      # Number of properties to add 
to each namespace
+  table-properties = 10                          # Number of properties to add 
to each table
+  view-properties = 10                           # Number of properties to add 
to each view
+  max-tables = -1                                # Maximum tables (-1 for 
unlimited)
+  max-views = -1                                 # Maximum views (-1 for 
unlimited)
+}
+```
+
+### Connection Parameters
+
+Connection settings are configured under `http` and `auth`:
+
+```hocon
+http {
+  base-url = "http://localhost:8181";  # Service URL
+}
+
+auth {
+  client-id = null      # Required: OAuth2 client ID
+  client-secret = null  # Required: OAuth2 client secret
+}
+```
+
+### Workload Parameters
+
+Workload settings are configured under `workload`:
+
+```hocon
+workload {
+  read-write-ratio = 0.8  # Ratio of reads (0.0-1.0)
+}
+```
+
+## Running the Benchmarks
+
+The benchmark uses [typesafe-config](https://github.com/lightbend/config) for 
configuration management. Default settings are in 
`src/gatling/resources/benchmark-defaults.conf`. This file should not be 
modified directly.
+
+To customize the benchmark settings, create your own `application.conf` file 
and specify it using the `-Dconfig.file` parameter. Your settings will override 
the default values.
+
+Example `application.conf`:
+```hocon
+auth {
+  client-id = "your-client-id"
+  client-secret = "your-client-secret"
+}
+
+http {
+  base-url = "http://your-polaris-instance:8181";
+}
+
+workload {
+  read-write-ratio = 0.8
+}
+```
+
+Run benchmarks with your configuration:
+
+```bash
+# Sequential dataset creation
+./gradlew gatlingRun --simulation 
org.apache.polaris.benchmarks.simulations.CreateTreeDatasetSequential \
+  -Dconfig.file=./application.conf
+
+# Concurrent dataset creation
+./gradlew gatlingRun --simulation 
org.apache.polaris.benchmarks.simulations.CreateTreeDatasetConcurrent \
+  -Dconfig.file=./application.conf
+```
+
+A message will show the location of the Gatling report:
+```
+Reports generated in: 
./benchmarks/build/reports/gatling/<simulation-name>/index.html
+```
+
+### Example Polaris server startup
+
+For repeated testing and benchmarking purposes it's convenient to have fixed 
client-ID + client-secret combinations. **The following example is ONLY for 
testing and benchmarking against an airgapped Polaris instance**
+
+```bash
+# Start Polaris with the fixed client-ID/secret admin/admin
+# DO NEVER EVER USE THE FOLLOWING FOR ANY NON-AIRGAPPED POLARIS INSTANCE !!
+./gradlew :polaris-quarkus-server:quarkusBuild &&  java \
+  -Dpolaris.bootstrap.credentials=POLARIS,admin,admin \
+  -Djava.security.manager=allow \
+  -jar quarkus/server/build/quarkus-app/quarkus-run.jar
+```
+
+With the above you can run the benchmarks using a configuration file with 
`client-id = "admin"` and `client-secret = "admin"` - meant only for 
convenience in a fully airgapped system.
+
+# Test Dataset
+
+The benchmarks use synthetic procedural datasets that are generated 
deterministically at runtime. This means that given the same input parameters, 
the exact same dataset structure will always be generated. This approach allows 
generating large volumes of test data without having to store it, while 
ensuring reproducible benchmark results across different runs.
+
+The diagrams below describe the data sets that are used in benchmarks. Note 
that the benchmark dataset may not cover all Polaris features.
+
+## Generation rules
+
+The dataset has a tree shape. At the root of the tree is a Polaris realm that 
must exist before the dataset is created.
+
+An arbitrary number of catalogs can be created under the realm. However, only 
the first catalog (`C_0`) is used for the rest of the dataset.
+
+The namespaces part of the dataset is a complete `N`-ary tree. That is, it 
starts with a root namespace (`NS_0`) and then, each namespace contains exactly 
`0` or `N` children namespaces. The width as well as the depth of the 
namespaces tree are configurable. The total number of namespaces can easily be 
calculated with the following formulae, where `N` is the tree arity and `D` is 
the total tree depth, including the root:

Review Comment:
   Good point, changing that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add Polaris benchmarks [polaris-tools]

Reply via email to