This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 00bb4ad46e37 [SPARK-46188][DOC][3.5] Fix the CSS of Spark doc's generated tables

00bb4ad46e37 is described below

commit 00bb4ad46e373311a6303952f3944680b08e03d7
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Thu Nov 30 14:56:48 2023 -0800

    [SPARK-46188][DOC][3.5] Fix the CSS of Spark doc's generated tables

### What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/40269, the generated tables in the Spark docs have no borders (for example, https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html). This PR fixes the regression by restoring part of the table style removed in https://github.com/apache/spark/pull/40269/files#diff-309b964023ca899c9505205f36d3f4d5b36a6487e5c9b2e242204ee06bbc9ce9L26.

This PR also unifies the styles of all tables by removing `class="table table-striped"` from the HTML-style tables in the markdown docs.

### Why are the changes needed?

Fix a regression in the table CSS of the Spark docs.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manually built the docs and verified the result.

Before changes:
<img width="931" alt="image" src="https://github.com/apache/spark/assets/1097932/1eb7abff-65af-4c4c-bbd5-9077f38c1b43">

After changes:
<img width="911" alt="image" src="https://github.com/apache/spark/assets/1097932/be77d4c6-1279-43ec-a234-b69ee02e3dc6">

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: ChatGPT 4

Closes #44097 from gengliangwang/fixTable3.5.
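The repetitive part of the patch below — dropping `class="table table-striped"` from every HTML-style `<table>` in the markdown docs — could be sketched with a small script like the following. This is a hypothetical helper for illustration only, not part of the PR; the `strip_table_classes` and `rewrite_docs` names are invented here.

```python
import re
from pathlib import Path

# Drop the Bootstrap classes so every HTML table in the docs falls back to
# the shared rules in docs/css/custom.css.
PATTERN = re.compile(r'<table class="table table-striped">')


def strip_table_classes(text: str) -> str:
    """Replace styled <table> opening tags with a bare <table>."""
    return PATTERN.sub("<table>", text)


def rewrite_docs(docs_dir: str) -> int:
    """Rewrite every markdown file under docs_dir; return the number changed."""
    changed = 0
    for path in Path(docs_dir).glob("*.md"):
        original = path.read_text(encoding="utf-8")
        updated = strip_table_classes(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    sample = '<table class="table table-striped">\n<thead><tr><th>Term</th></tr></thead>'
    print(strip_table_classes(sample))
```

Since the opening tag is identical across all 39 markdown files touched here, a literal pattern is sufficient; no HTML parsing is needed.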
Authored-by: Gengliang Wang <gengli...@apache.org>
Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 docs/building-spark.md                           |  2 +-
 docs/cluster-overview.md                         |  2 +-
 docs/configuration.md                            | 40 ++++++++++++------------
 docs/css/custom.css                              | 13 ++++++++
 docs/ml-classification-regression.md             | 14 ++++-----
 docs/ml-clustering.md                            |  8 ++---
 docs/mllib-classification-regression.md          |  2 +-
 docs/mllib-decision-tree.md                      |  2 +-
 docs/mllib-ensembles.md                          |  2 +-
 docs/mllib-evaluation-metrics.md                 | 10 +++---
 docs/mllib-linear-methods.md                     |  4 +--
 docs/mllib-pmml-model-export.md                  |  2 +-
 docs/monitoring.md                               | 10 +++---
 docs/rdd-programming-guide.md                    |  8 ++---
 docs/running-on-kubernetes.md                    |  8 ++---
 docs/running-on-mesos.md                         |  2 +-
 docs/running-on-yarn.md                          |  8 ++---
 docs/security.md                                 | 26 +++++++--------
 docs/spark-standalone.md                         | 12 +++----
 docs/sparkr.md                                   |  6 ++--
 docs/sql-data-sources-avro.md                    | 12 +++----
 docs/sql-data-sources-csv.md                     |  2 +-
 docs/sql-data-sources-hive-tables.md             |  4 +--
 docs/sql-data-sources-jdbc.md                    |  2 +-
 docs/sql-data-sources-json.md                    |  2 +-
 docs/sql-data-sources-load-save-functions.md     |  2 +-
 docs/sql-data-sources-orc.md                     |  4 +--
 docs/sql-data-sources-parquet.md                 |  4 +--
 docs/sql-data-sources-text.md                    |  2 +-
 docs/sql-distributed-sql-engine-spark-sql-cli.md |  4 +--
 docs/sql-error-conditions-sqlstates.md           | 26 +++++++--------
 docs/sql-migration-guide.md                      |  4 +--
 docs/sql-performance-tuning.md                   | 16 +++++-----
 docs/storage-openstack-swift.md                  |  2 +-
 docs/streaming-custom-receivers.md               |  2 +-
 docs/streaming-programming-guide.md              | 10 +++---
 docs/structured-streaming-kafka-integration.md   | 20 ++++++------
 docs/structured-streaming-programming-guide.md   | 12 +++----
 docs/submitting-applications.md                  |  2 +-
 docs/web-ui.md                                   |  2 +-
 40 files changed, 164 insertions(+), 151 deletions(-)

diff --git a/docs/building-spark.md b/docs/building-spark.md
index 4b8e70655d59..33d253a49dbf 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -286,7 +286,7 @@ If use an individual repository or a repository on GitHub Enterprise, export bel
 
 ### Related environment variables
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Variable Name</th><th>Default</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>SPARK_PROJECT_URL</code></td>
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 7da06a852089..34913bd97a41 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -91,7 +91,7 @@ The [job scheduling overview](job-scheduling.html) describes this in more detail
 
 The following table summarizes terms you'll see used to refer to cluster concepts:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th style="width: 130px;">Term</th><th>Meaning</th></tr>
 </thead>
diff --git a/docs/configuration.md b/docs/configuration.md
index 4604360dda28..248f9333c9a3 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -135,7 +135,7 @@ of the most common options to set are:
 
 ### Application Properties
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.app.name</code></td>
@@ -520,7 +520,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Runtime Environment
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.driver.extraClassPath</code></td>
@@ -907,7 +907,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Shuffle Behavior
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.reducer.maxSizeInFlight</code></td>
@@ -1282,7 +1282,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Spark UI
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.eventLog.logBlockUpdates.enabled</code></td>
@@ -1674,7 +1674,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Compression and Serialization
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.broadcast.compress</code></td>
@@ -1872,7 +1872,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Memory Management
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.memory.fraction</code></td>
@@ -1997,7 +1997,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Execution Behavior
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.broadcast.blockSize</code></td>
@@ -2247,7 +2247,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Executor Metrics
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.eventLog.logStageExecutorMetrics</code></td>
@@ -2315,7 +2315,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Networking
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.rpc.message.maxSize</code></td>
@@ -2478,7 +2478,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Scheduling
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.cores.max</code></td>
@@ -2962,7 +2962,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Barrier Execution Mode
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.barrier.sync.timeout</code></td>
@@ -3009,7 +3009,7 @@ Apart from these, the following properties are also available, and may be useful
 
 ### Dynamic Allocation
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.dynamicAllocation.enabled</code></td>
@@ -3151,7 +3151,7 @@ finer granularity starting from driver and executor. Take RPC module as example
 like shuffle, just replace "rpc" with "shuffle" in the property names except
 <code>spark.{driver|executor}.rpc.netty.dispatcher.numThreads</code>, which is only for RPC module.
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.{driver|executor}.rpc.io.serverThreads</code></td>
@@ -3294,7 +3294,7 @@ External users can query the static sql config values via `SparkSession.conf` or
 
 ### Spark Streaming
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.streaming.backpressure.enabled</code></td>
@@ -3426,7 +3426,7 @@ External users can query the static sql config values via `SparkSession.conf` or
 
 ### SparkR
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.r.numRBackendThreads</code></td>
@@ -3482,7 +3482,7 @@ External users can query the static sql config values via `SparkSession.conf` or
 
 ### GraphX
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.graphx.pregel.checkpointInterval</code></td>
@@ -3497,7 +3497,7 @@ External users can query the static sql config values via `SparkSession.conf` or
 
 ### Deploy
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.deploy.recoveryMode</code></td>
@@ -3547,7 +3547,7 @@ copy `conf/spark-env.sh.template` to create it.
 Make sure you make the copy exec
 
 The following variables can be set in `spark-env.sh`:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>JAVA_HOME</code></td>
@@ -3684,7 +3684,7 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
 
 ### External Shuffle service(server) side configuration options
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.shuffle.push.server.mergedShuffleFileManagerImpl</code></td>
@@ -3718,7 +3718,7 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
 
 ### Client side configuration options
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.shuffle.push.enabled</code></td>
diff --git a/docs/css/custom.css b/docs/css/custom.css
index 4576f45d1ab7..e7416d9ded61 100644
--- a/docs/css/custom.css
+++ b/docs/css/custom.css
@@ -1110,5 +1110,18 @@ img {
 table {
   width: 100%;
   overflow-wrap: normal;
+  border-collapse: collapse; /* Ensures that the borders collapse into a single border */
 }
+
+table th, table td {
+  border: 1px solid #cccccc; /* Adds a border to each table header and data cell */
+  padding: 6px 13px; /* Optional: Adds padding inside each cell for better readability */
+}
+
+table tr {
+  background-color: white; /* Sets a default background color for all rows */
+}
+
+table tr:nth-child(2n) {
+  background-color: #F1F4F5; /* Sets a different background color for even rows */
+}
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index d184f4fe0257..604b3245272f 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -703,7 +703,7 @@ others.
 
 ### Available families
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>Family</th>
@@ -1224,7 +1224,7 @@ All output columns are optional; to exclude an output column, set its correspond
 
 ### Input Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -1251,7 +1251,7 @@ All output columns are optional; to exclude an output column, set its correspond
 
 ### Output Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -1326,7 +1326,7 @@ All output columns are optional; to exclude an output column, set its correspond
 
 #### Input Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -1353,7 +1353,7 @@ All output columns are optional; to exclude an output column, set its correspond
 
 #### Output Columns (Predictions)
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -1407,7 +1407,7 @@ All output columns are optional; to exclude an output column, set its correspond
 
 #### Input Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -1436,7 +1436,7 @@ Note that `GBTClassifier` currently only supports binary labels.
 
 #### Output Columns (Predictions)
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index 00a156b6645c..fdb8173ce3bb 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -40,7 +40,7 @@ called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf).
 
 ### Input Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -61,7 +61,7 @@ called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf).
 
 ### Output Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -204,7 +204,7 @@ model.
 
 ### Input Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
@@ -225,7 +225,7 @@ model.
 
 ### Output Columns
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th align="left">Param name</th>
diff --git a/docs/mllib-classification-regression.md b/docs/mllib-classification-regression.md
index 10cb85e39202..b3305314abc5 100644
--- a/docs/mllib-classification-regression.md
+++ b/docs/mllib-classification-regression.md
@@ -26,7 +26,7 @@ classification](http://en.wikipedia.org/wiki/Multiclass_classification), and
 [regression analysis](http://en.wikipedia.org/wiki/Regression_analysis). The table below outlines
 the supported algorithms for each type of problem.
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Problem Type</th><th>Supported Methods</th></tr>
 </thead>
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 174255c48b69..0d9886315e28 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -51,7 +51,7 @@ The *node impurity* is a measure of the homogeneity of the labels at the node. T
 implementation provides two impurity measures for classification (Gini impurity and entropy) and one
 impurity measure for regression (variance).
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Impurity</th><th>Task</th><th>Formula</th><th>Description</th></tr>
 </thead>
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index b1006f2730db..fdad7ae68dd4 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -191,7 +191,7 @@ Note that each loss is applicable to one of classification or regression, not bo
 Notation: $N$ = number of instances. $y_i$ = label of instance $i$. $x_i$ = features of instance $i$.
 $F(x_i)$ = model's predicted label for instance $i$.
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Loss</th><th>Task</th><th>Formula</th><th>Description</th></tr>
 </thead>
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index f82f6a01136b..30acc3dc634b 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -76,7 +76,7 @@ plots (recall, false positive rate) points.
 
 **Available metrics**
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Metric</th><th>Definition</th></tr>
 </thead>
@@ -179,7 +179,7 @@ For this section, a modified delta function $\hat{\delta}(x)$ will prove useful
 
 $$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.\end{cases}$$
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Metric</th><th>Definition</th></tr>
 </thead>
@@ -296,7 +296,7 @@ The following definition of indicator function $I_A(x)$ on a set $A$ will be nec
 
 $$I_A(x) = \begin{cases}1 & \text{if $x \in A$}, \\ 0 & \text{otherwise}.\end{cases}$$
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Metric</th><th>Definition</th></tr>
 </thead>
@@ -447,7 +447,7 @@ documents, returns a relevance score for the recommended document.
 
 $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{cases}$$
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Metric</th><th>Definition</th><th>Notes</th></tr>
 </thead>
@@ -553,7 +553,7 @@ variable from a number of independent variables.
 
 **Available metrics**
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Metric</th><th>Definition</th></tr>
 </thead>
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index b535d2de307a..448d881f794a 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -72,7 +72,7 @@ training error) and minimizing model complexity (i.e., to avoid overfitting).
 The following table summarizes the loss functions and their gradients or sub-gradients for the
 methods `spark.mllib` supports:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th></th><th>loss function $L(\wv; \x, y)$</th><th>gradient or sub-gradient</th></tr>
 </thead>
@@ -105,7 +105,7 @@ The purpose of the
 encourage simple models and avoid overfitting. We support the following
 regularizers in `spark.mllib`:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th></th><th>regularizer $R(\wv)$</th><th>gradient or sub-gradient</th></tr>
 </thead>
diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md
index e20d7c2fe4e1..02b5fda7a36d 100644
--- a/docs/mllib-pmml-model-export.md
+++ b/docs/mllib-pmml-model-export.md
@@ -28,7 +28,7 @@ license: |
 The table below outlines the `spark.mllib` models that can be exported to PMML and their equivalent PMML model.
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>spark.mllib model</th><th>PMML model</th></tr>
 </thead>
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 91b158bf85d2..e90ef46bdffe 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -69,7 +69,7 @@ The history server can be configured as follows:
 
 ### Environment Variables
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>SPARK_DAEMON_MEMORY</code></td>
@@ -145,7 +145,7 @@ Use it with caution.
 
 Security options for the Spark History Server are covered more detail in the
 [Security](security.html#web-ui) page.
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>Property Name</th>
@@ -470,7 +470,7 @@ only for applications in cluster mode, not applications in client mode. Applicat
 can be identified by their `[attempt-id]`. In the API listed below, when running in YARN cluster mode,
 `[app-id]` will actually be `[base-app-id]/[attempt-id]`, where `[base-app-id]` is the YARN application ID.
-<table class="table table-striped">
+<table>
 <thead><tr><th>Endpoint</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>/applications</code></td>
@@ -669,7 +669,7 @@ The REST API exposes the values of the Task Metrics collected by Spark executors
 of task execution. The metrics can be used for performance troubleshooting and workload characterization.
 A list of the available metrics, with a short description:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>Spark Executor Task Metric name</th>
@@ -827,7 +827,7 @@ In addition, aggregated per-stage peak values of the executor memory metrics are
 Executor memory metrics are also exposed via the Spark metrics system based on the [Dropwizard metrics library](https://metrics.dropwizard.io/4.2.0).
 A list of the available metrics, with a short description:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr><th>Executor Level Metric name</th>
 <th>Short description</th>
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index aee22ad484e6..cc897aea06c9 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -378,7 +378,7 @@ resulting Java objects using [pickle](https://github.com/irmen/pickle/). When sa
 PySpark does the reverse. It unpickles Python objects into Java objects and then converts them to
 Writables. The following Writables are automatically converted:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Writable Type</th><th>Python Type</th></tr></thead>
 <tr><td>Text</td><td>str</td></tr>
 <tr><td>IntWritable</td><td>int</td></tr>
@@ -954,7 +954,7 @@ and pair RDD functions doc
 [Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
 for details.
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:25%">Transformation</th><th>Meaning</th></tr></thead>
 <tr>
 <td> <b>map</b>(<i>func</i>) </td>
@@ -1069,7 +1069,7 @@ and pair RDD functions doc
 [Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
 for details.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Action</th><th>Meaning</th></tr></thead>
 <tr>
 <td> <b>reduce</b>(<i>func</i>) </td>
@@ -1214,7 +1214,7 @@ to `persist()`. The `cache()` method is a shorthand for using the default storag
 which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The full set of
 storage levels is:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:23%">Storage Level</th><th>Meaning</th></tr></thead>
 <tr>
 <td> MEMORY_ONLY </td>
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 38a745f1afca..a684e7caa1a0 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -579,7 +579,7 @@ See the [configuration page](configuration.html) for information on Spark config
 
 #### Spark Properties
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.kubernetes.context</code></td>
@@ -1645,7 +1645,7 @@ See the below table for the full list of pod specifications that will be overwri
 
 ### Pod Metadata
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Pod metadata key</th><th>Modified value</th><th>Description</th></tr></thead>
 <tr>
 <td>name</td>
@@ -1681,7 +1681,7 @@ See the below table for the full list of pod specifications that will be overwri
 
 ### Pod Spec
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Pod spec key</th><th>Modified value</th><th>Description</th></tr></thead>
 <tr>
 <td>imagePullSecrets</td>
@@ -1734,7 +1734,7 @@ See the below table for the full list of pod specifications that will be overwri
 The following affect the driver and executor containers. All other containers in the pod spec will be unaffected.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Container spec key</th><th>Modified value</th><th>Description</th></tr></thead>
 <tr>
 <td>env</td>
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index b1a54a089a54..3d1c57030982 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -374,7 +374,7 @@ See the [configuration page](configuration.html) for information on Spark config
 
 #### Spark Properties
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.mesos.coarse</code></td>
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 97cc9ac135af..d577b70a6803 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -143,7 +143,7 @@ To use a custom metrics.properties for the application master and executors, upd
 
 #### Spark Properties
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.yarn.am.memory</code></td>
@@ -696,7 +696,7 @@ To use a custom metrics.properties for the application master and executors, upd
 
 #### Available patterns for SHS custom executor log URL
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Pattern</th><th>Meaning</th></tr></thead>
 <tr>
 <td>{{HTTP_SCHEME}}</td>
@@ -783,7 +783,7 @@ staging directory of the Spark application.
 
 ## YARN-specific Kerberos Configuration
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.kerberos.keytab</code></td>
@@ -882,7 +882,7 @@ to avoid garbage collection issues during shuffle.
 
 The following extra configuration options are available when the shuffle service is running on YARN:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>spark.yarn.shuffle.stopOnFailure</code></td>
diff --git a/docs/security.md b/docs/security.md
index 3c6fd507fec6..c5d132f680a4 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -60,7 +60,7 @@ distributing the shared secret. Each application will use a unique shared secret
 the case of YARN, this feature relies on YARN RPC encryption being enabled for the distribution of
 secrets to be secure.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.yarn.shuffle.server.recovery.disabled</code></td>
@@ -82,7 +82,7 @@ that any user that can list pods in the namespace where the Spark application is
 also see their authentication secret. Access control rules should be properly set up by the
 Kubernetes admin to ensure that Spark authentication is secure.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.authenticate</code></td>
@@ -103,7 +103,7 @@ Kubernetes admin to ensure that Spark authentication is secure.
 
 Alternatively, one can mount authentication secrets using files and Kubernetes secrets that
 the user mounts into their pods.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.authenticate.secret.file</code></td>
@@ -159,7 +159,7 @@ is still required when talking to shuffle services from Spark versions older tha
 
 The following table describes the different options available for configuring this feature.
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.network.crypto.enabled</code></td>
@@ -219,7 +219,7 @@ encrypting output data generated by applications with APIs such as `saveAsHadoop
 
 The following settings cover enabling encryption for data written to disk:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.io.encryption.enabled</code></td>
@@ -287,7 +287,7 @@ below.
 
 The following options control the authentication of Web UIs:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.ui.allowFramingFrom</code></td>
@@ -391,7 +391,7 @@ servlet filters.
 
 To enable authorization in the SHS, a few extra options are used:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.history.ui.acls.enable</code></td>
@@ -440,7 +440,7 @@ protocol-specific settings. This way the user can easily provide the common sett
 protocols without disabling the ability to configure each one individually. The following table
 describes the SSL configuration namespaces:
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>Config Namespace</th>
@@ -471,7 +471,7 @@ describes the SSL configuration namespaces:
 
 The full breakdown of available SSL options can be found below. The `${ns}` placeholder should be
 replaced with one of the above namespaces.
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>${ns}.enabled</code></td>
@@ -641,7 +641,7 @@ Apache Spark can be configured to include HTTP headers to aid in preventing Cros
 (XSS), Cross-Frame Scripting (XFS), MIME-Sniffing, and also to enforce HTTP Strict Transport
 Security.
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.ui.xXssProtection</code></td>
@@ -697,7 +697,7 @@ configure those ports.
 
 ## Standalone mode only
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
@@ -748,7 +748,7 @@ configure those ports.
 
 ## All cluster managers
 
-<table class="table table-striped">
+<table>
 <thead>
 <tr>
 <th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
@@ -824,7 +824,7 @@ deployment-specific page for more information.
 
 The following options provides finer-grained control for this feature:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.security.credentials.${service}.enabled</code></td>
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index e7ea2669a113..5babac9e2529 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -53,7 +53,7 @@ You should see the new node listed there, along with its number of CPUs and memo
 
 Finally, the following configuration options can be passed to the master and worker:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">Argument</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>-h HOST</code>, <code>--host HOST</code></td>
@@ -116,7 +116,7 @@ Note that these scripts must be executed on the machine you want to run the Spar
 
 You can optionally configure the cluster further by setting environment variables in `conf/spark-env.sh`.
 Create this file by starting with the `conf/spark-env.sh.template`, and _copy it to all your worker machines_ for the settings to take effect.
 The following settings are available:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr></thead>
 <tr>
 <td><code>SPARK_MASTER_HOST</code></td>
@@ -188,7 +188,7 @@ You can optionally configure the cluster further by setting environment variable
 
 SPARK_MASTER_OPTS supports the following system properties:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.master.ui.port</code></td>
@@ -324,7 +324,7 @@ SPARK_MASTER_OPTS supports the following system properties:
 
 SPARK_WORKER_OPTS supports the following system properties:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.worker.cleanup.enabled</code></td>
@@ -429,7 +429,7 @@ You can also pass an option `--total-executor-cores <numCores>` to control the n
 
 Spark applications supports the following configuration properties specific to standalone mode:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">Property Name</th><th>Default Value</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.standalone.submit.waitAppCompletion</code></td>
@@ -646,7 +646,7 @@ ZooKeeper is the best way to go for production-level high availability, but if y
 
 In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:
 
-<table class="table table-striped">
+<table>
 <thead><tr><th style="width:21%">System property</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
 <td><code>spark.deploy.recoveryMode</code></td>
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 8e6a98e40b68..a34a1200c4c0 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -77,7 +77,7 @@ sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g
The following Spark driver properties can be set in `sparkConfig` with `sparkR.session` from RStudio: -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Property group</th><th><code>spark-submit</code> equivalent</th></tr></thead> <tr> <td><code>spark.master</code></td> @@ -588,7 +588,7 @@ The following example shows how to save/load a MLlib model by SparkR. {% include_example read_write r/ml/ml.R %} # Data type mapping between R and Spark -<table class="table table-striped"> +<table> <thead><tr><th>R</th><th>Spark</th></tr></thead> <tr> <td>byte</td> @@ -728,7 +728,7 @@ function is masking another function. The following functions are masked by the SparkR package: -<table class="table table-striped"> +<table> <thead><tr><th>Masked function</th><th>How to Access</th></tr></thead> <tr> <td><code>cov</code> in <code>package:stats</code></td> diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index b01174b91824..c846116ebf3e 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -233,7 +233,7 @@ Data source options of Avro can be set via: * the `.option` method on `DataFrameReader` or `DataFrameWriter`. * the `options` parameter in function `from_avro`. -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th><th><b>Since Version</b></th></tr></thead> <tr> <td><code>avroSchema</code></td> @@ -331,7 +331,7 @@ Data source options of Avro can be set via: ## Configuration Configuration of Avro can be done using the `setConf` method on SparkSession or by running `SET key=value` commands using SQL. -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr></thead> <tr> <td>spark.sql.legacy.replaceDatabricksSparkAvro.enabled</td> @@ -418,7 +418,7 @@ Submission Guide for more details. 
## Supported types for Avro -> Spark SQL conversion Currently Spark supports reading all [primitive types](https://avro.apache.org/docs/1.11.2/specification/#primitive-types) and [complex types](https://avro.apache.org/docs/1.11.2/specification/#complex-types) under records of Avro. -<table class="table table-striped"> +<table> <thead><tr><th><b>Avro type</b></th><th><b>Spark SQL type</b></th></tr></thead> <tr> <td>boolean</td> @@ -483,7 +483,7 @@ All other union types are considered complex. They will be mapped to StructType It also supports reading the following Avro [logical types](https://avro.apache.org/docs/1.11.2/specification/#logical-types): -<table class="table table-striped"> +<table> <thead><tr><th><b>Avro logical type</b></th><th><b>Avro type</b></th><th><b>Spark SQL type</b></th></tr></thead> <tr> <td>date</td> @@ -516,7 +516,7 @@ At the moment, it ignores docs, aliases and other properties present in the Avro ## Supported types for Spark SQL -> Avro conversion Spark supports writing of all Spark SQL types into Avro. For most types, the mapping from Spark types to Avro types is straightforward (e.g. IntegerType gets converted to int); however, there are a few special cases which are listed below: -<table class="table table-striped"> +<table> <thead><tr><th><b>Spark SQL type</b></th><th><b>Avro type</b></th><th><b>Avro logical type</b></th></tr></thead> <tr> <td>ByteType</td> @@ -552,7 +552,7 @@ Spark supports writing of all Spark SQL types into Avro. For most types, the map You can also specify the whole output Avro schema with the option `avroSchema`, so that Spark SQL types can be converted into other Avro types. 
The following conversions are not applied by default and require user specified Avro schema: -<table class="table table-striped"> +<table> <thead><tr><th><b>Spark SQL type</b></th><th><b>Avro type</b></th><th><b>Avro logical type</b></th></tr></thead> <tr> <td>BinaryType</td> diff --git a/docs/sql-data-sources-csv.md b/docs/sql-data-sources-csv.md index 31167f551430..241aae357122 100644 --- a/docs/sql-data-sources-csv.md +++ b/docs/sql-data-sources-csv.md @@ -52,7 +52,7 @@ Data source options of CSV can be set via: * `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <td><code>sep</code></td> diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md index 0de573ec64b8..13cd8fc2cc05 100644 --- a/docs/sql-data-sources-hive-tables.md +++ b/docs/sql-data-sources-hive-tables.md @@ -75,7 +75,7 @@ format("serde", "input format", "output format"), e.g. `CREATE TABLE src(id int) By default, we will read the table files as plain text. Note that, Hive storage handler is not supported yet when creating table, you can create a table using storage handler at Hive side, and use Spark SQL to read it. 
-<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Meaning</th></tr></thead> <tr> <td><code>fileFormat</code></td> @@ -123,7 +123,7 @@ will compile against built-in Hive and use those classes for internal execution The following options can be used to configure the version of Hive that is used to retrieve metadata: -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.hive.metastore.version</code></td> diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index f96776514c67..edcdef4bf008 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -51,7 +51,7 @@ For connection properties, users can specify the JDBC connection properties in t <code>user</code> and <code>password</code> are normally provided as connection properties for logging into the data sources. -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <td><code>url</code></td> diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md index 881a69cb1cea..4ade5170a6d8 100644 --- a/docs/sql-data-sources-json.md +++ b/docs/sql-data-sources-json.md @@ -109,7 +109,7 @@ Data source options of JSON can be set via: * `schema_of_json` * `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <!-- TODO(SPARK-35433): Add timeZone to Data Source Option for CSV, too. 
--> diff --git a/docs/sql-data-sources-load-save-functions.md b/docs/sql-data-sources-load-save-functions.md index 9d0a3f9c72b9..31f6d944bc97 100644 --- a/docs/sql-data-sources-load-save-functions.md +++ b/docs/sql-data-sources-load-save-functions.md @@ -218,7 +218,7 @@ present. It is important to realize that these save modes do not utilize any loc atomic. Additionally, when performing an `Overwrite`, the data will be deleted before writing out the new data. -<table class="table table-striped"> +<table> <thead><tr><th>Scala/Java</th><th>Any Language</th><th>Meaning</th></tr></thead> <tr> <td><code>SaveMode.ErrorIfExists</code> (default)</td> diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md index 4e492598f595..561f601aa4e5 100644 --- a/docs/sql-data-sources-orc.md +++ b/docs/sql-data-sources-orc.md @@ -129,7 +129,7 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC ### Configuration -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr></thead> <tr> <td><code>spark.sql.orc.impl</code></td> @@ -230,7 +230,7 @@ Data source options of ORC can be set via: * `DataStreamWriter` * `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <td><code>mergeSchema</code></td> diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md index 925e47504e5e..f49bbd7a9d04 100644 --- a/docs/sql-data-sources-parquet.md +++ b/docs/sql-data-sources-parquet.md @@ -386,7 +386,7 @@ Data source options of Parquet can be set via: * `DataStreamWriter` * `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) -<table class="table table-striped"> 
+<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <td><code>datetimeRebaseMode</code></td> @@ -434,7 +434,7 @@ Other generic options can be found in <a href="https://spark.apache.org/docs/lat Configuration of Parquet can be done using the `setConf` method on `SparkSession` or by running `SET key=value` commands using SQL. -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.parquet.binaryAsString</code></td> diff --git a/docs/sql-data-sources-text.md b/docs/sql-data-sources-text.md index bb485d29c396..aed8a2e9942f 100644 --- a/docs/sql-data-sources-text.md +++ b/docs/sql-data-sources-text.md @@ -47,7 +47,7 @@ Data source options of text can be set via: * `DataStreamWriter` * `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html) -<table class="table table-striped"> +<table> <thead><tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr></thead> <tr> <td><code>wholetext</code></td> diff --git a/docs/sql-distributed-sql-engine-spark-sql-cli.md b/docs/sql-distributed-sql-engine-spark-sql-cli.md index a67e009b9ae1..6d506cbb09c2 100644 --- a/docs/sql-distributed-sql-engine-spark-sql-cli.md +++ b/docs/sql-distributed-sql-engine-spark-sql-cli.md @@ -62,7 +62,7 @@ For example: `/path/to/spark-sql-cli.sql` equals to `file:///path/to/spark-sql-c ## Supported comment types -<table class="table table-striped"> +<table> <thead><tr><th>Comment</th><th>Example</th></tr></thead> <tr> <td>simple comment</td> @@ -115,7 +115,7 @@ Use `;` (semicolon) to terminate commands. Notice: ``` However, if ';' is the end of the line, it terminates the SQL statement. 
The example above will be terminated into `/* This is a comment contains ` and `*/ SELECT 1`, Spark will submit these two commands separated and throw parser error (`unclosed bracketed comment` and `Syntax error at or near '*/'`). -<table class="table table-striped"> +<table> <thead><tr><th>Command</th><th>Description</th></tr></thead> <tr> <td><code>quit</code> or <code>exit</code></td> diff --git a/docs/sql-error-conditions-sqlstates.md b/docs/sql-error-conditions-sqlstates.md index 5529c961b3bf..49cfb56b3662 100644 --- a/docs/sql-error-conditions-sqlstates.md +++ b/docs/sql-error-conditions-sqlstates.md @@ -33,7 +33,7 @@ Spark SQL uses the following `SQLSTATE` classes: ## Class `0A`: feature not supported -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>0A000</td> @@ -48,7 +48,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `21`: cardinality violation -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>21000</td> @@ -63,7 +63,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `22`: data exception -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>22003</td> @@ -168,7 +168,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `23`: integrity constraint violation -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>23505</td> @@ -183,7 +183,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `2B`: dependent privilege descriptors still exist -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>2BP01</td> @@ -198,7 +198,7 @@ Spark SQL 
uses the following `SQLSTATE` classes: </table> ## Class `38`: external routine exception -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>38000</td> @@ -213,7 +213,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `39`: external routine invocation exception -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>39000</td> @@ -228,7 +228,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `42`: syntax error or access rule violation -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>42000</td> @@ -648,7 +648,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `46`: java ddl 1 -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>46110</td> @@ -672,7 +672,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `53`: insufficient resources -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>53200</td> @@ -687,7 +687,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `54`: program limit exceeded -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>54000</td> @@ -702,7 +702,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `HY`: CLI-specific condition -<table class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>HY008</td> @@ -717,7 +717,7 @@ Spark SQL uses the following `SQLSTATE` classes: </table> ## Class `XX`: internal error -<table 
class="table table-striped"> +<table> <thead><tr><th>SQLSTATE</th><th>Description and issuing error classes</th></tr></thead> <tr> <td>XX000</td> diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 5cf0b28982c2..88635ee3d1f4 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -469,7 +469,7 @@ license: | ## Upgrading from Spark SQL 2.3 to 2.4 - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below. - <table class="table table-striped"> + <table> <thead> <tr> <th> @@ -583,7 +583,7 @@ license: | - Since Spark 2.3, the Join/Filter's deterministic predicates that are after the first non-deterministic predicates are also pushed down/through the child operators, if possible. In prior Spark versions, these filters are not eligible for predicate pushdown. - Partition column inference previously found incorrect common type for different inferred types, for example, previously it ended up with double type as the common type for double type and date type. Now it finds the correct common type for such conflicts. The conflict resolution follows the table below: - <table class="table table-striped"> + <table> <thead> <tr> <th> diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md index 1467409bb500..2dec65cc553e 100644 --- a/docs/sql-performance-tuning.md +++ b/docs/sql-performance-tuning.md @@ -34,7 +34,7 @@ memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableNam Configuration of in-memory caching can be done using the `setConf` method on `SparkSession` or by running `SET key=value` commands using SQL. 
-<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.inMemoryColumnarStorage.compressed</code></td> @@ -62,7 +62,7 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp The following options can also be used to tune the performance of query execution. It is possible that these options will be deprecated in future release as more optimizations are performed automatically. -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.files.maxPartitionBytes</code></td> @@ -253,7 +253,7 @@ Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that ma ### Coalescing Post Shuffle Partitions This feature coalesces the post shuffle partitions based on the map output statistics when both `spark.sql.adaptive.enabled` and `spark.sql.adaptive.coalescePartitions.enabled` configurations are true. This feature simplifies the tuning of shuffle partition number when running queries. You do not need to set a proper shuffle partition number to fit your dataset. Spark can pick the proper shuffle partition number at runtime once you set a large enough initial number of shuffle partitions [...] 
- <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.coalescePartitions.enabled</code></td> @@ -298,7 +298,7 @@ This feature coalesces the post shuffle partitions based on the map output stati </table> ### Spliting skewed shuffle partitions - <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled</code></td> @@ -320,7 +320,7 @@ This feature coalesces the post shuffle partitions based on the map output stati ### Converting sort-merge join to broadcast join AQE converts sort-merge join to broadcast hash join when the runtime statistics of any join side is smaller than the adaptive broadcast hash join threshold. This is not as efficient as planning a broadcast hash join in the first place, but it's better than keep doing the sort-merge join, as we can save the sorting of both the join sides, and read shuffle files locally to save network traffic(if `spark.sql.adaptive.localShuffleReader.enabled` is true) - <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.autoBroadcastJoinThreshold</code></td> @@ -342,7 +342,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics ### Converting sort-merge join to shuffled hash join AQE converts sort-merge join to shuffled hash join when all post shuffle partitions are smaller than a threshold, the max threshold can see the config `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold`. 
- <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold</code></td> @@ -356,7 +356,7 @@ AQE converts sort-merge join to shuffled hash join when all post shuffle partiti ### Optimizing Skew Join Data skew can severely downgrade the performance of join queries. This feature dynamically handles skew in sort-merge join by splitting (and replicating if needed) skewed tasks into roughly evenly sized tasks. It takes effect when both `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` configurations are enabled. - <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.skewJoin.enabled</code></td> @@ -393,7 +393,7 @@ Data skew can severely downgrade the performance of join queries. This feature d </table> ### Misc - <table class="table table-striped"> + <table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.sql.adaptive.optimizer.excludedRules</code></td> diff --git a/docs/storage-openstack-swift.md b/docs/storage-openstack-swift.md index 73b21a1f7c27..5b30786bdd7f 100644 --- a/docs/storage-openstack-swift.md +++ b/docs/storage-openstack-swift.md @@ -60,7 +60,7 @@ required by Keystone. The following table contains a list of Keystone mandatory parameters. <code>PROVIDER</code> can be any (alphanumeric) name. 
-<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Meaning</th><th>Required</th></tr></thead> <tr> <td><code>fs.swift.service.PROVIDER.auth.url</code></td> diff --git a/docs/streaming-custom-receivers.md b/docs/streaming-custom-receivers.md index 591a4415bb1a..11a52232510f 100644 --- a/docs/streaming-custom-receivers.md +++ b/docs/streaming-custom-receivers.md @@ -243,7 +243,7 @@ interval in the [Spark Streaming Programming Guide](streaming-programming-guide. The following table summarizes the characteristics of both types of receivers -<table class="table table-striped"> +<table> <thead> <tr> <th>Receiver Type</th> diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index f8f98ca54425..4b93fb7c89ad 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -433,7 +433,7 @@ Streaming core artifact `spark-streaming-xyz_{{site.SCALA_BINARY_VERSION}}` to the dependencies. For example, some of the common ones are as follows. -<table class="table table-striped"> +<table> <thead><tr><th>Source</th><th>Artifact</th></tr></thead> <tr><td> Kafka </td><td> spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}} </td></tr> <tr><td> Kinesis<br/></td><td>spark-streaming-kinesis-asl_{{site.SCALA_BINARY_VERSION}} [Amazon Software License] </td></tr> @@ -820,7 +820,7 @@ Similar to that of RDDs, transformations allow the data from the input DStream t DStreams support many of the transformations available on normal Spark RDD's. Some of the common ones are as follows. -<table class="table table-striped"> +<table> <thead><tr><th style="width:25%">Transformation</th><th>Meaning</th></tr></thead> <tr> <td> <b>map</b>(<i>func</i>) </td> @@ -1109,7 +1109,7 @@ JavaPairDStream<String, Integer> windowedWordCounts = pairs.reduceByKeyAndWindow Some of the common window operations are as follows. 
All of these operations take the said two parameters - <i>windowLength</i> and <i>slideInterval</i>. -<table class="table table-striped"> +<table> <thead><tr><th style="width:25%">Transformation</th><th>Meaning</th></tr></thead> <tr> <td> <b>window</b>(<i>windowLength</i>, <i>slideInterval</i>) </td> @@ -1280,7 +1280,7 @@ Since the output operations actually allow the transformed data to be consumed b they trigger the actual execution of all the DStream transformations (similar to actions for RDDs). Currently, the following output operations are defined: -<table class="table table-striped"> +<table> <thead><tr><th style="width:30%">Output Operation</th><th>Meaning</th></tr></thead> <tr> <td> <b>print</b>()</td> @@ -2485,7 +2485,7 @@ enabled](#deploying-applications) and reliable receivers, there is zero data los The following table summarizes the semantics under failures: -<table class="table table-striped"> +<table> <thead> <tr> <th style="width:30%">Deployment Scenario</th> diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md index 66e6efb1c8a9..c5ffdf025b17 100644 --- a/docs/structured-streaming-kafka-integration.md +++ b/docs/structured-streaming-kafka-integration.md @@ -297,7 +297,7 @@ df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"); </div> Each row in the source has the following schema: -<table class="table table-striped"> +<table> <thead><tr><th>Column</th><th>Type</th></tr></thead> <tr> <td>key</td> @@ -336,7 +336,7 @@ Each row in the source has the following schema: The following options must be set for the Kafka source for both batch and streaming queries. -<table class="table table-striped"> +<table> <thead><tr><th>Option</th><th>value</th><th>meaning</th></tr></thead> <tr> <td>assign</td> @@ -368,7 +368,7 @@ for both batch and streaming queries. 
The following configurations are optional: -<table class="table table-striped"> +<table> <thead><tr><th>Option</th><th>value</th><th>default</th><th>query type</th><th>meaning</th></tr></thead> <tr> <td>startingTimestamp</td> @@ -607,7 +607,7 @@ The caching key is built up from the following information: The following properties are available to configure the consumer pool: -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td>spark.kafka.consumer.cache.capacity</td> @@ -657,7 +657,7 @@ Note that it doesn't leverage Apache Commons Pool due to the difference of chara The following properties are available to configure the fetched data pool: -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td>spark.kafka.consumer.fetchedData.cache.timeout</td> @@ -685,7 +685,7 @@ solution to remove duplicates when reading the written data could be to introduc that can be used to perform de-duplication when reading. The Dataframe being written to Kafka should have the following columns in schema: -<table class="table table-striped"> +<table> <thead><tr><th>Column</th><th>Type</th></tr></thead> <tr> <td>key (optional)</td> @@ -725,7 +725,7 @@ will be used. The following options must be set for the Kafka sink for both batch and streaming queries. -<table class="table table-striped"> +<table> <thead><tr><th>Option</th><th>value</th><th>meaning</th></tr></thead> <tr> <td>kafka.bootstrap.servers</td> @@ -736,7 +736,7 @@ for both batch and streaming queries. 
The following configurations are optional: -<table class="table table-striped"> +<table> <thead><tr><th>Option</th><th>value</th><th>default</th><th>query type</th><th>meaning</th></tr></thead> <tr> <td>topic</td> @@ -912,7 +912,7 @@ It will use different Kafka producer when delegation token is renewed; Kafka pro The following properties are available to configure the producer pool: -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td>spark.kafka.producer.cache.timeout</td> @@ -1039,7 +1039,7 @@ When none of the above applies then unsecure connection assumed. Delegation tokens can be obtained from multiple clusters and <code>${cluster}</code> is an arbitrary unique identifier which helps to group different configurations. -<table class="table table-striped"> +<table> <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead> <tr> <td><code>spark.kafka.clusters.${cluster}.auth.bootstrap.servers</code></td> diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 3e87c45a3491..845f0617898b 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -545,7 +545,7 @@ checkpointed offsets after a failure. See the earlier section on [fault-tolerance semantics](#fault-tolerance-semantics). Here are the details of all the sources in Spark. -<table class="table table-striped"> +<table> <thead> <tr> <th>Source</th> @@ -1819,7 +1819,7 @@ regarding watermark delays and whether data will be dropped or not. ##### Support matrix for joins in streaming queries -<table class="table table-striped"> +<table> <thead> <tr> <th>Left Input</th> @@ -2307,7 +2307,7 @@ to `org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider`. 
Here are the configs regarding to RocksDB instance of the state store provider: -<table class="table table-striped"> +<table> <thead> <tr> <th>Config Name</th> @@ -2474,7 +2474,7 @@ More information to be added in future releases. Different types of streaming queries support different output modes. Here is the compatibility matrix. -<table class="table table-striped"> +<table> <thead> <tr> <th>Query Type</th> @@ -2613,7 +2613,7 @@ meant for debugging purposes only. See the earlier section on [fault-tolerance semantics](#fault-tolerance-semantics). Here are the details of all the sinks in Spark. -<table class="table table-striped"> +<table> <thead> <tr> <th>Sink</th> @@ -3201,7 +3201,7 @@ The trigger settings of a streaming query define the timing of streaming data pr the query is going to be executed as micro-batch query with a fixed batch interval or as a continuous processing query. Here are the different kinds of triggers that are supported. -<table class="table table-striped"> +<table> <thead> <tr> <th>Trigger Type</th> diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md index becdfb4b18f5..4821f883eef9 100644 --- a/docs/submitting-applications.md +++ b/docs/submitting-applications.md @@ -159,7 +159,7 @@ export HADOOP_CONF_DIR=XXX The master URL passed to Spark can be in one of the following formats: -<table class="table table-striped"> +<table> <thead><tr><th>Master URL</th><th>Meaning</th></tr></thead> <tr><td> <code>local</code> </td><td> Run Spark locally with one worker thread (i.e. no parallelism at all). </td></tr> <tr><td> <code>local[K]</code> </td><td> Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine). </td></tr> diff --git a/docs/web-ui.md b/docs/web-ui.md index 079bc6137f02..cdf62e0d8ec0 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -380,7 +380,7 @@ operator shows the number of bytes written by a shuffle. 
Here is the list of SQL metrics: -<table class="table table-striped"> +<table> <thead><tr><th>SQL metrics</th><th>Meaning</th><th>Operators</th></tr></thead> <tr><td> <code>number of output rows</code> </td><td> the number of output rows of the operator </td><td> Aggregate operators, Join operators, Sample, Range, Scan operators, Filter, etc.</td></tr> <tr><td> <code>data size</code> </td><td> the size of broadcast/shuffled/collected data of the operator </td><td> BroadcastExchange, ShuffleExchange, Subquery </td></tr>
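The bulk of this patch is a single mechanical substitution — `<table class="table table-striped">` becomes `<table>` — repeated across the markdown docs, with table styling then handled centrally by the rules added to `docs/css/custom.css`. As an illustrative sketch only (this helper is hypothetical and not part of the patch, which was edited by hand), the sweep can be expressed as a small Python transform:

```python
import re

# The patch drops the Bootstrap classes from generated tables so that
# every table falls back to the unified style in docs/css/custom.css.
STRIPED_TABLE = re.compile(r'<table class="table table-striped">')

def strip_table_classes(markdown_text: str) -> str:
    """Apply the same substitution the patch makes across the docs:
    replace Bootstrap-styled table tags with plain <table> tags."""
    return STRIPED_TABLE.sub("<table>", markdown_text)

sample = '<table class="table table-striped">\n<thead><tr><th>Property Name</th></tr></thead>\n</table>'
print(strip_table_classes(sample).splitlines()[0])  # -> <table>
```

Tables that already use the plain `<table>` tag pass through unchanged, which matches the patch's goal of one consistent table style across all generated doc pages.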