[spark] branch master updated (5d5866b -> b80309b)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5d5866b  [SPARK-31672][SQL] Fix loading of timestamps before 1582-10-15 from dictionary encoded Parquet columns
     add b80309b  [SPARK-31674][CORE][DOCS] Make Prometheus metric endpoints experimental

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala | 3 +++
 .../scala/org/apache/spark/status/api/v1/PrometheusResource.scala    | 3 +++
 docs/monitoring.md                                                   | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31674][CORE][DOCS] Make Prometheus metric endpoints experimental
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e2bf140  [SPARK-31674][CORE][DOCS] Make Prometheus metric endpoints experimental

e2bf140 is described below

commit e2bf140c68ef38216167e0872b964c3964ca0d9f
Author: Dongjoon Hyun
AuthorDate: Sun May 10 22:32:26 2020 -0700

    [SPARK-31674][CORE][DOCS] Make Prometheus metric endpoints experimental

    ### What changes were proposed in this pull request?

    This PR aims to make the new Prometheus-format metric endpoints experimental in Apache Spark 3.0.0.

    ### Why are the changes needed?

    Although the new metrics are disabled by default, we had better explicitly mark them experimental
    in Apache Spark 3.0.0 since the output format is not fixed yet. We can finalize it in Apache Spark 3.1.0.

    ### Does this PR introduce _any_ user-facing change?

    Only the doc change is visible to the users.

    ### How was this patch tested?

    Manually checked the code since this is a documentation and class annotation change.

    Closes #28495 from dongjoon-hyun/SPARK-31674.
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
(cherry picked from commit b80309bdb4d26556bd3da6a61cac464cdbdd1fe1)
Signed-off-by: Dongjoon Hyun
---
 .../main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala | 3 +++
 .../scala/org/apache/spark/status/api/v1/PrometheusResource.scala    | 3 +++
 docs/monitoring.md                                                   | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala b/core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala
index 7c33bce..011c7bc 100644
--- a/core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala
+++ b/core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala
@@ -24,15 +24,18 @@ import com.codahale.metrics.MetricRegistry
 import org.eclipse.jetty.servlet.ServletContextHandler
 
 import org.apache.spark.{SecurityManager, SparkConf}
+import org.apache.spark.annotation.Experimental
 import org.apache.spark.ui.JettyUtils._
 
 /**
+ * :: Experimental ::
  * This exposes the metrics of the given registry with Prometheus format.
  *
  * The output is consistent with /metrics/json result in terms of item ordering
  * and with the previous result of Spark JMX Sink + Prometheus JMX Converter combination
  * in terms of key string format.
  */
+@Experimental
 private[spark] class PrometheusServlet(
     val property: Properties,
     val registry: MetricRegistry,

diff --git a/core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala b/core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala
index f9fb78e..2a5f151 100644
--- a/core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala
+++ b/core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala
@@ -23,15 +23,18 @@ import org.eclipse.jetty.servlet.{ServletContextHandler, ServletHolder}
 import org.glassfish.jersey.server.ServerProperties
 import org.glassfish.jersey.servlet.ServletContainer
 
+import org.apache.spark.annotation.Experimental
 import org.apache.spark.ui.SparkUI
 
 /**
+ * :: Experimental ::
  * This aims to expose Executor metrics like REST API which is documented in
 *
 *    https://spark.apache.org/docs/3.0.0/monitoring.html#executor-metrics
 *
 * Note that this is based on ExecutorSummary which is different from ExecutorSource.
 */
+@Experimental
 @Path("/executors")
 private[v1] class PrometheusResource extends ApiRequestContext {
   @GET

diff --git a/docs/monitoring.md b/docs/monitoring.md
index 7e41c9d..4da0f8e 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -715,7 +715,7 @@ A list of the available metrics, with a short description:
 
 Executor-level metrics are sent from each executor to the driver as part of the Heartbeat to describe the performance metrics of Executor itself
 like JVM heap memory, GC information.
 Executor metric values and their measured memory peak values per executor are exposed via the REST API in JSON format and in Prometheus format.
 The JSON end point is exposed at: `/applications/[app-id]/executors`, and the Prometheus endpoint at: `/metrics/executors/prometheus`.
-The Prometheus endpoint is conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`).
+The Prometheus endpoint is experimental and conditional to a configuration parameter: `spark.ui.prometheus.enabled=true` (the default is `false`).
 In addition, aggregated per-stage peak values of the executor memory metrics are written to the event log if
 `spark.eventLog.logStageExecutorMetrics` is true.
 Executor
[spark] branch branch-3.0 updated: [SPARK-31672][SQL] Fix loading of timestamps before 1582-10-15 from dictionary encoded Parquet columns
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 5c6a4fc  [SPARK-31672][SQL] Fix loading of timestamps before 1582-10-15 from dictionary encoded Parquet columns

5c6a4fc is described below

commit 5c6a4fc8a71fcca9110c8c18ebd44d935514fcc1
Author: Max Gekk
AuthorDate: Mon May 11 04:58:08 2020 +0000

    [SPARK-31672][SQL] Fix loading of timestamps before 1582-10-15 from dictionary encoded Parquet columns

    Modified the `decodeDictionaryIds()` method of `VectorizedColumnReader` to specially handle
    `TimestampType` when the passed parameter `rebaseDateTime` is true. In that case, decoded
    milliseconds/microseconds are rebased from the hybrid calendar to the Proleptic Gregorian
    calendar using `RebaseDateTime.rebaseJulianToGregorianMicros()`.

    This fixes the bug of loading timestamps before the cutover day from dictionary encoded
    columns in parquet files. The code below forces dictionary encoding:
    ```scala
    scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled", true)
    scala> spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
    scala> Seq.tabulate(8)(_ => "1001-01-01 01:02:03.123").toDF("tsS")
             .select($"tsS".cast("timestamp").as("ts")).repartition(1)
             .write
             .option("parquet.enable.dictionary", true)
             .parquet(path)
    ```
    Load the timestamps back:
    ```scala
    scala> spark.read.parquet(path).show(false)
    +---+
    |ts |
    +---+
    |1001-01-07 00:32:20.123|
    ...
    |1001-01-07 00:32:20.123|
    +---+
    ```
    Expected values **must be 1001-01-01 01:02:03.123** but not 1001-01-07 00:32:20.123.

    Yes. After the changes:
    ```scala
    scala> spark.read.parquet(path).show(false)
    +---+
    |ts |
    +---+
    |1001-01-01 01:02:03.123|
    ...
    |1001-01-01 01:02:03.123|
    +---+
    ```

    Modified the test `SPARK-31159: rebasing timestamps in write` in `ParquetIOSuite` to check
    reading dictionary encoded dates.
Closes #28489 from MaxGekk/fix-ts-rebase-parquet-dict-enc.

Authored-by: Max Gekk
Signed-off-by: Wenchen Fan
(cherry picked from commit 5d5866be12259c40972f7404f64d830cab87401f)
Signed-off-by: Wenchen Fan
---
 .../parquet/VectorizedColumnReader.java | 31 +--
 .../datasources/parquet/ParquetIOSuite.scala | 65 --
 2 files changed, 64 insertions(+), 32 deletions(-)

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 03056f5..11ce11d 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@ -159,7 +159,11 @@ public class VectorizedColumnReader {
         isSupported = originalType != OriginalType.DATE || !rebaseDateTime;
         break;
       case INT64:
-        isSupported = originalType != OriginalType.TIMESTAMP_MILLIS;
+        if (originalType == OriginalType.TIMESTAMP_MICROS) {
+          isSupported = !rebaseDateTime;
+        } else {
+          isSupported = originalType != OriginalType.TIMESTAMP_MILLIS;
+        }
         break;
       case FLOAT:
       case DOUBLE:
@@ -313,17 +317,36 @@ public class VectorizedColumnReader {
       case INT64:
         if (column.dataType() == DataTypes.LongType ||
             DecimalType.is64BitDecimalType(column.dataType()) ||
-            originalType == OriginalType.TIMESTAMP_MICROS) {
+            (originalType == OriginalType.TIMESTAMP_MICROS && !rebaseDateTime)) {
           for (int i = rowId; i < rowId + num; ++i) {
             if (!column.isNullAt(i)) {
               column.putLong(i, dictionary.decodeToLong(dictionaryIds.getDictId(i)));
             }
           }
         } else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
+          if (rebaseDateTime) {
+            for (int i = rowId; i < rowId + num; ++i) {
+              if (!column.isNullAt(i)) {
+                long julianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+                long julianMicros = DateTimeUtils.fromMillis(julianMillis);
+                long gregorianMicros =
+                  RebaseDateTime.rebaseJulianToGregorianMicros(julianMicros);
+                column.putLong(i, gregorianMicros);
+              }
+            }
+          } else {
+            for (int i = rowId; i < rowId + num; ++i) {
+              if (!column.isNullAt(i)) {
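For readers outside the Spark codebase, the rebase step in the hunk above (`rebaseJulianToGregorianMicros`) can be illustrated with plain JDK classes. The sketch below is a simplification under stated assumptions, not Spark's implementation: `rebaseJulianToGregorianMillis` is a hypothetical helper restricted to UTC, the CE era, and whole seconds, using `java.util.GregorianCalendar` for the hybrid (Julian before 1582-10-15) calendar and `java.time` for the proleptic Gregorian one:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class RebaseSketch {
    // Re-resolve epoch millis that were computed under the hybrid calendar so that
    // interpreting them with java.time (proleptic Gregorian) yields the same wall clock.
    // Hypothetical helper: UTC only, CE era only, ignores sub-second precision.
    static long rebaseJulianToGregorianMillis(long hybridMillis) {
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.setTimeInMillis(hybridMillis);
        // Read the wall-clock fields under hybrid (Julian) rules...
        LocalDateTime wallClock = LocalDateTime.of(
            hybrid.get(Calendar.YEAR), hybrid.get(Calendar.MONTH) + 1,
            hybrid.get(Calendar.DAY_OF_MONTH), hybrid.get(Calendar.HOUR_OF_DAY),
            hybrid.get(Calendar.MINUTE), hybrid.get(Calendar.SECOND));
        // ...and resolve the same fields under proleptic Gregorian rules.
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        // Epoch millis of "1001-01-01 01:02:03" as the hybrid calendar writes it.
        GregorianCalendar hybrid = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        hybrid.clear();
        hybrid.set(1001, Calendar.JANUARY, 1, 1, 2, 3);
        long julianMillis = hybrid.getTimeInMillis();

        // Without rebasing, java.time shows the 6-day Julian/Gregorian drift of that
        // era -- the corruption reported in SPARK-31672:
        System.out.println(
            Instant.ofEpochMilli(julianMillis).atOffset(ZoneOffset.UTC).toLocalDateTime());
        // prints 1001-01-07T01:02:03

        // With rebasing, the originally written wall-clock value is recovered:
        long rebased = rebaseJulianToGregorianMillis(julianMillis);
        System.out.println(
            Instant.ofEpochMilli(rebased).atOffset(ZoneOffset.UTC).toLocalDateTime());
        // prints 1001-01-01T01:02:03
    }
}
```

Spark's `RebaseDateTime` performs the equivalent at microsecond precision with optimized lookup of the calendar switch points; the sketch only shows why reinterpreting the raw value without rebasing shifts pre-1582 timestamps by several days.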
[spark] branch master updated (9f768fa -> 5d5866b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9f768fa  [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
     add 5d5866b  [SPARK-31672][SQL] Fix loading of timestamps before 1582-10-15 from dictionary encoded Parquet columns

No new revisions were added by this update.

Summary of changes:
 .../parquet/VectorizedColumnReader.java | 31 +--
 .../datasources/parquet/ParquetIOSuite.scala | 65 --
 2 files changed, 64 insertions(+), 32 deletions(-)
[spark] branch master updated (a75dc80 -> 9f768fa)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a75dc80  [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
     add 9f768fa  [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/RandomDataGenerator.scala | 23 +++---
 1 file changed, 20 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6f7c719  [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps

6f7c719 is described below

commit 6f7c71947073f147bc35da196139d5ceb6fbdf45
Author: Max Gekk
AuthorDate: Sun May 10 14:22:12 2020 -0500

    [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps

    ### What changes were proposed in this pull request?

    Shift non-existing dates in Proleptic Gregorian calendar by 1 day. The reason for that is
    `RowEncoderSuite` generates random dates/timestamps in the hybrid calendar, and some
    dates/timestamps don't exist in Proleptic Gregorian calendar like 1000-02-29 because 1000
    is not a leap year in Proleptic Gregorian calendar.

    ### Why are the changes needed?

    This makes RowEncoderSuite much more stable.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    By running RowEncoderSuite and setting a non-existing date manually:
    ```scala
    val date = new java.sql.Date(1000 - 1900, 1, 29)
    Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
    ```

    Closes #28486 from MaxGekk/fix-RowEncoderSuite.
Authored-by: Max Gekk
Signed-off-by: Sean Owen
(cherry picked from commit 9f768fa9916dec3cc695e3f28ec77148d81d335f)
Signed-off-by: Sean Owen
---
 .../org/apache/spark/sql/RandomDataGenerator.scala | 23 +++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
index a7c20c3..5a4d23d 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
@@ -18,9 +18,10 @@
 package org.apache.spark.sql
 
 import java.math.MathContext
+import java.sql.{Date, Timestamp}
 
 import scala.collection.mutable
-import scala.util.Random
+import scala.util.{Random, Try}
 
 import org.apache.spark.sql.catalyst.CatalystTypeConverters
 import org.apache.spark.sql.catalyst.util.DateTimeConstants.MILLIS_PER_DAY
@@ -172,7 +173,15 @@ object RandomDataGenerator {
           // January 1, 1970, 00:00:00 GMT for "-12-31 23:59:59.99".
           milliseconds = rand.nextLong() % 25340232959L
         }
-        DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+        val date = DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+        // The generated `date` is based on the hybrid calendar Julian + Gregorian since
+        // 1582-10-15 but it should be valid in Proleptic Gregorian calendar too which is used
+        // by Spark SQL since version 3.0 (see SPARK-26651). We try to convert `date` to
+        // a local date in Proleptic Gregorian calendar to satisfy this requirement.
+        // Some years are leap years in Julian calendar but not in Proleptic Gregorian calendar.
+        // As the consequence of that, 29 February of such years might not exist in Proleptic
+        // Gregorian calendar. When this happens, we shift the date by one day.
+        Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
       }
       Some(generator)
     case TimestampType =>
@@ -188,7 +197,15 @@ object RandomDataGenerator {
           milliseconds = rand.nextLong() % 25340232959L
         }
         // DateTimeUtils.toJavaTimestamp takes microsecond.
-        DateTimeUtils.toJavaTimestamp(milliseconds * 1000)
+        val ts = DateTimeUtils.toJavaTimestamp(milliseconds * 1000)
+        // The generated `ts` is based on the hybrid calendar Julian + Gregorian since
+        // 1582-10-15 but it should be valid in Proleptic Gregorian calendar too which is used
+        // by Spark SQL since version 3.0 (see SPARK-26651). We try to convert `ts` to
+        // a local timestamp in Proleptic Gregorian calendar to satisfy this requirement.
+        // Some years are leap years in Julian calendar but not in Proleptic Gregorian calendar.
+        // As the consequence of that, 29 February of such years might not exist in Proleptic
+        // Gregorian calendar. When this happens, we shift the timestamp `ts` by one day.
+        Try { ts.toLocalDateTime; ts }.getOrElse(new Timestamp(ts.getTime + MILLIS_PER_DAY))
       }
       Some(generator)
     case CalendarIntervalType => Some(() => {
[spark] branch master updated (ce63bef -> a75dc80)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ce63bef  [SPARK-31662][SQL] Fix loading of dates before 1582-10-15 from dictionary encoded Parquet columns
     add a75dc80  [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                           | 20 +-
 docs/sql-ref-ansi-compliance.md                    | 18 +-
 docs/sql-ref-datatypes.md                          | 4 +-
 docs/sql-ref-functions-builtin.md                  | 2 +-
 docs/sql-ref-functions-udf-aggregate.md            | 101
 docs/sql-ref-functions-udf-hive.md                 | 12 +-
 docs/sql-ref-functions-udf-scalar.md               | 28 +-
 docs/sql-ref-identifier.md                         | 37 ++-
 docs/sql-ref-literals.md                           | 282 +
 docs/sql-ref-null-semantics.md                     | 44 ++--
 docs/sql-ref-syntax-aux-analyze-table.md           | 64 ++---
 docs/sql-ref-syntax-aux-cache-cache-table.md       | 98 +++
 docs/sql-ref-syntax-aux-cache-clear-cache.md       | 16 +-
 docs/sql-ref-syntax-aux-cache-refresh.md           | 24 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md     | 31 +--
 docs/sql-ref-syntax-aux-conf-mgmt-reset.md         | 10 +-
 docs/sql-ref-syntax-aux-conf-mgmt-set.md           | 31 +--
 docs/sql-ref-syntax-aux-describe-database.md       | 21 +-
 docs/sql-ref-syntax-aux-describe-function.md       | 30 +--
 docs/sql-ref-syntax-aux-describe-query.md          | 44 ++--
 docs/sql-ref-syntax-aux-describe-table.md          | 62 ++---
 docs/sql-ref-syntax-aux-refresh-table.md           | 31 +--
 docs/sql-ref-syntax-aux-resource-mgmt-add-file.md  | 21 +-
 docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md   | 21 +-
 docs/sql-ref-syntax-aux-resource-mgmt-list-file.md | 14 +-
 docs/sql-ref-syntax-aux-resource-mgmt-list-jar.md  | 14 +-
 docs/sql-ref-syntax-aux-show-columns.md            | 2 +-
 docs/sql-ref-syntax-aux-show-create-table.md       | 27 +-
 docs/sql-ref-syntax-aux-show-databases.md          | 32 +--
 docs/sql-ref-syntax-aux-show-functions.md          | 60 ++---
 docs/sql-ref-syntax-aux-show-partitions.md         | 47 ++--
 docs/sql-ref-syntax-aux-show-table.md              | 60 ++---
 docs/sql-ref-syntax-aux-show-tables.md             | 41 ++-
 docs/sql-ref-syntax-aux-show-tblproperties.md      | 51 ++--
 docs/sql-ref-syntax-aux-show-views.md              | 45 ++--
 docs/sql-ref-syntax-aux-show.md                    | 4 +-
 docs/sql-ref-syntax-ddl-alter-database.md          | 17 +-
 docs/sql-ref-syntax-ddl-alter-table.md             | 256 ---
 docs/sql-ref-syntax-ddl-alter-view.md              | 124 -
 docs/sql-ref-syntax-ddl-create-database.md         | 39 +--
 docs/sql-ref-syntax-ddl-create-function.md         | 85 +++
 docs/sql-ref-syntax-ddl-create-table-datasource.md | 100
 docs/sql-ref-syntax-ddl-create-table-hiveformat.md | 99
 docs/sql-ref-syntax-ddl-create-table-like.md       | 73 +++---
 docs/sql-ref-syntax-ddl-create-table.md            | 10 +-
 docs/sql-ref-syntax-ddl-create-view.md             | 82 +++---
 docs/sql-ref-syntax-ddl-drop-database.md           | 42 ++-
 docs/sql-ref-syntax-ddl-drop-function.md           | 55 ++--
 docs/sql-ref-syntax-ddl-drop-table.md              | 45 ++--
 docs/sql-ref-syntax-ddl-drop-view.md               | 49 ++--
 docs/sql-ref-syntax-ddl-repair-table.md            | 25 +-
 docs/sql-ref-syntax-ddl-truncate-table.md          | 43 ++--
 docs/sql-ref-syntax-dml-insert-into.md             | 90 +++
 ...f-syntax-dml-insert-overwrite-directory-hive.md | 75 +++---
 ...ql-ref-syntax-dml-insert-overwrite-directory.md | 74 +++---
 docs/sql-ref-syntax-dml-insert-overwrite-table.md  | 87 +++
 docs/sql-ref-syntax-dml-insert.md                  | 8 +-
 docs/sql-ref-syntax-dml-load.md                    | 67 ++---
 docs/sql-ref-syntax-dml.md                         | 4 +-
 docs/sql-ref-syntax-qry-explain.md                 | 58 ++---
 docs/sql-ref-syntax-qry-sampling.md                | 20 +-
 docs/sql-ref-syntax-qry-select-clusterby.md        | 33 ++-
 docs/sql-ref-syntax-qry-select-cte.md              | 35 ++-
 docs/sql-ref-syntax-qry-select-distribute-by.md    | 33 ++-
 docs/sql-ref-syntax-qry-select-groupby.md          | 261 ++-
 docs/sql-ref-syntax-qry-select-having.md           | 54 ++--
 docs/sql-ref-syntax-qry-select-hints.md            | 56 ++--
 docs/sql-ref-syntax-qry-select-inline-table.md     | 35 +--
 docs/sql-ref-syntax-qry-select-join.md             | 185 ++
 docs/sql-ref-syntax-qry-select-like.md             | 51 ++--
 docs/sql-ref-syntax-qry-select-limit.md            | 41 ++-
 docs/sql-ref-syntax-qry-select-orderby.md