[spark] branch master updated (88fc48f5e7e -> 5a762687f41)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 88fc48f5e7e [SPARK-41431][CORE][SQL][UI] Protobuf serializer for `SQLExecutionUIData`
     add 5a762687f41 [SPARK-41430][UI] Protobuf serializer for ProcessSummaryWrapper

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/status/protobuf/store_types.proto | 14
 .../org.apache.spark.status.protobuf.ProtobufSerDe |  1 +
 .../protobuf/ProcessSummaryWrapperSerializer.scala | 77 ++
 .../protobuf/KVStoreProtobufSerializerSuite.scala  | 27
 4 files changed, 119 insertions(+)
 create mode 100644 core/src/main/scala/org/apache/spark/status/protobuf/ProcessSummaryWrapperSerializer.scala

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-41431][CORE][SQL][UI] Protobuf serializer for `SQLExecutionUIData`
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88fc48f5e7e [SPARK-41431][CORE][SQL][UI] Protobuf serializer for `SQLExecutionUIData` 88fc48f5e7e is described below commit 88fc48f5e7e907c25d082a7b35231744ccef2c7e Author: yangjie01 AuthorDate: Fri Dec 23 15:53:40 2022 -0800 [SPARK-41431][CORE][SQL][UI] Protobuf serializer for `SQLExecutionUIData` ### What changes were proposed in this pull request? Add Protobuf serializer for `SQLExecutionUIData` ### Why are the changes needed? Support fast and compact serialization/deserialization for `SQLExecutionUIData` over RocksDB. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Add new UT Closes #39139 from LuciferYang/SPARK-41431. Authored-by: yangjie01 Signed-off-by: Gengliang Wang --- .../apache/spark/status/protobuf/store_types.proto | 21 + sql/core/pom.xml | 5 ++ .../org.apache.spark.status.protobuf.ProtobufSerDe | 18 + .../sql/SQLExecutionUIDataSerializer.scala | 90 ++ .../protobuf/sql/SQLPlanMetricSerializer.scala | 36 + .../sql/KVStoreProtobufSerializerSuite.scala | 88 + 6 files changed, 258 insertions(+) diff --git a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto index 7cf5c2921cb..cb0dea540bd 100644 --- a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto +++ b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto @@ -355,3 +355,24 @@ message ExecutorSummary { message ExecutorSummaryWrapper { ExecutorSummary info = 1; } + +message SQLPlanMetric { + string name = 1; + int64 accumulator_id = 2; + string metric_type = 3; +} + +message SQLExecutionUIData { + int64 execution_id = 1; + string description = 2; + string details = 3; + string 
physical_plan_description = 4; + map modified_configs = 5; + repeated SQLPlanMetric metrics = 6; + int64 submission_time = 7; + optional int64 completion_time = 8; + optional string error_message = 9; + map jobs = 10; + repeated int64 stages = 11; + map metric_values = 12; +} diff --git a/sql/core/pom.xml b/sql/core/pom.xml index cfcf7455ad0..71c57f8a7f7 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -147,6 +147,11 @@ org.apache.xbean xbean-asm9-shaded + + com.google.protobuf + protobuf-java + ${protobuf.version} + org.scalacheck scalacheck_${scala.binary.version} diff --git a/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe b/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe new file mode 100644 index 000..de5f2c2d05c --- /dev/null +++ b/sql/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe @@ -0,0 +1,18 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +org.apache.spark.status.protobuf.sql.SQLExecutionUIDataSerializer diff --git a/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala b/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala new file mode 100644 index 000..8dc28517ff0 --- /dev/null +++ b/sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *
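The point of SPARK-41431 is a fast, compact binary round trip for UI objects stored over RocksDB. The sketch below mimics that round trip for a record shaped like the new `SQLPlanMetric` message (`name`, `accumulator_id`, `metric_type`), using a hand-rolled length-prefixed encoding. This is not the protobuf wire format, and the function names are invented for illustration; it only shows the serialize/deserialize symmetry that the PR's new test suite exercises.

```python
import struct

# Hypothetical stand-in for the SQLPlanMetric message added to
# store_types.proto (fields: name, accumulator_id, metric_type).
# NOT protobuf wire format -- a sketch of a length-prefixed binary
# encoding and its inverse.

def serialize_metric(name: str, accumulator_id: int, metric_type: str) -> bytes:
    name_b = name.encode("utf-8")
    type_b = metric_type.encode("utf-8")
    # big-endian: u32 name length, name bytes, i64 id, u32 type length, type bytes
    return struct.pack(f">I{len(name_b)}sqI{len(type_b)}s",
                       len(name_b), name_b, accumulator_id, len(type_b), type_b)

def deserialize_metric(data: bytes):
    (name_len,) = struct.unpack_from(">I", data, 0)
    offset = 4
    name_b, accumulator_id, type_len = struct.unpack_from(f">{name_len}sqI", data, offset)
    offset += name_len + 8 + 4
    (type_b,) = struct.unpack_from(f">{type_len}s", data, offset)
    return name_b.decode("utf-8"), accumulator_id, type_b.decode("utf-8")

m = serialize_metric("number of output rows", 42, "sum")
assert deserialize_metric(m) == ("number of output rows", 42, "sum")
```

A real implementation, as in the PR, would generate both directions from the `.proto` definition rather than writing them by hand, which is what keeps the serializer and deserializer from drifting apart.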
[spark] branch master updated: [MINOR][SQL] Remove redundant comment in CTESubstitution
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f8ad5e307b4 [MINOR][SQL] Remove redundant comment in CTESubstitution f8ad5e307b4 is described below commit f8ad5e307b4686f4c22dc47ba8dc2866fe55476b Author: Reynold Xin AuthorDate: Sat Dec 24 08:37:49 2022 +0900 [MINOR][SQL] Remove redundant comment in CTESubstitution ### What changes were proposed in this pull request? I think the proposed removal is just a duplicate of the next point. I actually spent couple minutes trying to understand if there was a difference. ### Why are the changes needed? See above. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is a code comment change. Closes #39197 from rxin/cte_substitution_comment. Authored-by: Reynold Xin Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala | 8 1 file changed, 8 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala index 6a4562450b9..77c687843c3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala @@ -157,14 +157,6 @@ object CTESubstitution extends Rule[LogicalPlan] { * SELECT * FROM t * ) * SELECT * FROM t2 - * - If a CTE definition contains a subquery that contains an inner WITH node then substitution - * of inner should take precedence because it can shadow an outer CTE definition. 
- * For example the following query should return 2: - * WITH t AS (SELECT 1 AS c) - * SELECT max(c) FROM ( - * WITH t AS (SELECT 2 AS c) - * SELECT * FROM t - * ) * - If a CTE definition contains a subquery expression that contains an inner WITH node then * substitution of inner should take precedence because it can shadow an outer CTE * definition.
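The removed bullet duplicated the rule stated immediately after it: an inner WITH shadows an outer CTE definition. As an illustration of that rule (not of Spark itself), the comment's example can be run against SQLite, which also resolves the innermost WITH first; Spark's actual behavior is governed by `CTESubstitution`.

```python
import sqlite3

# The shadowing example from the removed comment, run on SQLite purely
# as an illustration: the inner definition of `t` shadows the outer one,
# so the query returns the inner value.
conn = sqlite3.connect(":memory:")
row = conn.execute("""
    WITH t AS (SELECT 1 AS c)
    SELECT max(c) FROM (
        WITH t AS (SELECT 2 AS c)
        SELECT * FROM t
    )
""").fetchone()
print(row[0])
```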
[spark] branch branch-3.3 updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new aa39b06462a [MINOR][TEST][SQL] Add a CTE subquery scope test case aa39b06462a is described below commit aa39b06462a98f37be59e239d12edd9f09a25b88 Author: Reynold Xin AuthorDate: Fri Dec 23 14:55:14 2022 -0800 [MINOR][TEST][SQL] Add a CTE subquery scope test case ### What changes were proposed in this pull request? I noticed we were missing a test case for this in SQL tests, so I added one. ### Why are the changes needed? To ensure we scope CTEs properly in subqueries. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is a test case change. Closes #39189 from rxin/cte_test. Authored-by: Reynold Xin Signed-off-by: Reynold Xin (cherry picked from commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b) Signed-off-by: Reynold Xin --- .../test/resources/sql-tests/inputs/cte-nested.sql | 10 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++ .../resources/sql-tests/results/cte-nested.sql.out | 28 ++ .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++ 4 files changed, 94 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql index 5f12388b9cb..e5ef2443417 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql @@ -17,6 +17,16 @@ SELECT ( SELECT * FROM t ); +-- Make sure CTE in subquery is scoped to that subquery rather than global +-- the 2nd half of the union should fail because the cte is scoped to the first half +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte; + -- CTE in CTE definition shadows outer WITH t AS (SELECT 1), diff --git 
a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out index 264b64ffe96..ebdd64c3ac8 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out @@ -36,6 +36,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1), diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out index 2c622de3f36..b6e1793f7d7 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out @@ -36,6 +36,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1), diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out index 283f5a54a42..546ab7ecb95 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out +++ 
b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out @@ -36,6 +36,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1),
[spark] branch master updated: [MINOR][TEST][SQL] Add a CTE subquery scope test case
This is an automated email from the ASF dual-hosted git repository. rxin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24edf8ecb5e [MINOR][TEST][SQL] Add a CTE subquery scope test case 24edf8ecb5e is described below commit 24edf8ecb5e47af294f89552dfd9957a2d9f193b Author: Reynold Xin AuthorDate: Fri Dec 23 14:55:14 2022 -0800 [MINOR][TEST][SQL] Add a CTE subquery scope test case ### What changes were proposed in this pull request? I noticed we were missing a test case for this in SQL tests, so I added one. ### Why are the changes needed? To ensure we scope CTEs properly in subqueries. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is a test case change. Closes #39189 from rxin/cte_test. Authored-by: Reynold Xin Signed-off-by: Reynold Xin --- .../test/resources/sql-tests/inputs/cte-nested.sql | 10 .../resources/sql-tests/results/cte-legacy.sql.out | 28 ++ .../resources/sql-tests/results/cte-nested.sql.out | 28 ++ .../sql-tests/results/cte-nonlegacy.sql.out| 28 ++ 4 files changed, 94 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql index 5f12388b9cb..e5ef2443417 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql @@ -17,6 +17,16 @@ SELECT ( SELECT * FROM t ); +-- Make sure CTE in subquery is scoped to that subquery rather than global +-- the 2nd half of the union should fail because the cte is scoped to the first half +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte; + -- CTE in CTE definition shadows outer WITH t AS (SELECT 1), diff --git a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out 
b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out index 013c5f27b50..65000471c75 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out @@ -33,6 +33,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1), diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out index ed6d69b233e..2c67f2db56a 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out @@ -33,6 +33,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1), diff --git a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out index 6a48e1bec43..154ebd20223 100644 --- a/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out +++ 
b/sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out @@ -33,6 +33,34 @@ struct 1 +-- !query +SELECT * FROM + ( + WITH cte AS (SELECT * FROM range(10)) + SELECT * FROM cte WHERE id = 8 + ) a +UNION +SELECT * FROM cte +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException +{ + "errorClass" : "TABLE_OR_VIEW_NOT_FOUND", + "sqlState" : "42000", + "messageParameters" : { +"relationName" : "`cte`" + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 120, +"stopIndex" : 122, +"fragment" : "cte" + } ] +} + + -- !query WITH t AS (SELECT 1), - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands,
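The added test pins down that a CTE defined inside a derived table is scoped to that subquery, so the second arm of the UNION cannot resolve it. A minimal sketch of the same scoping rule, substituting SQLite for Spark (SQLite has no `range()`, so a literal row stands in; Spark raises `TABLE_OR_VIEW_NOT_FOUND` while SQLite raises an `OperationalError`):

```python
import sqlite3

# Sketch of the scoping rule the new test asserts: `cte` is visible only
# inside the derived table it is declared in, so the outer reference
# fails to resolve.
conn = sqlite3.connect(":memory:")
error_message = None
try:
    conn.execute("""
        SELECT * FROM (
            WITH cte AS (SELECT 8 AS id)
            SELECT * FROM cte WHERE id = 8
        ) a
        UNION
        SELECT * FROM cte
    """)
except sqlite3.OperationalError as e:
    error_message = str(e)
print(error_message)  # the outer reference to `cte` does not resolve
```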
[spark] branch master updated: [SPARK-41697][CONNECT][TESTS] Enable test_df_show, test_drop, test_dropna, test_toDF_with_schema_string and test_with_columns_renamed
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 82a22606c59 [SPARK-41697][CONNECT][TESTS] Enable test_df_show, test_drop, test_dropna, test_toDF_with_schema_string and test_with_columns_renamed 82a22606c59 is described below commit 82a22606c5951e8d0a9c270595d63d6836f2d51b Author: Hyukjin Kwon AuthorDate: Fri Dec 23 21:52:58 2022 +0900 [SPARK-41697][CONNECT][TESTS] Enable test_df_show, test_drop, test_dropna, test_toDF_with_schema_string and test_with_columns_renamed ### What changes were proposed in this pull request? This PR enables the reused PySpark tests in Spark Connect that pass now. ### Why are the changes needed? To make sure on the test coverage. ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually ran it in my local. Closes #39193 from HyukjinKwon/SPARK-41697. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../sql/tests/connect/test_parity_dataframe.py | 24 -- 1 file changed, 24 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py b/python/pyspark/sql/tests/connect/test_parity_dataframe.py index ccb5dd45b54..7dfdc8de751 100644 --- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py +++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py @@ -71,22 +71,10 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedSQLTestCase): def test_create_nan_decimal_dataframe(self): super().test_create_nan_decimal_dataframe() -@unittest.skip("Fails in Spark Connect, should enable.") -def test_df_show(self): -super().test_df_show() - -@unittest.skip("Fails in Spark Connect, should enable.") -def test_drop(self): -super().test_drop() - @unittest.skip("Fails in Spark Connect, should enable.") def test_drop_duplicates(self): super().test_drop_duplicates() -@unittest.skip("Fails in Spark Connect, should enable.") -def test_dropna(self): -super().test_dropna() - @unittest.skip("Fails in Spark Connect, should enable.") def test_duplicated_column_names(self): super().test_duplicated_column_names() @@ -99,10 +87,6 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedSQLTestCase): def test_fillna(self): super().test_fillna() -@unittest.skip("Fails in Spark Connect, should enable.") -def test_freqItems(self): -super().test_freqItems() - @unittest.skip("Fails in Spark Connect, should enable.") def test_generic_hints(self): super().test_generic_hints() @@ -163,10 +147,6 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedSQLTestCase): def test_to(self): super().test_to() -@unittest.skip("Fails in Spark Connect, should enable.") -def test_toDF_with_schema_string(self): -super().test_toDF_with_schema_string() - @unittest.skip("Fails in Spark Connect, should enable.") def test_to_local_iterator(self): super().test_to_local_iterator() @@ -219,10 +199,6 @@ class 
DataFrameParityTests(DataFrameTestsMixin, ReusedSQLTestCase): def test_unpivot(self): super().test_unpivot() -@unittest.skip("Fails in Spark Connect, should enable.") -def test_with_columns_renamed(self): -super().test_with_columns_renamed() - if __name__ == "__main__": import unittest
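The diff above is all deletions because of how the parity suite is structured: a mixin holds the shared tests, and the Connect subclass overrides only the ones that do not pass yet with `@unittest.skip`, so enabling a test means deleting its override. A minimal sketch of the pattern (class and test names here are illustrative, not Spark's):

```python
import unittest

class SharedTestsMixin:
    # Shared tests, run by every subclass that mixes this in.
    def test_supported(self):
        self.assertEqual(1 + 1, 2)

    def test_not_yet_supported(self):
        self.assertEqual(2 + 2, 4)

class ParityTests(SharedTestsMixin, unittest.TestCase):
    # Override only the tests that currently fail; deleting this
    # override re-enables the inherited test.
    @unittest.skip("Fails in Spark Connect, should enable.")
    def test_not_yet_supported(self):
        super().test_not_yet_supported()

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ParityTests))
print(result.testsRun, len(result.skipped))
```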
[spark] branch master updated (827ca9b8247 -> 391a268843e)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 827ca9b8247 [SPARK-41498] Propagate metadata through Union
     add 391a268843e [SPARK-41698][CONNECT][TESTS] Enable 16 tests in pyspark.sql.tests.connect.test_parity_functions

No new revisions were added by this update.

Summary of changes:
 .../sql/tests/connect/test_parity_functions.py | 64 --
 1 file changed, 64 deletions(-)
[spark] branch master updated (c526741ed09 -> 827ca9b8247)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from c526741ed09 [MINOR][DOCS] Fix grammatical error in streaming programming guide
     add 827ca9b8247 [SPARK-41498] Propagate metadata through Union

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  20 +-
 .../plans/logical/basicLogicalOperators.scala      |  58 --
 .../sql/connector/catalog/InMemoryBaseTable.scala  |   2 +-
 .../spark/sql/connector/MetadataColumnSuite.scala  | 231 +
 4 files changed, 291 insertions(+), 20 deletions(-)
[spark] branch master updated: [MINOR][DOCS] Fix grammatical error in streaming programming guide
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c526741ed09 [MINOR][DOCS] Fix grammatical error in streaming programming guide c526741ed09 is described below commit c526741ed095ad88fdfb3f3c5233dcf483d3fe71 Author: Austin Wang <62589566+wangaus...@users.noreply.github.com> AuthorDate: Fri Dec 23 20:24:48 2022 +0900 [MINOR][DOCS] Fix grammatical error in streaming programming guide ### What changes were proposed in this pull request? This change fixes a grammatical error in the documentation. ### Why are the changes needed? There is a grammatical error in the documentation. ### Does this PR introduce _any_ user-facing change? Yes. Previously the sentence in question reads "This leads to two kinds of data in the system that need to recovered in the event of failures," but it should instead read "This leads to two kinds of data in the system that need to be recovered in the event of failures." ### How was this patch tested? Tests were not added. This was a simple text change that can be previewed immediately. Closes #39145 from wangaustin/patch-1. Authored-by: Austin Wang <62589566+wangaus...@users.noreply.github.com> Signed-off-by: Hyukjin Kwon --- docs/streaming-programming-guide.md | 38 ++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md index 4a104238a6d..0b8e55d84e6 100644 --- a/docs/streaming-programming-guide.md +++ b/docs/streaming-programming-guide.md @@ -725,7 +725,7 @@ of its creation, the new data will be picked up. In contrast, Object Stores such as Amazon S3 and Azure Storage usually have slow rename operations, as the data is actually copied. 
-Furthermore, renamed object may have the time of the `rename()` operation as its modification time, so +Furthermore, a renamed object may have the time of the `rename()` operation as its modification time, so may not be considered part of the window which the original create time implied they were. Careful testing is needed against the target object store to verify that the timestamp behavior @@ -1140,7 +1140,7 @@ said two parameters - windowLength and slideInterval. Join Operations {:.no_toc} -Finally, its worth highlighting how easily you can perform different kinds of joins in Spark Streaming. +Finally, it's worth highlighting how easily you can perform different kinds of joins in Spark Streaming. # Stream-stream joins @@ -1236,7 +1236,7 @@ For the Python API, see [DStream](api/python/reference/api/pyspark.streaming.DSt *** ## Output Operations on DStreams -Output operations allow DStream's data to be pushed out to external systems like a database or a file systems. +Output operations allow DStream's data to be pushed out to external systems like a database or a file system. Since the output operations actually allow the transformed data to be consumed by external systems, they trigger the actual execution of all the DStream transformations (similar to actions for RDDs). Currently, the following output operations are defined: @@ -1293,7 +1293,7 @@ Currently, the following output operations are defined: However, it is important to understand how to use this primitive correctly and efficiently. Some of the common mistakes to avoid are as follows. -Often writing data to external system requires creating a connection object +Often writing data to external systems requires creating a connection object (e.g. TCP connection to a remote server) and using it to send data to a remote system. For this purpose, a developer may inadvertently try creating a connection object at the Spark driver, and then try to use it in a Spark worker to save records in the RDDs. 
@@ -1481,7 +1481,7 @@ Note that the connections in the pool should be lazily created on demand and tim *** ## DataFrame and SQL Operations -You can easily use [DataFrames and SQL](sql-programming-guide.html) operations on streaming data. You have to create a SparkSession using the SparkContext that the StreamingContext is using. Furthermore, this has to done such that it can be restarted on driver failures. This is done by creating a lazily instantiated singleton instance of SparkSession. This is shown in the following example. It modifies the earlier [word count example](#a-quick-example) to generate word counts using DataF [...] +You can easily use [DataFrames and SQL](sql-programming-guide.html) operations on streaming data. You have to create a SparkSession using the SparkContext that the StreamingContext is using. Furthermore, this has to be done such that it can be restarted on driver
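The guide passage being corrected describes creating "a lazily instantiated singleton instance of SparkSession" so the session can be re-created after a driver restart instead of being captured in a closure. That pattern, sketched without Spark (the class and field names below are invented for illustration):

```python
# Lazy-singleton sketch of the pattern the streaming guide describes:
# the expensive object is built on first use and the same instance is
# returned afterwards, so restarted code can always call get_instance()
# rather than serializing the object itself.
class SessionSingleton:
    _instance = None

    @classmethod
    def get_instance(cls, config):
        if cls._instance is None:
            # stand-in for SparkSession.builder.config(...).getOrCreate()
            cls._instance = {"config": dict(config)}
        return cls._instance

a = SessionSingleton.get_instance({"app": "wordcount"})
b = SessionSingleton.get_instance({"app": "ignored on second call"})
assert a is b
```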
[spark] branch master updated: [SPARK-41407][SQL][FOLLOW-UP] Use string jobTrackerID for FileFormatWriter.executeTask
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d5af030707 [SPARK-41407][SQL][FOLLOW-UP] Use string jobTrackerID for FileFormatWriter.executeTask 3d5af030707 is described below commit 3d5af030707e1dd8bd14a1bee244350329303943 Author: Hyukjin Kwon AuthorDate: Fri Dec 23 20:22:33 2022 +0900 [SPARK-41407][SQL][FOLLOW-UP] Use string jobTrackerID for FileFormatWriter.executeTask ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/38939 that fixes a logical conflict during merging PRs, see https://github.com/apache/spark/pull/38980 and https://github.com/apache/spark/pull/38939. ### Why are the changes needed? To recover the broken build. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Manually tested: ``` ./build/sbt -Phive clean package ``` Closes #39194 from HyukjinKwon/SPARK-41407. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/execution/datasources/WriteFiles.scala | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala index 39b7b252f6e..5bc8f9db32b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala @@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.datasources import java.util.Date import org.apache.spark.{SparkException, TaskContext} -import org.apache.spark.internal.io.FileCommitProtocol +import org.apache.spark.internal.io.{FileCommitProtocol, SparkHadoopWriterUtils} import org.apache.spark.rdd.RDD import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.expressions.Attribute @@ -72,7 +72,7 @@ case class WriteFilesExec(child: SparkPlan) extends UnaryExecNode { val concurrentOutputWriterSpec = writeFilesSpec.concurrentOutputWriterSpecFunc(child) val description = writeFilesSpec.description val committer = writeFilesSpec.committer -val jobIdInstant = new Date().getTime +val jobTrackerID = SparkHadoopWriterUtils.createJobTrackerID(new Date()) rddWithNonEmptyPartitions.mapPartitionsInternal { iterator => val sparkStageId = TaskContext.get().stageId() val sparkPartitionId = TaskContext.get().partitionId() @@ -80,7 +80,7 @@ case class WriteFilesExec(child: SparkPlan) extends UnaryExecNode { val ret = FileFormatWriter.executeTask( description, -jobIdInstant, +jobTrackerID, sparkStageId, sparkPartitionId, sparkAttemptNumber, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
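The follow-up swaps the raw epoch value (`jobIdInstant = new Date().getTime`) for the formatted string returned by `SparkHadoopWriterUtils.createJobTrackerID(new Date())`, matching the signature `FileFormatWriter.executeTask` now expects. Hadoop-style job tracker IDs embed a timestamp formatted roughly as `yyyyMMddHHmmss`; the exact format Spark uses is an assumption here, and the sketch below only shows the shape of the change:

```python
from datetime import datetime

# Assumed timestamp layout for a Hadoop-style job tracker ID; the real
# format comes from SparkHadoopWriterUtils.createJobTrackerID and may
# differ in detail.
def create_job_tracker_id(now: datetime) -> str:
    return now.strftime("%Y%m%d%H%M%S")

jid = create_job_tracker_id(datetime(2022, 12, 23, 20, 22, 33))
print(jid)  # 20221223202233
```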
[spark] branch master updated: [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube` and `pivot`
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e1af3a992e0 [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube` and `pivot` e1af3a992e0 is described below commit e1af3a992e06aeb5185501db908dc272b449c62b Author: Ruifeng Zheng AuthorDate: Fri Dec 23 19:51:44 2022 +0900 [SPARK-41383][SPARK-41692][SPARK-41693] Implement `rollup`, `cube` and `pivot` ### What changes were proposed in this pull request? Implement `rollup`, `cube` and `pivot`: 1. `DataFrame.rollup` 2. `DataFrame.cube` 3. `DataFrame.groupBy.pivot` ### Why are the changes needed? for API coverage ### Does this PR introduce _any_ user-facing change? yes ### How was this patch tested? added UT Closes #39191 from zhengruifeng/connect_groupby_refactor. Authored-by: Ruifeng Zheng Signed-off-by: Hyukjin Kwon --- .../main/protobuf/spark/connect/relations.proto| 34 +- .../org/apache/spark/sql/connect/dsl/package.scala | 50 +++- .../planner/LiteralValueProtoConverter.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 82 + .../connect/planner/SparkConnectPlannerSuite.scala | 3 +- .../connect/planner/SparkConnectProtoSuite.scala | 77 python/pyspark/sql/connect/dataframe.py| 48 +++- python/pyspark/sql/connect/group.py| 82 +++-- python/pyspark/sql/connect/plan.py | 63 +++--- python/pyspark/sql/connect/proto/relations_pb2.py | 112 + python/pyspark/sql/connect/proto/relations_pb2.pyi | 90 -- .../sql/tests/connect/test_connect_basic.py| 136 + 12 files changed, 667 insertions(+), 112 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index c4f040c03d6..912ee1fdc63 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ 
b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto
@@ -235,11 +235,39 @@ message Tail {
 // Relation of type [[Aggregate]].
 message Aggregate {
-  // (Required) Input relation for a Aggregate.
+  // (Required) Input relation for a RelationalGroupedDataset.
   Relation input = 1;
-  repeated Expression grouping_expressions = 2;
-  repeated Expression result_expressions = 3;
+  // (Required) How the RelationalGroupedDataset was built.
+  GroupType group_type = 2;
+
+  // (Required) Expressions for grouping keys
+  repeated Expression grouping_expressions = 3;
+
+  // (Required) List of values that will be translated to columns in the output DataFrame.
+  repeated Expression aggregate_expressions = 4;
+
+  // (Optional) Pivots a column of the current `DataFrame` and performs the specified aggregation.
+  Pivot pivot = 5;
+
+  enum GroupType {
+    GROUP_TYPE_UNSPECIFIED = 0;
+    GROUP_TYPE_GROUPBY = 1;
+    GROUP_TYPE_ROLLUP = 2;
+    GROUP_TYPE_CUBE = 3;
+    GROUP_TYPE_PIVOT = 4;
+  }
+
+  message Pivot {
+    // (Required) The column to pivot
+    Expression col = 1;
+
+    // (Optional) List of values that will be translated to columns in the output DataFrame.
+    //
+    // Note that if it is empty, the server side will immediately trigger a job to collect
+    // the distinct values of the column.
+    repeated Expression.Literal values = 2;
+  }
 }

 // Relation of type [[Sort]].
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala index b15e46293ab..e6d230d9eef 100644 --- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala +++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala @@ -601,16 +601,64 @@ package object dsl { def groupBy(groupingExprs: Expression*)(aggregateExprs: Expression*): Relation = { val agg = Aggregate.newBuilder() agg.setInput(logicalPlan) +agg.setGroupType(proto.Aggregate.GroupType.GROUP_TYPE_GROUPBY) for (groupingExpr <- groupingExprs) { agg.addGroupingExpressions(groupingExpr) } for (aggregateExpr <- aggregateExprs) { - agg.addResultExpressions(aggregateExpr) + agg.addAggregateExpressions(aggregateExpr) } Relation.newBuilder().setAggregate(agg.build()).build() } + def rollup(groupingExprs: Expression*)(aggregateExprs: Expression*): Relation = { +val agg = Aggregate.newBuilder() +agg.setInput(logicalPlan) +
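The `GroupType` carried by the Aggregate relation above only records how the grouped dataset was built; the practical difference between GROUPBY, ROLLUP and CUBE is which grouping sets the server ultimately expands them to. A rough pure-Python sketch of that expansion (illustrative only — `rollup_sets` and `cube_sets` are hypothetical helper names, not Spark code):

```python
from itertools import combinations

def rollup_sets(cols):
    # ROLLUP(a, b, c) expands to the prefixes (a, b, c), (a, b), (a,), ()
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    # CUBE(a, b, c) expands to every subset of the grouping columns
    return [subset
            for r in range(len(cols), -1, -1)
            for subset in combinations(cols, r)]

assert rollup_sets(["a", "b", "c"]) == [("a", "b", "c"), ("a", "b"), ("a",), ()]
assert len(cube_sets(["a", "b", "c"])) == 2 ** 3  # full powerset
```

PIVOT is different in kind: rather than adding grouping sets, it turns the distinct values of one column into output columns, which is why the `Pivot` message needs its own `values` list (and triggers a distinct-value collection job when that list is empty).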
[spark] branch master updated (a1c727f3867 -> 2ffa8178df1)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

 from a1c727f3867 [SPARK-41666][PYTHON] Support parameterized SQL by `sql()`
  add 2ffa8178df1 [SPARK-41407][SQL] Pull out v1 write to WriteFiles

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   |   2 +-
 .../spark/sql/errors/QueryExecutionErrors.scala    |   2 +-
 .../org/apache/spark/sql/internal/WriteSpec.java}  |  19 ++-
 .../org/apache/spark/sql/execution/SparkPlan.scala |  26 +++-
 .../spark/sql/execution/SparkStrategies.scala      |   3 +
 .../execution/command/createDataSourceTables.scala |   6 +
 .../sql/execution/datasources/DataSource.scala     |  29 ++--
 .../execution/datasources/FileFormatWriter.scala   | 154 -
 .../spark/sql/execution/datasources/V1Writes.scala |  21 ++-
 .../sql/execution/datasources/WriteFiles.scala     | 104 ++
 .../datasources/V1WriteCommandSuite.scala          |   7 +-
 11 files changed, 310 insertions(+), 63 deletions(-)
 copy sql/{catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/PartitionOffset.java => core/src/main/java/org/apache/spark/sql/internal/WriteSpec.java} (68%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteFiles.scala
[spark] branch master updated: [SPARK-41666][PYTHON] Support parameterized SQL by `sql()`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a1c727f3867 [SPARK-41666][PYTHON] Support parameterized SQL by `sql()` a1c727f3867 is described below commit a1c727f386724156f680953fa34ec51bb35348a4 Author: Max Gekk AuthorDate: Fri Dec 23 12:30:30 2022 +0300 [SPARK-41666][PYTHON] Support parameterized SQL by `sql()` ### What changes were proposed in this pull request? In the PR, I propose to extend the `sql()` method in PySpark to support parameterized SQL queries, see https://github.com/apache/spark/pull/38864, and add new parameter - `args` of the type `Dict[str, str]`. This parameter maps named parameters that can occur in the input SQL query to SQL literals like 1, INTERVAL '1-1' YEAR TO MONTH, DATE'2022-12-22' (see [the doc ](https://spark.apache.org/docs/latest/sql-ref-literals.html)of supported literals). For example: ```python >>> spark.sql("SELECT * FROM range(10) WHERE id > :minId", args = {"minId" : "7"}) id 0 8 1 9 ``` Closes #39159 ### Why are the changes needed? To achieve feature parity with Scala/Java API, and provide PySpark users the same feature. ### Does this PR introduce _any_ user-facing change? No, it shouldn't. ### How was this patch tested? Checked the examples locally, and running the tests: ``` $ python/run-tests --modules=pyspark-sql --parallelism=1 ``` Closes #39183 from MaxGekk/parameterized-sql-pyspark-dict. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- core/src/main/resources/error/error-classes.json | 6 +++--- .../source/migration_guide/pyspark_3.3_to_3.4.rst | 2 ++ python/pyspark/pandas/sql_formatter.py | 20 +-- python/pyspark/sql/session.py | 23 ++ 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json index ff235e80dbb..95db9005d02 100644 --- a/core/src/main/resources/error/error-classes.json +++ b/core/src/main/resources/error/error-classes.json @@ -813,7 +813,7 @@ }, "INVALID_SQL_ARG" : { "message" : [ - "The argument of `sql()` is invalid. Consider to replace it by a SQL literal statement." + "The argument of `sql()` is invalid. Consider to replace it by a SQL literal." ] }, "INVALID_SQL_SYNTAX" : { @@ -1164,7 +1164,7 @@ }, "UNBOUND_SQL_PARAMETER" : { "message" : [ - "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to a SQL literal statement." + "Found the unbound parameter: . Please, fix `args` and provide a mapping of the parameter to a SQL literal." ] }, "UNCLOSED_BRACKETED_COMMENT" : { @@ -5225,4 +5225,4 @@ "grouping() can only be used with GroupingSets/Cube/Rollup" ] } -} \ No newline at end of file +} diff --git a/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst b/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst index b3baa8345aa..ca942c54979 100644 --- a/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst +++ b/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst @@ -39,3 +39,5 @@ Upgrading from PySpark 3.3 to 3.4 * In Spark 3.4, the ``Series.concat`` sort parameter will be respected to follow pandas 1.4 behaviors. * In Spark 3.4, the ``DataFrame.__setitem__`` will make a copy and replace pre-existing arrays, which will NOT be over-written to follow pandas 1.4 behaviors. 
+ +* In Spark 3.4, the ``SparkSession.sql`` and the Pandas on Spark API ``sql`` have got new parameter ``args`` which provides binding of named parameters to their SQL literals. diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py index 45c615161d9..9103366c192 100644 --- a/python/pyspark/pandas/sql_formatter.py +++ b/python/pyspark/pandas/sql_formatter.py @@ -17,7 +17,7 @@ import os import string -from typing import Any, Optional, Union, List, Sequence, Mapping, Tuple +from typing import Any, Dict, Optional, Union, List, Sequence, Mapping, Tuple import uuid import warnings @@ -43,6 +43,7 @@ _CAPTURE_SCOPES = 3 def sql( query: str, index_col: Optional[Union[str, List[str]]] = None, +args: Dict[str, str] = {}, **kwargs: Any, ) -> DataFrame: """ @@ -57,6 +58,8 @@ def sql( * pandas Series * string +Also the method can bind named parameters to SQL literals from `args`. + Parameters -- query : str @@ -99,6 +102,12 @@ def sql( e f 3 6 Also note that the index name(s) should be matched to the existing name. +args : dict +
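The `args` binding described in this commit happens inside Spark's SQL parser and analyzer, but the substitution model is simple to sketch in plain Python (a conceptual illustration only — `bind_params` is a hypothetical helper; Spark binds parameters to parsed, typed literal expressions rather than doing raw string replacement):

```python
import re

def bind_params(query: str, args: dict) -> str:
    # Replace each :name marker with the literal text supplied in args.
    # Spark itself substitutes parsed literal expressions, not raw text,
    # and raises UNBOUND_SQL_PARAMETER for markers missing from args.
    def repl(match):
        name = match.group(1)
        if name not in args:
            raise KeyError(f"unbound SQL parameter: {name}")
        return args[name]
    return re.sub(r":(\w+)", repl, query)

print(bind_params("SELECT * FROM range(10) WHERE id > :minId", {"minId": "7"}))
# SELECT * FROM range(10) WHERE id > 7
```

This mirrors the example in the commit message: `args = {"minId": "7"}` binds `:minId` to the SQL literal `7`, so the query returns ids 8 and 9.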
[spark] branch master updated: [SPARK-41422][UI] Protobuf serializer for ExecutorSummaryWrapper
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d0ec53cf953 [SPARK-41422][UI] Protobuf serializer for ExecutorSummaryWrapper d0ec53cf953 is described below commit d0ec53cf9531cba6a154b98d4593a0be50e712d8 Author: Sandeep Singh AuthorDate: Fri Dec 23 00:30:33 2022 -0800 [SPARK-41422][UI] Protobuf serializer for ExecutorSummaryWrapper ### What changes were proposed in this pull request? Add Protobuf serializer for ExecutorSummaryWrapper ### Why are the changes needed? Support fast and compact serialization/deserialization for ExecutorSummaryWrapper over RocksDB. ### Does this PR introduce any user-facing change? No ### How was this patch tested? New UT Closes #39141 from techaddict/SPARK-41422-ExecutorSummaryWrapper. Authored-by: Sandeep Singh Signed-off-by: Gengliang Wang --- .../apache/spark/status/protobuf/store_types.proto | 50 ++ .../org.apache.spark.status.protobuf.ProtobufSerDe | 1 + .../ExecutorSummaryWrapperSerializer.scala | 183 + .../protobuf/KVStoreProtobufSerializerSuite.scala | 131 ++- 4 files changed, 364 insertions(+), 1 deletion(-) diff --git a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto index 5949ad63c84..7cf5c2921cb 100644 --- a/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto +++ b/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto @@ -305,3 +305,53 @@ message SpeculationStageSummaryWrapper { int32 stage_attempt_id = 2; SpeculationStageSummary info = 3; } + +message MemoryMetrics { + int64 used_on_heap_storage_memory = 1; + int64 used_off_heap_storage_memory = 2; + int64 total_on_heap_storage_memory = 3; + int64 total_off_heap_storage_memory = 4; +} + +message ResourceInformation { + string name = 1; + 
+  repeated string addresses = 2;
+}
+
+message ExecutorSummary {
+  string id = 1;
+  string host_port = 2;
+  bool is_active = 3;
+  int32 rdd_blocks = 4;
+  int64 memory_used = 5;
+  int64 disk_used = 6;
+  int32 total_cores = 7;
+  int32 max_tasks = 8;
+  int32 active_tasks = 9;
+  int32 failed_tasks = 10;
+  int32 completed_tasks = 11;
+  int32 total_tasks = 12;
+  int64 total_duration = 13;
+  int64 total_gc_time = 14;
+  int64 total_input_bytes = 15;
+  int64 total_shuffle_read = 16;
+  int64 total_shuffle_write = 17;
+  bool is_blacklisted = 18;
+  int64 max_memory = 19;
+  int64 add_time = 20;
+  optional int64 remove_time = 21;
+  optional string remove_reason = 22;
+  map<string, string> executor_logs = 23;
+  optional MemoryMetrics memory_metrics = 24;
+  repeated int64 blacklisted_in_stages = 25;
+  optional ExecutorMetrics peak_memory_metrics = 26;
+  map<string, string> attributes = 27;
+  map<string, ResourceInformation> resources = 28;
+  int32 resource_profile_id = 29;
+  bool is_excluded = 30;
+  repeated int64 excluded_in_stages = 31;
+}
+
+message ExecutorSummaryWrapper {
+  ExecutorSummary info = 1;
+}
diff --git a/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe b/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe
index 97c186206bf..b714ea73b36 100644
--- a/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe
+++ b/core/src/main/resources/META-INF/services/org.apache.spark.status.protobuf.ProtobufSerDe
@@ -25,3 +25,4 @@ org.apache.spark.status.protobuf.TaskDataWrapperSerializer
 org.apache.spark.status.protobuf.JobDataWrapperSerializer
 org.apache.spark.status.protobuf.ResourceProfileWrapperSerializer
 org.apache.spark.status.protobuf.SpeculationStageSummaryWrapperSerializer
+org.apache.spark.status.protobuf.ExecutorSummaryWrapperSerializer
diff --git a/core/src/main/scala/org/apache/spark/status/protobuf/ExecutorSummaryWrapperSerializer.scala
b/core/src/main/scala/org/apache/spark/status/protobuf/ExecutorSummaryWrapperSerializer.scala new file mode 100644 index 000..03a810157d7 --- /dev/null +++ b/core/src/main/scala/org/apache/spark/status/protobuf/ExecutorSummaryWrapperSerializer.scala @@ -0,0 +1,183 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is
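These serializer commits exist because protobuf's binary wire format is far more compact over RocksDB than JSON for the many numeric fields in messages like `ExecutorSummary`. A minimal sketch of the base-128 varint encoding protobuf uses for its integer fields (illustrative only, not Spark or protobuf library code):

```python
def encode_varint(value: int) -> bytes:
    # Protobuf varints: 7 payload bits per byte, least-significant group
    # first, with the high bit set on every byte except the last.
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte, high bit clear
            return bytes(out)

assert encode_varint(1) == b"\x01"      # small counters cost one byte
assert encode_varint(300) == b"\xac\x02"
```

An `int64` field such as `total_gc_time` therefore usually costs a one-byte tag plus a few payload bytes, where a JSON rendering would spend more than that on the field name alone.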