[GitHub] [spark] SparkQA commented on issue #25251: [MINOR] Trivial cleanups
SparkQA commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515335905 **[Test build #108191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108191/testReport)** for PR 25251 at commit [`5cbeaf0`](https://github.com/apache/spark/commit/5cbeaf02cafdd627d6f31c8a41396a37ef971a7d). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
SparkQA commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515335903 **[Test build #108196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108196/testReport)** for PR 25259 at commit [`8158d5e`](https://github.com/apache/spark/commit/8158d5e27fce8e4bc5877ed7bb4f7c3876007c13). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
SparkQA commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515335910 **[Test build #108194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108194/testReport)** for PR 25085 at commit [`e71c4d2`](https://github.com/apache/spark/commit/e71c4d2878fff642d34abbc71b9dff65354dafe5). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307610800 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -328,16 +328,891 @@ struct -- !query 25 -DROP VIEW IF EXISTS t +WITH r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 25 schema struct<> -- !query 25 output - +org.apache.spark.sql.AnalysisException +Table or view not found: r; line 4 pos 24 -- !query 26 -DROP VIEW IF EXISTS t2 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r -- !query 26 schema -struct<> +struct -- !query 26 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 27 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r +-- !query 27 schema +struct<> +-- !query 27 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 28 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT * FROM r LIMIT 10 +-- !query 28 schema +struct +-- !query 28 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 29 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r LIMIT 10 +-- !query 29 schema +struct +-- !query 29 output +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 + + +-- !query 30 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r +) +SELECT level, level FROM r ORDER BY level LIMIT 10 +-- !query 30 schema +struct<> +-- !query 30 output +org.apache.spark.SparkException +Recursion level limit 100 reached but query has not exhausted, try increasing spark.sql.cte.recursion.level.limit + + +-- !query 31 +WITH RECURSIVE r(c) AS ( + SELECT 'a' + UNION ALL + 
SELECT c || ' b' FROM r WHERE LENGTH(c) < 10 +) +SELECT * FROM r +-- !query 31 schema +struct +-- !query 31 output +a +a b +a b b +a b b b +a b b b b +a b b b b b + + +-- !query 32 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 10 + UNION ALL + VALUES (0) +) +SELECT * FROM r +-- !query 32 schema +struct +-- !query 32 output +0 +1 +10 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 33 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 33 schema +struct +-- !query 33 output +0 A +0 B +1 AC +1 BC +2 ACC +2 BCC +3 ACCC +3 BCCC + + +-- !query 34 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + SELECT level + 1, data || 'B' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 34 schema +struct +-- !query 34 output +0 A +1 AB +1 AC +2 ABB +2 ABC +2 ACB +2 ACC +3 ABBC +3 ABCC +3 ACBC +3 ACCC + + +-- !query 35 +WITH RECURSIVE r(level, data) AS ( + VALUES (0, 'A') + UNION ALL + VALUES (0, 'B') + UNION ALL + SELECT level + 1, data || 'C' FROM r WHERE level < 2 + UNION ALL + SELECT level + 1, data || 'D' FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 35 schema +struct +-- !query 35 output +0 A +0 B +1 AC +1 AD +1 BC +1 BD +2 ACC +2 ACD +2 ADC +2 ADD +2 BCC +2 BCD +2 BDC +2 BDD +3 ACCD +3 ACDD +3 ADCD +3 ADDD +3 BCCD +3 BCDD +3 BDCD +3 BDDD + + +-- !query 36 +WITH RECURSIVE r(level) AS ( + SELECT level + 1 FROM r WHERE level < 3 +) +SELECT * FROM r +-- !query 36 schema +struct<> +-- !query 36 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. 
This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 37 +WITH RECURSIVE r(level) AS ( + VALUES (0), (0) + UNION + SELECT (level + 1) % 10 FROM r +) +SELECT * FROM r +-- !query 37 schema +struct +-- !query 37 output +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 + + +-- !query 38 +WITH RECURSIVE r(level) AS ( + VALUES (0) + INTERSECT + SELECT level + 1 FROM r WHERE level < 10 +) +SELECT * FROM r +-- !query 38 schema +struct<> +-- !query 38 output +org.apache.spark.sql.AnalysisException +Recursive query r should contain UNION or UNION ALL statements only. This error can also be caused by ORDER BY or LIMIT keywords used on result of UNION or UNION ALL.; + + +-- !query 39 +WITH RECURSIVE r(level) AS ( + VALUES (0) + UNION ALL + SELECT level + 1 FROM r WHERE (SELECT SUM(level) FROM r) < 10 +) +SELECT * FROM r +-- !query 39 schema +struct<> +
[GitHub] [spark] SparkQA commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
SparkQA commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515335911 **[Test build #108198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108198/testReport)** for PR 22290 at commit [`45cfa21`](https://github.com/apache/spark/commit/45cfa2146dbb3ca6f6530c0147246dafd4ada762). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
AmplabJenkins commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515335972 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
AmplabJenkins commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515335978 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108197/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
AmplabJenkins commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515335941 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108198/ Test FAILed.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307610854 ## File path: graph/api/src/main/scala/org/apache/spark/graph/api/GraphElementFrame.scala ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graph.api + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.DataFrame + +/** + * A [[PropertyGraph]] is created from GraphElementFrames. + * + * A graph element is either a node or a relationship. + * A GraphElementFrame wraps a DataFrame and describes how it maps to graph elements. + * + * @since 3.0.0 + */ +abstract class GraphElementFrame { + + /** + * Initial DataFrame that can still contain unmapped, arbitrarily ordered columns. + * + * @since 3.0.0 + */ + def df: DataFrame + + /** + * Name of the column that contains the graph element identifier. + * + * @since 3.0.0 + */ + def idColumn: String + + /** + * Name of all columns that contain graph element identifiers. 
+ * + * @since 3.0.0 + */ + def idColumns: Seq[String] = Seq(idColumn) + + /** + * Mapping from graph element property keys to the columns that contain the corresponding property + * values. + * + * @since 3.0.0 + */ + def properties: Map[String, String] + +} + +object NodeFrame { + + /** + * Describes how to map an initial DataFrame to nodes. + * + * All columns apart from the given `idColumn` are mapped to node properties. + * + * @param df DataFrame containing a single node in each row + * @param idColumn column that contains the node identifier + * @param labelSet labels that are assigned to all nodes + * @since 3.0.0 + */ + def create(df: DataFrame, idColumn: String, labelSet: Set[String]): NodeFrame = { + val properties = (df.columns.toSet - idColumn) + .map(columnName => columnName -> columnName) + .toMap + create(df, idColumn, labelSet, properties) + } + + /** + * Describes how to map an initial DataFrame to nodes. + * + * All columns apart from the given `idColumn` are mapped to node properties. + * + * @param df DataFrame containing a single node in each row + * @param idColumn column that contains the node identifier + * @param labelSet labels that are assigned to all nodes + * @param properties mapping from property keys to corresponding columns + * @since 3.0.0 + */ + def create( + df: DataFrame, + idColumn: String, + labelSet: Set[String], + properties: Map[String, String]): NodeFrame = { Review comment: Mh, I think having convenience methods for that is actually very helpful for the user. Imho, the documentation makes the purpose of that Map clear; we could however rename the field to `propertiesToColumns` or `propertyColumns` (I prefer the latter).
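To make the default mapping under discussion concrete, here is a minimal sketch (plain Python, not the Spark Graph API itself; the function name is hypothetical) of the rule the Scaladoc describes: every column apart from the id column becomes a property whose key equals the column name.

```python
# Hypothetical illustration of NodeFrame.create's default property mapping:
# all columns except the id column map to properties, keyed by column name.
def default_property_columns(columns, id_column):
    return {name: name for name in columns if name != id_column}

print(default_property_columns(["id", "name", "age"], "id"))
# → {'name': 'name', 'age': 'age'}
```

Renaming the field to something like `propertyColumns`, as suggested in the review, would make it clearer that the map values are column names.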
[GitHub] [spark] AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515335948 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108196/ Test FAILed.
[GitHub] [spark] peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
peter-toth commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307606058 ## File path: sql/core/src/test/resources/sql-tests/results/cte.sql.out ## @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 27 +-- Number of queries: 63 Review comment: All right, then I believe only the simplest case should be allowed in this PR: one non-recursive term, then `UNION ALL`, then one recursive term within a recursive CTE. (And maybe follow-up PRs can deal with advanced constructs.)
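The "simplest case" proposed above can be sketched with a runnable example. This uses SQLite via Python's sqlite3 module purely as an illustration of the standard recursive-CTE shape (it is not Spark, and Spark's eventual syntax support is exactly what the PR is deciding): a single non-recursive anchor term, `UNION ALL`, and a single recursive term referencing the CTE name.

```python
import sqlite3

# Minimal recursive CTE: one anchor term, UNION ALL, one recursive term.
# (Shown in SQLite for illustration; Spark's recursive CTE support is the
# subject of the PR under review.)
conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH RECURSIVE r(level) AS (
        SELECT 0                                  -- non-recursive anchor term
        UNION ALL
        SELECT level + 1 FROM r WHERE level < 10  -- recursive term
    )
    SELECT level FROM r
    """
).fetchall()
print([level for (level,) in rows])  # → [0, 1, 2, ..., 10]
```

Restricting the initial implementation to this shape sidesteps the harder cases exercised in the diff above (multiple anchor or recursive terms, `UNION` with deduplication, subqueries over the recursive relation).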
[GitHub] [spark] SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515335908 **[Test build #108192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108192/testReport)** for PR 25074 at commit [`65d4100`](https://github.com/apache/spark/commit/65d41002f19acbced86c1cf49f0a443d6450ac74). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait BooleanTest extends UnaryExpression with Predicate with ExpectsInputTypes ` * `case class IsTrue(child: Expression) extends BooleanTest ` * `case class IsNotTrue(child: Expression) extends BooleanTest ` * `case class IsFalse(child: Expression) extends BooleanTest ` * `case class IsNotFalse(child: Expression) extends BooleanTest `
[GitHub] [spark] gatorsmile commented on issue #25217: [SPARK-28463][SQL][test-hadoop3.2] Thriftserver throws BigDecimal incompatible with HiveDecimal
gatorsmile commented on issue #25217: [SPARK-28463][SQL][test-hadoop3.2] Thriftserver throws BigDecimal incompatible with HiveDecimal URL: https://github.com/apache/spark/pull/25217#issuecomment-515334020 cc @gengliangwang @juliuszsompolski
[GitHub] [spark] AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515336058 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108194/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
AmplabJenkins commented on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515335933 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins commented on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515336051 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515335907 **[Test build #108193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108193/testReport)** for PR 25074 at commit [`ebd2dcf`](https://github.com/apache/spark/commit/ebd2dcfd63560758dda8407c0ee1e17a2eb77bdc). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336116 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108192/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515335940 Merged build finished. Test FAILed.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307612518

## File path: graph/api/src/main/scala/org/apache/spark/graph/api/GraphElementFrame.scala

## @@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.graph.api
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.DataFrame
+
+/**
+ * A [[PropertyGraph]] is created from GraphElementFrames.
+ *
+ * A graph element is either a node or a relationship.
+ * A GraphElementFrame wraps a DataFrame and describes how it maps to graph elements.
+ *
+ * @since 3.0.0
+ */
+abstract class GraphElementFrame {
+
+  /**
+   * Initial DataFrame that can still contain unmapped, arbitrarily ordered columns.
+   *
+   * @since 3.0.0
+   */
+  def df: DataFrame
+
+  /**
+   * Name of the column that contains the graph element identifier.
+   *
+   * @since 3.0.0
+   */
+  def idColumn: String
+
+  /**
+   * Names of all columns that contain graph element identifiers.
+   *
+   * @since 3.0.0
+   */
+  def idColumns: Seq[String] = Seq(idColumn)
+
+  /**
+   * Mapping from graph element property keys to the columns that contain the corresponding
+   * property values.
+   *
+   * @since 3.0.0
+   */
+  def properties: Map[String, String]
+
+}
+
+object NodeFrame {
+
+  /**
+   * Describes how to map an initial DataFrame to nodes.
+   *
+   * All columns apart from the given `idColumn` are mapped to node properties.
+   *
+   * @param df       DataFrame containing a single node in each row
+   * @param idColumn column that contains the node identifier
+   * @param labelSet labels that are assigned to all nodes
+   * @since 3.0.0
+   */
+  def create(df: DataFrame, idColumn: String, labelSet: Set[String]): NodeFrame = {
+    val properties = (df.columns.toSet - idColumn)
+      .map(columnName => columnName -> columnName)
+      .toMap
+    create(df, idColumn, labelSet, properties)
+  }
+
+  /**
+   * Describes how to map an initial DataFrame to nodes.
+   *
+   * All columns apart from the given `idColumn` are mapped to node properties.
+   *
+   * @param df         DataFrame containing a single node in each row
+   * @param idColumn   column that contains the node identifier
+   * @param labelSet   labels that are assigned to all nodes
+   * @param properties mapping from property keys to corresponding columns
+   * @since 3.0.0
+   */
+  def create(
+      df: DataFrame,
+      idColumn: String,
+      labelSet: Set[String],
+      properties: Map[String, String]): NodeFrame = {
+    NodeFrame(df, idColumn, labelSet, properties)
+  }
+
+  /**
+   * Describes how to map an initial DataFrame to nodes.
+   *
+   * All columns apart from the given `idColumn` are mapped to node properties.
+   *
+   * @param df       DataFrame containing a single node in each row
+   * @param idColumn column that contains the node identifier
+   * @param labelSet labels that are assigned to all nodes
+   * @since 3.0.0
+   */
+  def create(df: DataFrame, idColumn: String, labelSet: java.util.Set[String]): NodeFrame = {
+    create(df, idColumn, labelSet.asScala.toSet)
+  }
+
+  /**
+   * Describes how to map an initial DataFrame to nodes.
+   *
+   * All columns apart from the given `idColumn` are mapped to node properties.
+   *
+   * @param df         DataFrame containing a single node in each row
+   * @param idColumn   column that contains the node identifier
+   * @param labelSet   labels that are assigned to all nodes
+   * @param properties mapping from property keys to corresponding columns
+   * @since 3.0.0
+   */
+  def create(
+      df: DataFrame,
+      idColumn: String,
+      labelSet: java.util.Set[String],
+      properties: java.util.Map[String, String]): NodeFrame = {
+    val scalaLabelSet = labelSet.asScala.toSet
+    val scalaProperties = properties.asScala.toMap
+    NodeFrame(df, idColumn, scalaLabelSet, scalaProperties)
+  }
+
+}
+
+/**
+ * Describes how to map a DataFrame to nodes.
+ *
+ * Each row in the DataFrame represents a node which has exactly the labels defined by the given
+ * label set.
+ *
+ * @param df
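The default property mapping the Javadoc above describes (every column apart from `idColumn` becomes a property whose key and source column share the same name) can be sketched without a Spark dependency. This is an illustration only; `DefaultPropertySketch` and `defaultProperties` are hypothetical names, not part of the PR:

```scala
// Standalone sketch of NodeFrame.create's default property derivation:
// drop the id column, then map each remaining column name to itself.
object DefaultPropertySketch {
  def defaultProperties(columns: Set[String], idColumn: String): Map[String, String] =
    (columns - idColumn).map(columnName => columnName -> columnName).toMap

  def main(args: Array[String]): Unit = {
    val props = defaultProperties(Set("id", "name", "age"), "id")
    // "name" and "age" become properties; the id column is excluded.
    assert(props == Map("name" -> "name", "age" -> "age"))
  }
}
```

In the real `create(df, idColumn, labelSet)` overload quoted above, the same transformation runs over `df.columns.toSet` before delegating to the four-argument overload.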
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336073 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336079 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108193/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336106 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
SparkQA commented on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515335913 **[Test build #108197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108197/testReport)** for PR 24947 at commit [`a0363de`](https://github.com/apache/spark/commit/a0363de8932ad6b886d9deaa043354543da0ed56).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515335940 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
AmplabJenkins removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515335933 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25251: [MINOR] Trivial cleanups
SparkQA removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515293715 **[Test build #108191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108191/testReport)** for PR 25251 at commit [`5cbeaf0`](https://github.com/apache/spark/commit/5cbeaf02cafdd627d6f31c8a41396a37ef971a7d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336106 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
SparkQA removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515332546 **[Test build #108198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108198/testReport)** for PR 22290 at commit [`45cfa21`](https://github.com/apache/spark/commit/45cfa2146dbb3ca6f6530c0147246dafd4ada762).
[GitHub] [spark] SparkQA removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
SparkQA removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515302840 **[Test build #108192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108192/testReport)** for PR 25074 at commit [`65d4100`](https://github.com/apache/spark/commit/65d41002f19acbced86c1cf49f0a443d6450ac74).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336073 Merged build finished. Test FAILed.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307612864

## File path: graph/api/src/main/scala/org/apache/spark/graph/api/GraphElementFrame.scala

## @@ -0,0 +1,260 @@
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515336363 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108191/ Test FAILed.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307612780

## File path: graph/api/src/main/scala/org/apache/spark/graph/api/GraphElementFrame.scala

## @@ -0,0 +1,260 @@
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515336358 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
SparkQA removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515309221 **[Test build #108194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108194/testReport)** for PR 25085 at commit [`e71c4d2`](https://github.com/apache/spark/commit/e71c4d2878fff642d34abbc71b9dff65354dafe5).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515336051 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336079 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108193/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
SparkQA removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515305438 **[Test build #108193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108193/testReport)** for PR 25074 at commit [`ebd2dcf`](https://github.com/apache/spark/commit/ebd2dcfd63560758dda8407c0ee1e17a2eb77bdc).
[GitHub] [spark] SparkQA removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
SparkQA removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515332547 **[Test build #108197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108197/testReport)** for PR 24947 at commit [`a0363de`](https://github.com/apache/spark/commit/a0363de8932ad6b886d9deaa043354543da0ed56).
[GitHub] [spark] SparkQA removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
SparkQA removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515328592 **[Test build #108196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108196/testReport)** for PR 25259 at commit [`8158d5e`](https://github.com/apache/spark/commit/8158d5e27fce8e4bc5877ed7bb4f7c3876007c13).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
AmplabJenkins removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515335972 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515336358 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance
AmplabJenkins removed a comment on issue #22290: [SPARK-25285][CORE] Add startedTasks and finishedTasks to the metrics system in the executor instance URL: https://github.com/apache/spark/pull/22290#issuecomment-515335941 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108198/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515335948 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108196/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException
AmplabJenkins removed a comment on issue #24947: [SPARK-28143][SQL] Expressions without proper constructors should throw AnalysisException URL: https://github.com/apache/spark/pull/24947#issuecomment-515335978 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108197/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type
AmplabJenkins removed a comment on issue #25085: [SPARK-28313][SQL] Spark sql null type incompatible with hive void type URL: https://github.com/apache/spark/pull/25085#issuecomment-515336058 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108194/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515336116 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108192/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515336363 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108191/ Test FAILed.
[GitHub] [spark] gengliangwang commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
gengliangwang commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515337307 retest this please.
[GitHub] [spark] gengliangwang commented on issue #25251: [MINOR] Trivial cleanups
gengliangwang commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515337208 retest this please.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307614116 ## File path: graph/api/src/main/scala/org/apache/spark/graph/api/CypherSession.scala ## @@ -0,0 +1,186 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graph.api + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} + +object CypherSession { + val ID_COLUMN = "$ID" + val SOURCE_ID_COLUMN = "$SOURCE_ID" + val TARGET_ID_COLUMN = "$TARGET_ID" + val LABEL_COLUMN_PREFIX = ":" +} + +/** + * The entry point for using property graphs in Spark. + * + * Provides factory methods for creating [[PropertyGraph]] instances. + * + * Wraps a [[org.apache.spark.sql.SparkSession]]. + * + * @since 3.0.0 + */ +trait CypherSession { + + def sparkSession: SparkSession + + /** + * Executes a Cypher query on the given input graph. + * + * @param graph [[PropertyGraph]] on which the query is executed + * @param query Cypher query to execute + * @since 3.0.0 + */ + def cypher(graph: PropertyGraph, query: String): CypherResult + + /** + * Executes a Cypher query on the given input graph. 
+ * + * @param graph [[PropertyGraph]] on which the query is executed + * @param query Cypher query to execute + * @param parameters parameters used by the Cypher query + * @since 3.0.0 + */ + def cypher(graph: PropertyGraph, query: String, parameters: Map[String, Any]): CypherResult + + /** + * Executes a Cypher query on the given input graph. + * + * @param graph [[PropertyGraph]] on which the query is executed + * @param query Cypher query to execute + * @param parameters parameters used by the Cypher query + * @since 3.0.0 + */ + def cypher(graph: PropertyGraph, + query: String, + parameters: java.util.Map[String, Object]): CypherResult = { +cypher(graph, query, parameters.asScala.toMap) + } + + /** + * Creates a [[PropertyGraph]] from a sequence of [[NodeFrame]]s and [[RelationshipFrame]]s. + * At least one [[NodeFrame]] has to be provided. + * + * For each label set and relationship type there can be at most one [[NodeFrame]] and at most one + * [[RelationshipFrame]], respectively. + * + * @param nodes NodeFrames that define the nodes in the graph + * @param relationships RelationshipFrames that define the relationships in the graph + * @since 3.0.0 + */ + def createGraph(nodes: Seq[NodeFrame], relationships: Seq[RelationshipFrame]): PropertyGraph + + /** + * Creates a [[PropertyGraph]] from a sequence of [[NodeFrame]]s and [[RelationshipFrame]]s. + * At least one [[NodeFrame]] has to be provided. + * + * For each label set and relationship type there can be at most one [[NodeFrame]] and at most one + * [[RelationshipFrame]], respectively. + * + * @param nodes NodeFrames that define the nodes in the graph + * @param relationships RelationshipFrames that define the relationships in the graph + * @since 3.0.0 + */ + def createGraph( + nodes: java.util.List[NodeFrame], + relationships: java.util.List[RelationshipFrame]): PropertyGraph = { +createGraph(nodes.asScala, relationships.asScala) + } + + /** + * Creates a [[PropertyGraph]] from nodes and relationships. 
+ * + * The given DataFrames need to adhere to the following column naming conventions: + * + * {{{ + * Id column: `$ID` (nodes and relationships) + * SourceId column: `$SOURCE_ID` (relationships) + * TargetId column: `$TARGET_ID` (relationships) + * + * Label columns: `:{LABEL_NAME}` (nodes) + * RelType columns: `:{REL_TYPE}` (relationships) + * + * Property columns: `{Property_Key}` (nodes and relationships) + * }}} + * + * @see [[CypherSession]] + * @param nodes node DataFrame + * @param relationships relationship DataFrame + * @since 3.0.0 + */ + def createGraph(nodes: DataFrame, relationships: DataFrame): PropertyGraph = { Review comment: This is a big change. `spa
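The column naming conventions quoted in the scaladoc above can be sketched with a pair of DataFrames. This is an illustrative example only: the data, the local session setup, and the `cypherSession` value are hypothetical; only the column names (`$ID`, `$SOURCE_ID`, `$TARGET_ID`, label/rel-type columns) come from the diff.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for illustration.
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Nodes: id column `$ID`, boolean label column `:Person`, property column `name`.
val nodes = Seq(
  (0L, true, "Alice"),
  (1L, true, "Bob")
).toDF("$ID", ":Person", "name")

// Relationships: `$ID`, `$SOURCE_ID`, `$TARGET_ID`, rel-type column `:KNOWS`.
val relationships = Seq(
  (0L, 0L, 1L, true)
).toDF("$ID", "$SOURCE_ID", "$TARGET_ID", ":KNOWS")

// With a CypherSession in scope (hypothetical value), the wide-table factory
// method from the diff would then be invoked as:
// val graph = cypherSession.createGraph(nodes, relationships)
```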
[GitHub] [spark] cloud-fan commented on a change in pull request #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor
cloud-fan commented on a change in pull request #25249: [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor URL: https://github.com/apache/spark/pull/25249#discussion_r307614296 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala ## @@ -27,6 +27,8 @@ import org.apache.spark.sql.catalyst.rules.RuleExecutor class PullupCorrelatedPredicatesSuite extends PlanTest { object Optimize extends RuleExecutor[LogicalPlan] { +override protected val blacklistedOnceBatches = Set("PullupCorrelatedPredicates") Review comment: shall we also change `Once` to `FixedPoint(1)` here instead of adding blacklist?
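The distinction behind the `Once` vs `FixedPoint(1)` suggestion above can be sketched with a toy executor. This is not Spark's `RuleExecutor`; it is a minimal illustration, under the assumption (from SPARK-28237) that a `Once` batch additionally verifies idempotence, while `FixedPoint(1)` merely caps the iteration count without any such check.

```scala
// Toy sketch of the two batch strategies; names are illustrative only.
sealed trait Strategy
case object Once extends Strategy                      // one pass, result must be a fixed point
case class FixedPoint(maxIters: Int) extends Strategy  // up to maxIters passes, no check

def execute[T](plan: T, rule: T => T, strategy: Strategy): T = strategy match {
  case Once =>
    val result = rule(plan)
    // Idempotence enforcement: re-applying the rule must not change the plan.
    require(rule(result) == result, "Once batch is not idempotent")
    result
  case FixedPoint(n) =>
    // Iterate until the plan is stable or the budget is exhausted; a plan
    // that is still changing after n passes is NOT an error here.
    var current = plan
    var changed = true
    var i = 0
    while (changed && i < n) {
      val next = rule(current)
      changed = next != current
      current = next
      i += 1
    }
    current
}
```

Under this reading, a rule that is deliberately non-idempotent can either be moved to a `FixedPoint(1)` batch (the reviewer's suggestion) or be exempted via the blacklist the patch introduces.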
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515338231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13301/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515338206 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13300/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515338206 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13300/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515338226 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515338199 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
AmplabJenkins removed a comment on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515338199 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515338226 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups
AmplabJenkins removed a comment on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515338231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13301/ Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #25237: [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets
dongjoon-hyun closed pull request #25237: [SPARK-28489][SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets URL: https://github.com/apache/spark/pull/25237
[GitHub] [spark] SparkQA commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize
SparkQA commented on issue #25259: [SPARK-28518][SQL][TEST] Refer to ChecksumFileSystem#isChecksumFile to fix StatisticsCollectionTestBase#getDataSize URL: https://github.com/apache/spark/pull/25259#issuecomment-515338730 **[Test build #108199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108199/testReport)** for PR 25259 at commit [`8158d5e`](https://github.com/apache/spark/commit/8158d5e27fce8e4bc5877ed7bb4f7c3876007c13).
[GitHub] [spark] SparkQA commented on issue #25251: [MINOR] Trivial cleanups
SparkQA commented on issue #25251: [MINOR] Trivial cleanups URL: https://github.com/apache/spark/pull/25251#issuecomment-515338754 **[Test build #108200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108200/testReport)** for PR 25251 at commit [`5cbeaf0`](https://github.com/apache/spark/commit/5cbeaf02cafdd627d6f31c8a41396a37ef971a7d).
[GitHub] [spark] beliefer commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
beliefer commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515339020 Retest this please.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515340394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13302/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515340414 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515340389 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515340419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13303/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515340389 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515340414 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515340394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13302/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
AmplabJenkins removed a comment on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515340419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13303/ Test PASSed.
[GitHub] [spark] s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API
s1ck commented on a change in pull request #24851: [SPARK-27303][GRAPH] Add Spark Graph API URL: https://github.com/apache/spark/pull/24851#discussion_r307617547 ## File path: graph/api/src/main/scala/org/apache/spark/graph/api/PropertyGraph.scala ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graph.api + +import org.apache.spark.sql.{DataFrame, SaveMode} + +/** + * A Property Graph as defined by the openCypher Property Graph Data Model. + * + * A graph is always tied to and managed by a [[CypherSession]]. + * The lifetime of a graph is bound by the session lifetime. + * + * @see <a href="http://www.opencypher.org/">openCypher project</a> Review comment: I added one initially, but it exceeds the line length. The [Databricks Scala Style Guide](https://github.com/databricks/scala-style-guide#linelength) mentions that more than 100 chars are fine for URLs, but scalastyle still complains. Probably because it's behind an `@see` tag. I re-added it and let's see what CI says.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515341051 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108201/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515341044 Build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
SparkQA commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-515341095 **[Test build #108203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108203/testReport)** for PR 24829 at commit [`0a425c4`](https://github.com/apache/spark/commit/0a425c41b26225512cb9d0e8cb58986d76513f6c).
[GitHub] [spark] SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax
SparkQA commented on issue #25074: [SPARK-27924][SQL] Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#issuecomment-515341067 **[Test build #108202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108202/testReport)** for PR 25074 at commit [`ebd2dcf`](https://github.com/apache/spark/commit/ebd2dcfd63560758dda8407c0ee1e17a2eb77bdc).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] always create a fresh copy of the SparkSession before each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515341044 Build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515342730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13304/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515342718 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515342730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13304/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515342718 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515341051 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108201/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
SparkQA commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515343305 **[Test build #108204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108204/testReport)** for PR 25243 at commit [`094e65b`](https://github.com/apache/spark/commit/094e65b2c571461cc0ef652f53f3004001ab6bdd).
[GitHub] [spark] sarutak opened a new pull request #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
sarutak opened a new pull request #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260

Code is not generated for LocalTableScanExec even in situations where it should be. If a LocalTableScanExec plan has a direct parent plan that supports WholeStageCodegen, the LocalTableScanExec plan should also be within a WholeStageCodegen domain. But currently no code is generated for LocalTableScanExec and an InputAdapter is inserted instead.

```
val df1 = spark.createDataset(1 to 10).toDF
val df2 = spark.createDataset(1 to 10).toDF
val df3 = df1.join(df2, df1("value") === df2("value"))
df3.explain(true)
...
== Physical Plan ==
*(1) BroadcastHashJoin [value#1], [value#6], Inner, BuildRight
:- LocalTableScan [value#1]   // LocalTableScanExec is not within a WholeStageCodegen domain
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- LocalTableScan [value#6]
```

```
scala> df3.queryExecution.executedPlan.children.head.children.head.getClass
res4: Class[_ <: org.apache.spark.sql.execution.SparkPlan] = class org.apache.spark.sql.execution.InputAdapter
```

In the current implementation of LocalTableScanExec, codegen is enabled when `parent` is not null, but `parent` is only set in `consume`, which is called after `insertInputAdapter`, so it doesn't work as intended. After applying this change, we get the following plan, which means LocalTableScanExec is within a WholeStageCodegen domain:

```
== Physical Plan ==
*(1) BroadcastHashJoin [value#63], [value#68], Inner, BuildRight
:- *(1) LocalTableScan [value#63]
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   +- LocalTableScan [value#68]
```

## How was this patch tested?

New test cases are added to WholeStageCodegenSuite.
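The ordering problem the PR describes (a codegen gate that checks `parent`, which is only assigned later in `consume`) can be sketched without Spark. The class and method names below are illustrative stand-ins, not Spark's actual API; this is a minimal simulation of the bug, not the fix.

```scala
// Simplified sketch of the ordering bug: the planner queries the codegen
// gate during insertInputAdapter, before any parent has called consume,
// so the gate is always false at decision time even though it would be
// true later. Names are hypothetical, not Spark's real classes.

class Node(val name: String) {
  var parent: Node = null

  // Flawed gate: reports codegen support only once a parent is wired up.
  def supportCodegen: Boolean = parent != null

  // consume is where the parent link is finally established.
  def consume(child: Node): Unit = child.parent = this
}

object CodegenOrderingDemo {
  // Returns (wrappedInAdapter, gateAfterConsume). The scan node gets
  // wrapped because the gate is queried too early.
  def demo(): (Boolean, Boolean) = {
    val scan = new Node("LocalTableScan")
    val join = new Node("BroadcastHashJoin")
    val wrappedInAdapter = !scan.supportCodegen // queried before consume runs
    join.consume(scan)                          // parent is set too late
    (wrappedInAdapter, scan.supportCodegen)
  }
}
```

Running `CodegenOrderingDemo.demo()` yields `(true, true)`: the node is wrapped in an adapter even though the gate passes after `consume`, which is exactly why the gate cannot depend on `parent` at `insertInputAdapter` time.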
[GitHub] [spark] AmplabJenkins commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
AmplabJenkins commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260#issuecomment-515345052 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
AmplabJenkins commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260#issuecomment-515345057 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13305/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
SparkQA commented on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260#issuecomment-515345720 **[Test build #108205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108205/testReport)** for PR 25260 at commit [`24d51ba`](https://github.com/apache/spark/commit/24d51ba1c30472cffc4e44641ece2c4e76e54139).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
AmplabJenkins removed a comment on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260#issuecomment-515345057 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13305/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec
AmplabJenkins removed a comment on issue #25260: [SPARK-28520][SQL] WholeStageCodegen does not work properly for LocalTableScanExec URL: https://github.com/apache/spark/pull/25260#issuecomment-515345052 Merged build finished. Test PASSed.
[GitHub] [spark] mgaido91 commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query
mgaido91 commented on a change in pull request #23531: [SPARK-24497][SQL] Support recursive SQL query URL: https://github.com/apache/spark/pull/23531#discussion_r307624090

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ##

```scala
@@ -228,6 +234,156 @@ case class FilterExec(condition: Expression, child: SparkPlan)
   override def outputPartitioning: Partitioning = child.outputPartitioning
 }

+/**
+ * Physical plan node for a recursive table that encapsulates the physical plans of the anchor
+ * terms, the logical plans of the recursive terms, and the maximum number of rows to return.
+ *
+ * Anchor terms are physical plans; they are used to initialize the query in the first run.
+ * Recursive terms are used to extend the result with new rows. They are logical plans and contain
+ * references to the result of the previous iteration or to the so-far accumulated result. These
+ * references are updated with new statistics, compiled to physical plans, and then updated to
+ * reflect the appropriate RDD before execution.
+ *
+ * The execution terminates once the anchor terms or the current iteration of the recursive terms
+ * return no rows, or the number of accumulated rows reaches the limit.
+ *
+ * During the execution of a recursive query the previously computed results are reused multiple
+ * times. To avoid massive recomputation of these pieces of the final result, they are cached.
+ *
+ * @param name the name of the recursive table
+ * @param anchorTerms these children are used for initializing the query
+ * @param recursiveTerms these children are used for extending the set of results with new rows
+ *                       based on the results of the previous iteration (or the anchor in the
+ *                       first iteration)
+ * @param limit the maximum number of rows to return
+ */
+case class RecursiveTableExec(
+    name: String,
+    anchorTerms: Seq[SparkPlan],
+    @transient val recursiveTerms: Seq[LogicalPlan],
+    limit: Option[Long]) extends SparkPlan {
+
+  override def children: Seq[SparkPlan] = anchorTerms
+
+  override def output: Seq[Attribute] = anchorTerms.head.output.map(_.withNullability(true))
+
+  override def simpleString(maxFields: Int): String =
+    s"RecursiveTable $name${limit.map(", " + _).getOrElse("")}"
+
+  override def innerChildren: Seq[QueryPlan[_]] = recursiveTerms ++ super.innerChildren
+
+  override protected def doExecute(): RDD[InternalRow] = {
+    val storageLevel = StorageLevel.fromString(conf.getConf(SQLConf.RECURSION_CACHE_STORAGE_LEVEL))
```

Review comment: I remember that in the past, many people suggested using the conf directly when it is used only once, to avoid the proliferation of these methods; that was the reason for my comment.
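The execution loop the scaladoc describes (anchor rows seed the result, each iteration feeds the previous iteration's new rows to the recursive step, and execution stops when an iteration yields no new rows or the accumulated row count reaches the optional limit) can be sketched in plain Scala. This is a self-contained illustration with no Spark dependency; `RecursiveFixpoint` and its signature are hypothetical, not Spark's API.

```scala
// Minimal fixpoint loop mirroring the termination conditions described in
// the RecursiveTableExec scaladoc: stop on an empty iteration or on reaching
// the row limit. Rows are Ints here for simplicity; Spark would iterate over
// RDDs of InternalRow instead.

object RecursiveFixpoint {
  def run(anchor: Seq[Int], step: Seq[Int] => Seq[Int], limit: Option[Long]): Seq[Int] = {
    var result = anchor
    var previous = anchor
    while (previous.nonEmpty && limit.forall(result.size < _)) {
      // Carry only genuinely new rows into the next iteration; otherwise a
      // cyclic step function would never terminate.
      previous = step(previous).distinct.filterNot(result.contains)
      result = result ++ previous
    }
    limit.fold(result)(n => result.take(n.toInt))
  }
}
```

For example, seeding with `Seq(1)` and a step that adds 1 up to a ceiling of 5 accumulates `Seq(1, 2, 3, 4, 5)`, while an unbounded step with `limit = Some(3)` stops after three rows.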
[GitHub] [spark] mgaido91 commented on a change in pull request #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow
mgaido91 commented on a change in pull request #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow URL: https://github.com/apache/spark/pull/25253#discussion_r307625834

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##

```scala
@@ -498,22 +499,34 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String]
     b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
   }

+  private val nullOnOverflow = SQLConf.get.decimalOperationsNullOnOverflow
+
   /**
    * Change the precision / scale in a given decimal to those set in `decimalType` (if any),
    * returning null if it overflows or modifying `value` in-place and returning it if successful.
    *
    * NOTE: this modifies `value` in-place, so don't call it on external data.
    */
   private[this] def changePrecision(value: Decimal, decimalType: DecimalType): Decimal = {
-    if (value.changePrecision(decimalType.precision, decimalType.scale)) value else null
+    if (value.changePrecision(decimalType.precision, decimalType.scale)) {
+      value
+    } else {
```

Review comment: I agree with @gengliangwang, but I am fine with changing it. Please @HyukjinKwon let me know if you think we should change it, and I'll do it. Thanks.
[GitHub] [spark] mgaido91 commented on a change in pull request #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow
mgaido91 commented on a change in pull request #25253: [SPARK-28470][SQL] Cast to decimal throws ArithmeticException on overflow URL: https://github.com/apache/spark/pull/25253#discussion_r307626380

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##

```scala
@@ -498,22 +499,34 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String]
     b => x.numeric.asInstanceOf[Numeric[Any]].toInt(b).toByte
   }

+  private val nullOnOverflow = SQLConf.get.decimalOperationsNullOnOverflow
+
   /**
    * Change the precision / scale in a given decimal to those set in `decimalType` (if any),
    * returning null if it overflows or modifying `value` in-place and returning it if successful.
    *
    * NOTE: this modifies `value` in-place, so don't call it on external data.
    */
   private[this] def changePrecision(value: Decimal, decimalType: DecimalType): Decimal = {
-    if (value.changePrecision(decimalType.precision, decimalType.scale)) value else null
+    if (value.changePrecision(decimalType.precision, decimalType.scale)) {
+      value
+    } else {
+      if (nullOnOverflow) {
+        null
+      } else {
+        throw new ArithmeticException(s"${value.toDebugString} cannot be represented as " +
+          s"Decimal(${decimalType.precision}, ${decimalType.scale}).")
```

Review comment: this is consistent with other similar error messages. We should change it in all cases, then. WDYT?
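The behavior the diff introduces (return null on overflow when `nullOnOverflow` is set, otherwise throw `ArithmeticException`) can be sketched with `java.math.BigDecimal` standing in for Spark's `Decimal`. This is a simplified, self-contained illustration: `DecimalOverflow.changePrecision` is a hypothetical stand-in, and the HALF_UP rounding is an assumption, not necessarily what `Decimal.changePrecision` does in every mode.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalOverflow {
  // Rescale `value` to the target scale, then check whether it still fits
  // the target precision. On overflow, either return null or throw,
  // mirroring the nullOnOverflow switch in the diff above.
  def changePrecision(
      value: JBigDecimal,
      precision: Int,
      scale: Int,
      nullOnOverflow: Boolean): JBigDecimal = {
    val rescaled = value.setScale(scale, RoundingMode.HALF_UP)
    if (rescaled.precision <= precision) {
      rescaled
    } else if (nullOnOverflow) {
      null
    } else {
      throw new ArithmeticException(
        s"$value cannot be represented as Decimal($precision, $scale).")
    }
  }
}
```

For instance, `123.45` fits `Decimal(4, 1)` after rounding to `123.5`, while `12345.6` does not: with `nullOnOverflow = true` the call returns null, and with `nullOnOverflow = false` it raises `ArithmeticException`.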
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515350300 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515350306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13306/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515350306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/13306/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
AmplabJenkins removed a comment on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515350300 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test
SparkQA commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test URL: https://github.com/apache/spark/pull/25243#issuecomment-515343305 **[Test build #108206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108206/testReport)** for PR 25243 at commit [`ec7b7bd`](https://github.com/apache/spark/commit/ec7b7bd0e58a06ac2800a824c25b051938db9b67).