[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054651

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
--- End diff --

`findEqual(target, children.tail, index = 1)`
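For orientation, the expression performs a 1-based first-match scan over its trailing arguments; the reviewer's note merely suggests passing the starting index as a named argument. A self-contained sketch of the same logic on plain Scala values (illustrative only, not the PR's code; null handling elided):

```scala
// 1-based position of the first candidate equal to target, or 0 if absent —
// a plain-Scala analogue of the PR's findEqual recursion.
def fieldIndex[A](target: A, candidates: Seq[A]): Int = {
  @annotation.tailrec
  def loop(rest: Seq[A], index: Int): Int = rest match {
    case Seq()        => 0
    case head +: tail => if (head == target) index else loop(tail, index + 1)
  }
  loop(candidates, index = 1)
}

fieldIndex(10, Seq(9, 3, 10, 4))  // == 3, matching the docstring example
```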
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054605

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
@@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row)
     checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row)
   }
+
+  test("case field") {
+    val str1 = Literal("花花世界")
+    val str2 = Literal("a")
+    val str3 = Literal("b")
+    val str4 = Literal("")
+    val str5 = Literal("999")
+    val strNull = Literal.create(null, StringType)
+
+    val bool1 = Literal(true)
+    val bool2 = Literal(false)
+
+    val int1 = Literal(1)
+    val int2 = Literal(2)
+    val int3 = Literal(3)
+    val int4 = Literal(999)
+    val intNull = Literal.create(null, IntegerType)
+
+    val double1 = Literal(1.221)
+    val double2 = Literal(1.222)
+    val double3 = Literal(1.224)
+
+    val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1))
+    val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1))
+    val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1))
+
+    val date1 = Literal(new Date(1949, 1, 1))
+    val date2 = Literal(new Date(1979, 1, 1))
+    val date3 = Literal(new Date(1989, 1, 1))
+
+    checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3)
+    checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1)
+    checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1)
+    checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2)
+    checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3)
+    checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3)
+    checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3)
+    checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1)
+    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6)
+    checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0)
+    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0)
+    checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0)
--- End diff --

What is the purpose of these checks? Based on MySQL's `field` function, the type-casting rules are described as:

```
If all arguments to FIELD() are strings, all arguments are compared as strings.
If all arguments are numbers, they are compared as numbers.
Otherwise, the arguments are compared as double.
```
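As a rough sketch, the MySQL rule the reviewer quotes amounts to picking a common comparison type before scanning. The helper below is hypothetical (name and placement are assumptions, not part of the PR):

```scala
import org.apache.spark.sql.types._

// Hypothetical helper mirroring MySQL's FIELD() comparison rule:
// all strings -> compare as strings, all integers -> compare as numbers,
// anything mixed -> compare as double.
def commonComparisonType(types: Seq[DataType]): DataType = {
  val integral: Set[DataType] = Set(ByteType, ShortType, IntegerType, LongType)
  if (types.forall(_ == StringType)) StringType
  else if (types.forall(integral.contains)) LongType
  else DoubleType
}
```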
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054575

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
(same "case field" test hunk as quoted above, ending with:)
+    checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0)
--- End diff --

This is to test `null`. Could you add the description?

```
If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value.
```
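Mirroring the suite's own pattern, the two NULL behaviors could be asserted explicitly — a sketch reusing the literals defined in the test above:

```scala
// NULL search value: equality with NULL always fails, so the result is 0.
checkEvaluation(Field(Seq(strNull, str1, str2)), 0)
// A NULL candidate is skipped; a later match still reports its 1-based position.
checkEvaluation(Field(Seq(int1, intNull, int1)), 2)
```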
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054547

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the unbraced branches:)
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
--- End diff --

Could you fix the style, based on https://github.com/databricks/scala-style-guide#curly?
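Per that guide, if/else branches that span their own lines should always be braced. A minimal sketch of the fix, folding in the named-argument suggestion from earlier in the thread:

```scala
// Braced branches per the Databricks Scala style guide.
if (target == null) {
  0
} else {
  findEqual(target, children.tail, index = 1)
}
```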
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054508

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the documented examples:)
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
--- End diff --

More examples, please?
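Candidate additions that would exercise the documented behaviors — a string match, a miss, and a NULL search (suggested examples only, not from the PR; results follow the eval logic quoted earlier):

```scala
  extended = """
    Examples:
      > SELECT _FUNC_(10, 9, 3, 10, 4);
       3
      > SELECT _FUNC_('b', 'a', 'b', 'c');
       2
      > SELECT _FUNC_(1, 2, 3);
       0
      > SELECT _FUNC_(NULL, 1, 2);
       0
  """)
```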
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054387

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the usage string:)
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
--- End diff --

Can we use `expr1, expr2, expr3` here? The type can be any atomic type?
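A possible rewording along the lines the reviewer asks for (a suggestion only, not the merged text):

```scala
usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the 1-based index of the first exprN that equals expr, or 0 if none matches. Arguments may be of any atomic type.",
```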
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71006/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16493

retest this please
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71005/
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Merged build finished. Test FAILed.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95053274

--- Diff: docs/sql-programming-guide.md ---
@@ -1362,6 +1362,13 @@ options.
  - Dataset and DataFrame API `explode` has been deprecated, alternatively, use `functions.explode()` with `select` or `flatMap`
  - Dataset and DataFrame API `registerTempTable` has been deprecated and replaced by `createOrReplaceTempView`
+ - Changes to `CREATE TABLE ... LOCATION` behavior.
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
+     Please see [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276) for details.
+   - As a result, `DROP TABLE` statements on those tables will not remove the data.
+     Note that this is different than the Hive behavior.
--- End diff --

Now, we can remove this sentence.
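For context, the behavior being documented looks like this in practice (illustrative statements; the path is made up):

```sql
-- In Spark 2.0+, a user-supplied LOCATION makes the Hive table external.
CREATE TABLE t (id INT) LOCATION '/some/existing/dir';

-- Drops only the metadata; the files under /some/existing/dir are kept.
DROP TABLE t;
```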
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95053222

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Also add two more sentences here:

```
That means a Hive table created in Spark SQL with a user-specified location is a Hive external table. Users are not allowed to specify the location for Hive managed tables.
```
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71004/
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Merged build finished. Test PASSed.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16488: [MINOR] Bump R version to 2.2.0.
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16488

Yeah, I think it does get automatically updated during the release, but it's good to keep this in sync just for the development builds etc.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344

@srowen @yanboliang I'm closing this PR since it does not seem to integrate cleanly into the current GLM setup. I appreciate all the comments and discussions.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052715

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Wait, I need to rephrase it.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052705

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+     Please see [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276) for details.
--- End diff --

Nit: no need to show the JIRA here. Please remove it.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052691

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Also add one more sentence here: `Users are not allowed to specify the location for managed tables.`
[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang closed the pull request at: https://github.com/apache/spark/pull/16344
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052654

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+ - Changes to `CREATE TABLE ... LOCATION` behavior.
--- End diff --

`behavior.` -> `behavior for Hive tables.`
[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/15505

@kayousterhout Okay, I'll do the code revision this weekend.
[GitHub] spark pull request #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directorie...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/16439#discussion_r95052089

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -445,12 +445,24 @@ private[deploy] class Worker(
       // Create local dirs for the executor. These are passed to the executor via the
       // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
       // application finishes.
-      val appLocalDirs = appDirectories.getOrElse(appId,
-        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
-          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
-          Utils.chmod700(appDir)
-          appDir.getAbsolutePath()
-        }.toSeq)
+      val appLocalDirs = appDirectories.getOrElse(appId, {
+        val dirs = Utils.getOrCreateLocalRootDirs(conf).flatMap { dir =>
+          try {
+            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
+            Utils.chmod700(appDir)
+            Some(appDir.getAbsolutePath())
+          } catch {
+            case e: IOException =>
+              logWarning(s"${e.getMessage}. Ignoring this directory.")
+              None
+          }
+        }.toSeq
+        if (dirs.isEmpty) {
+          throw new IOException("None subfolder can be created in " +
+            s"${Utils.getOrCreateLocalRootDirs(conf).mkString(",")}.")
--- End diff --

Thanks vanzin. I will commit it.
[GitHub] spark pull request #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directorie...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/16439#discussion_r95052088

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
(same Worker hunk as quoted above; the comment targets:)
+        if (dirs.isEmpty) {
+          throw new IOException("None subfolder can be created in " +
--- End diff --

Thanks vanzin. I will commit it.
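Presumably the change being committed includes a wording fix to the exception message itself; a guess at the corrected form (vanzin's exact suggestion is not quoted in this thread):

```scala
// Grammatical fix to the error message; same semantics as the diff above.
throw new IOException("No subfolder can be created in " +
  s"${Utils.getOrCreateLocalRootDirs(conf).mkString(",")}.")
```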
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95052038

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
       case i: InMemoryRelation => i
     }.size == 1)
   }
+
+  test("SPARK-19093 Caching in side subquery") {
+    withTempView("t1") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      spark.catalog.cacheTable("t1")
+      val cachedPlan =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin).queryExecution.optimizedPlan
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 2)
+      spark.catalog.uncacheTable("t1")
+    }
+  }
+
+  test("SPARK-19093 scalar and nested predicate query") {
+    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
+      plan collect {
+        case i: InMemoryRelation => i
+      }
+    }
+    withTempView("t1", "t2", "t3", "t4") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(2).toDF("c1").createOrReplaceTempView("t2")
+      Seq(1).toDF("c1").createOrReplaceTempView("t3")
+      Seq(1).toDF("c1").createOrReplaceTempView("t4")
+      spark.catalog.cacheTable("t1")
+      spark.catalog.cacheTable("t2")
+      spark.catalog.cacheTable("t3")
+      spark.catalog.cacheTable("t4")
+
+      // Nested predicate subquery
+      val cachedPlan =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
+          """.stripMargin).queryExecution.optimizedPlan
+
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 3)
+
+      // Scalar subquery and predicate subquery
+      val cachedPlan2 =
+        sql(
+          """
+            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
+            |WHERE
+            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
+            |OR
+            |EXISTS (SELECT c1 FROM t3)
+            |OR
+            |c1 IN (SELECT c1 FROM t4)
+          """.stripMargin).queryExecution.optimizedPlan
+
+      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
+      cachedRelations += getCachedPlans(cachedPlan2)
+      cachedPlan2 transformAllExpressions {
+        case e: SubqueryExpression =>
+          cachedRelations += getCachedPlans(e.plan)
+          e
+      }
+      assert(cachedRelations.flatten.size == 4)
+
+      spark.catalog.uncacheTable("t1")
+      spark.catalog.uncacheTable("t2")
+      spark.catalog.uncacheTable("t3")
+      spark.catalog.uncacheTable("t4")
--- End diff --

@gatorsmile Sorry, missed this one. Will make the change.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051560

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+  test("SPARK-19093 scalar and nested predicate query") {
+
+
--- End diff --

Nit: remove these two lines.
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16495

Merged build finished. Test PASSed.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051550

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 2)
--- End diff --

The same here.
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16495

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71003/
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16495

**[Test build #71003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71003/testReport)** for PR 16495 at commit [`d3e2dad`](https://github.com/apache/spark/commit/d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051546

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      spark.catalog.uncacheTable("t1")
+      spark.catalog.uncacheTable("t2")
+      spark.catalog.uncacheTable("t3")
+      spark.catalog.uncacheTable("t4")
--- End diff --

How about this? @dilipbiswal
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051398

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -131,6 +132,12 @@ class CacheManager extends Logging {
   /** Replaces segments of the given logical plan with cached versions where possible. */
   def useCachedData(plan: LogicalPlan): LogicalPlan = {
+    useCachedDataInternal(plan) transformAllExpressions {
+      case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
+    }
+  }
+
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

@gatorsmile Thank you very much. I have addressed your comments.
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15119

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71001/
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15119

Merged build finished. Test PASSed.
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15119

**[Test build #71001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71001/testReport)** for PR 15119 at commit [`7a7b6ba`](https://github.com/apache/spark/commit/7a7b6ba213e57a705642d42220037a7b9a18e3a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050805

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
(same CacheManager hunk as quoted above; the comment targets:)
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

@gatorsmile Sure.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050799

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      assert(cachedRelations.flatten.size == 4)
--- End diff --

@gatorsmile Thanks, I will make the change.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050745

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
(same CacheManager hunk as quoted above; the comment targets:)
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

After rethinking it, we do not need to add a new function. We can combine them into a single one, like:

```Scala
/** Replaces segments of the given logical plan with cached versions where possible. */
def useCachedData(plan: LogicalPlan): LogicalPlan = {
  val newPlan = plan transformDown {
    case currentFragment =>
      lookupCachedData(currentFragment)
        .map(_.cachedRepresentation.withOutput(currentFragment.output))
        .getOrElse(currentFragment)
  }
  newPlan transformAllExpressions {
    case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
  }
}
```
[GitHub] spark issue #16485: [SPARK-19099] correct the wrong time display in history ...
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/16485

But the time displayed on the history server web UI is not correct: it is 8 hours earlier than the actual time here. Am I using the wrong configuration?
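One thing worth checking (a guess at the cause, not a confirmed diagnosis from this thread): the web UI formats timestamps with the history server JVM's default time zone, which can be pinned explicitly. The zone below is an assumption based on the reported 8-hour offset:

```bash
# conf/spark-env.sh — pin the history server JVM to an explicit time zone
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Duser.timezone=Asia/Shanghai"
```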
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050711 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) --- End diff -- Then, this can be simplified to ```Scala assert(getNumInMemoryRelations(cachedPlan) == 3) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050708 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) --- End diff -- Then, this can be simplified to ```Scala assert (getNumInMemoryRelations(cachedPlan2) == 4) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 In the test suite, we can have such a helper function to count `InMemoryRelation` ```Scala private def getNumInMemoryRelations(plan: LogicalPlan): Int = { var sum = plan.collect { case _: InMemoryRelation => 1 }.sum plan.transformAllExpressions { case e: SubqueryExpression => sum += getNumInMemoryRelations(e.plan) e } sum } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
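For illustration, a usage sketch of the helper above (hypothetical snippet; it assumes `t1` and `t2` are cached temp views, as in the tests under review):
```Scala
// Counts InMemoryRelation nodes in the optimized plan and, through the
// recursive helper, in every subquery expression's plan as well.
val plan = sql(
  """
    |SELECT * FROM t1
    |WHERE c1 IN (SELECT c1 FROM t2)
  """.stripMargin).queryExecution.optimizedPlan
assert(getNumInMemoryRelations(plan) == 2)  // outer t1 plus the cached t2 in the subquery
```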
[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16480 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15413 OK, I'll just wait so @sethah can make a final pass and so @yanboliang can merge the 2 tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95050517 --- Diff: python/pyspark/sql/tests.py --- @@ -1886,6 +1887,28 @@ def test_hivecontext(self): self.assertTrue(os.path.exists(metastore_path)) +class SQLTests2(ReusedPySparkTestCase): --- End diff -- Is there any particular reason this is built on a `ReusedPySparkTestCase`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 Merging with master. Not backporting unless people request it since this memory leak is very minor. Thanks @sueann ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16495 **[Test build #71003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71003/testReport)** for PR 16495 at commit [`d3e2dad`](https://github.com/apache/spark/commit/d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16495: SPARK-16920: Add a stress test for evaluateEachIt...
GitHub user mhmoudr opened a pull request: https://github.com/apache/spark/pull/16495 SPARK-16920: Add a stress test for evaluateEachIteration for 2000 trees ## What changes were proposed in this pull request? Just adding a test to prove that the per-tree error calculation works for 2000 trees; before the fix for SPARK-15858 it failed to complete the calculation after a long time. ## How was this patch tested? Just run the test You can merge this pull request into a Git repository by running: $ git pull https://github.com/mhmoudr/spark SPARK-16920 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16495.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16495 commit d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb Author: Mahmoud Rawas Date: 2017-01-07T02:35:46Z SPARK-16920: Add a stress test for calculating error by tree (evaluateEachIteration) for 2000 trees --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
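A minimal sketch of what such a stress test could look like, assuming the mllib tree APIs (`Node`, `Predict`, `GradientBoostedTreesModel`, `evaluateEachIteration`) and an existing SparkContext `sc`; the `leafTree` helper is hypothetical:
```Scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.tree.loss.SquaredError
import org.apache.spark.mllib.tree.model.{DecisionTreeModel, GradientBoostedTreesModel, Node, Predict}

// Build one trivial single-leaf regression tree.
def leafTree(value: Double): DecisionTreeModel = {
  val leaf = new Node(1, new Predict(value, 0.0), 0.0, true, None, None, None, None)
  new DecisionTreeModel(leaf, Algo.Regression)
}

// 2000 identical trees with unit weights, mirroring the scale exercised here.
val model = new GradientBoostedTreesModel(
  Algo.Regression, Array.fill(2000)(leafTree(0.5)), Array.fill(2000)(1.0))

val data = sc.parallelize(Seq(LabeledPoint(1.0, Vectors.dense(0.0))))
// Should finish quickly and return one error value per iteration.
assert(model.evaluateEachIteration(data, SquaredError).length == 2000)
```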
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70999/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493 **[Test build #70999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70998/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492 **[Test build #70998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70998/testReport)** for PR 16492 at commit [`59a1161`](https://github.com/apache/spark/commit/59a11611999fddd0670218b16b991e691bcc574e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70997/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138 **[Test build #70997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70997/testReport)** for PR 16138 at commit [`1eb2ad0`](https://github.com/apache/spark/commit/1eb2ad00f4d033134d1d66d5dda24eee8cd29489). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95049506 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) + + spark.catalog.uncacheTable("t1") + spark.catalog.uncacheTable("t2") + spark.catalog.uncacheTable("t3") + spark.catalog.uncacheTable("t4") --- End diff -- ```Scala override def afterEach(): Unit = { try { clearCache() } finally { super.afterEach() } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95049495 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) + + spark.catalog.uncacheTable("t1") + spark.catalog.uncacheTable("t2") + spark.catalog.uncacheTable("t3") + spark.catalog.uncacheTable("t4") --- End diff -- You can call `clearCache()` and then no need to uncache each table. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 @dilipbiswal Could you post the nested subquery in the PR description? It can help the other reviewers understand the fix. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16494 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 Although the test cases can be improved, the code fix looks good to me. cc @JoshRosen @hvanhovell --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71002/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16494 **[Test build #71002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71002/testReport)** for PR 16494 at commit [`0d1c475`](https://github.com/apache/spark/commit/0d1c475c80a6fd0373108610ca8e41f7af0e6d01). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14451: [SPARK-16848][SQL] Check schema validation for user-spec...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14451 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16387#discussion_r95048733 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -115,6 +115,7 @@ class ExternalAppendOnlyMap[K, V, C]( private val keyComparator = new HashComparator[K] private val ser = serializer.newInstance() + @volatile private var isReadingIterator: Boolean = false --- End diff -- Yeah, alternatively we can remove the assert and check if `readingIterator` is null or not. I just want to keep the original behavior. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
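For reference, the null-check alternative mentioned here might look like the following sketch (`forceSpill`, `readingIterator`, and `currentMap` are assumed member names from ExternalAppendOnlyMap; this is not the PR's actual diff):
```Scala
// Drop the assert and tolerate a null readingIterator instead of tracking
// a separate @volatile isReadingIterator flag.
override protected[this] def forceSpill(): Boolean = {
  if (readingIterator != null) {
    val isSpilled = readingIterator.spill()
    if (isSpilled) {
      currentMap = null
    }
    isSpilled
  } else {
    false // nothing is being read yet, so there is nothing to spill
  }
}
```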
[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation for pr...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16430#discussion_r95048331 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/ProjectEstimationSuite.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.statsEstimation + +import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeMap, AttributeReference} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils._ +import org.apache.spark.sql.types.IntegerType + + +class ProjectEstimationSuite extends StatsEstimationTestBase { + + test("estimate project with alias") { +val ar1 = AttributeReference("key1", IntegerType)() +val ar2 = AttributeReference("key2", IntegerType)() +val colStat1 = ColumnStat(2, Some(1), Some(2), 0, 4, 4) +val colStat2 = ColumnStat(1, Some(10), Some(10), 0, 4, 4) + +val child = StatsTestPlan( + outputList = Seq(ar1, ar2), + stats = Statistics( +sizeInBytes = 2 * (4 + 4), +rowCount = Some(2), +attributeStats = AttributeMap(Seq(ar1 -> colStat1, ar2 -> colStat2 + +val project = Project(Seq(ar1, Alias(ar2, "abc")()), child) +val expectedColStats = Seq("key1" -> colStat1, "abc" -> colStat2) +val expectedAttrStats = toAttributeMap(expectedColStats, project) +// The number of rows won't change for project. +val expectedStats = Statistics( + sizeInBytes = 2 * getRowSize(project.output, expectedAttrStats), --- End diff -- I tested getRowSize for int type. But yes, we should have a separate test for this method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
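Such a standalone test might look like the following sketch (hypothetical; it asserts relative growth so it does not hard-code the fixed per-row overhead):
```Scala
test("getRowSize accounts for every output column") {
  val a = AttributeReference("a", IntegerType)()
  val b = AttributeReference("b", IntegerType)()
  val stats = AttributeMap(Seq(
    a -> ColumnStat(2, Some(1), Some(2), 0, 4, 4),
    b -> ColumnStat(2, Some(1), Some(2), 0, 4, 4)))
  // Two int columns must yield a larger estimated row size than one.
  assert(getRowSize(Seq(a, b), stats) > getRowSize(Seq(a), stats))
}
```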
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang closed the pull request at: https://github.com/apache/spark/pull/15819 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/15819 Many thanks, Xiao. I learned a lot. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15819 @merlintang Can you close this PR? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15819 Thanks! Merging to 1.6 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70995/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492 **[Test build #70995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70995/testReport)** for PR 16492 at commit [`080a269`](https://github.com/apache/spark/commit/080a2698928366e4a17d165cebebf4f44c797f40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16494 @jkbradley @vanzin @skyluc @luluorta @uncleGen @kanzhang Could you please take a look at this pull request, which fixes the fromEdges method in the EdgeRDD class used by LDA? Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16480 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95047025 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead Vectors.dense(model2.topicsMatrix.toArray) absTol 1e-6) assert(Vectors.dense(model.getDocConcentration) ~== Vectors.dense(model2.getDocConcentration) absTol 1e-6) + val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior + val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior + val trainingLogLikelihood = +model.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + val trainingLogLikelihood2 = +model2.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + assert(logPrior ~== logPrior2 absTol 1e-6) + assert(trainingLogLikelihood ~== trainingLogLikelihood2 absTol 1e-6) --- End diff -- Ok, I guess I remember this wrong because of the other PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70996/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16480 **[Test build #70996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70996/testReport)** for PR 16480 at commit [`0034461`](https://github.com/apache/spark/commit/00344616edfcc11d48fee5775186f26c3d49b118). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16494 **[Test build #71002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71002/testReport)** for PR 16494 at commit [`0d1c475`](https://github.com/apache/spark/commit/0d1c475c80a6fd0373108610ca8e41f7af0e6d01). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing w...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16494 [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException ## What changes were proposed in this pull request? LDA fails with a ClassCastException when run on a dataset with at least one row that contains an empty sparse vector. The error occurs in the fromEdges method, where the edge RDD may already be an EdgeRDDImpl and does not need to be converted again. ## How was this patch tested? I first ran LDA on the dataset provided by the JIRA submitter and was able to reproduce the issue. I then fixed the issue based on the submitter's suggestion and simplified the test case so that we wouldn't need to read in a file. You can merge this pull request into a Git repository by running: $ git pull https://github.com/imatiach-msft/spark ilmat/fix-EMLDA Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16494.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16494 commit 0a201b713276b92a20db330ccd20b9a562694f5a Author: Ilya Matiach Date: 2017-01-06T21:36:19Z adding test case to reproduce the error commit 66dbfea60fec23fb8b39e23adf1861cfa02d7d42 Author: Ilya Matiach Date: 2017-01-07T00:42:49Z [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException commit 0d1c475c80a6fd0373108610ca8e41f7af0e6d01 Author: Ilya Matiach Date: 2017-01-07T01:04:40Z Optimizing test case --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
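A sketch of the kind of guard the description suggests (hypothetical helper; the actual patch modifies fromEdges itself):
```Scala
import scala.reflect.ClassTag

import org.apache.spark.graphx.{Edge, EdgeRDD}
import org.apache.spark.graphx.impl.EdgeRDDImpl
import org.apache.spark.rdd.RDD

// If the RDD is already an EdgeRDDImpl, reuse it rather than re-wrapping it,
// which is what led to the ClassCastException.
def toEdgeRDD[ED: ClassTag, VD: ClassTag](edges: RDD[Edge[ED]]): EdgeRDD[ED] =
  edges match {
    case impl: EdgeRDDImpl[ED @unchecked, VD @unchecked] => impl
    case other => EdgeRDD.fromEdges[ED, VD](other)
  }
```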
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15119 @vanzin and @themodernlife , I have this fixed up to use the additional repositories when loading ivy settings from a file. Added a new test for loading a settings and fixed up the docs for `spark.jars.ivy` - I agree that the name is confusing, but hopefully clear in the docs. Thanks @themodernlife for helping out with the docs too. If you're able to try out this latest revision too, that would be great! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16473 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71000/testReport)** for PR 16473 at commit [`a1c0e59`](https://github.com/apache/spark/commit/a1c0e59bd7c5b139c2a682603a1fc4ca8ad211b1). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16473 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71000/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95045854 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead Vectors.dense(model2.topicsMatrix.toArray) absTol 1e-6) assert(Vectors.dense(model.getDocConcentration) ~== Vectors.dense(model2.getDocConcentration) absTol 1e-6) + val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior + val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior + val trainingLogLikelihood = +model.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + val trainingLogLikelihood2 = +model2.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + assert(logPrior ~== logPrior2 absTol 1e-6) + assert(trainingLogLikelihood ~== trainingLogLikelihood2 absTol 1e-6) --- End diff -- `LocalLDAModel` doesn't extend `DistributedLDAModel` and vice versa. I am not clear how to check `trainingLogLikelihood ` and `logPrior` in `LocalLDAModel`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15119 **[Test build #71001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71001/testReport)** for PR 15119 at commit [`7a7b6ba`](https://github.com/apache/spark/commit/7a7b6ba213e57a705642d42220037a7b9a18e3a6). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493 **[Test build #70999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71000/testReport)** for PR 16473 at commit [`a1c0e59`](https://github.com/apache/spark/commit/a1c0e59bd7c5b139c2a682603a1fc4ca8ad211b1). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16473 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/16493 [SPARK-19093][SQL] Cached tables are not used in SubqueryExpression ## What changes were proposed in this pull request? Consider the plans inside subquery expressions when looking up the cache manager, to make use of cached data. Currently, CacheManager.useCachedData does not consider the subquery expressions in the plan. ## How was this patch tested? Added new tests in CachedTableSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dilipbiswal/spark SPARK-19093 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16493.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16493 commit f733f90325b975973e60272ba6708dff5059f9dd Author: Dilip Biswal Date: 2017-01-07T00:18:23Z [SPARK-19093] Cached tables are not used in SubqueryExpression --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
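To make the symptom concrete, an illustration based on the tests added in this PR (a sketch, assuming `t1` is a cached temp view):
```Scala
spark.catalog.cacheTable("t1")
// t1 appears both in the outer query and inside the subquery.
val plan = sql(
  """
    |SELECT * FROM t1
    |WHERE NOT EXISTS (SELECT * FROM t1)
  """.stripMargin).queryExecution.optimizedPlan
// Before this fix only the outer reference was replaced with the cached
// InMemoryRelation; with the fix both references use the cached data.
```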
[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16387#discussion_r95044381 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -115,6 +115,7 @@ class ExternalAppendOnlyMap[K, V, C]( private val keyComparator = new HashComparator[K] private val ser = serializer.newInstance() + @volatile private var isReadingIterator: Boolean = false --- End diff -- I'm a little confused. Isn't this having the same effect as just removing the assert, since you're setting this to `true` right after instantiating `readingIterator`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492

**[Test build #70998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70998/testReport)** for PR 16492 at commit [`59a1161`](https://github.com/apache/spark/commit/59a11611999fddd0670218b16b991e691bcc574e).
[GitHub] spark pull request #16443: [SPARK-19042] Remove query string from jar url fo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16443#discussion_r95044096

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -34,12 +34,14 @@ import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
 import org.apache.spark.memory.TaskMemoryManager
 import org.apache.spark.rpc.RpcTimeout
-import org.apache.spark.scheduler.{AccumulableInfo, DirectTaskResult, IndirectTaskResult, Task}
+import org.apache.spark.scheduler.{DirectTaskResult, IndirectTaskResult, Task}
 import org.apache.spark.shuffle.FetchFailedException
 import org.apache.spark.storage.{StorageLevel, TaskResultBlockId}
 import org.apache.spark.util._
 import org.apache.spark.util.io.ChunkedByteBuffer
+
--- End diff --

Don't add these.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138

**[Test build #70997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70997/testReport)** for PR 16138 at commit [`1eb2ad0`](https://github.com/apache/spark/commit/1eb2ad00f4d033134d1d66d5dda24eee8cd29489).
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16480

**[Test build #70996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70996/testReport)** for PR 16480 at commit [`0034461`](https://github.com/apache/spark/commit/00344616edfcc11d48fee5775186f26c3d49b118).