[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17568#discussion_r111712646

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
           case EqualNullSafe(Literal(null, _), r) => IsNull(r)
           case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
    +      case AssertNotNull(c, _) if !c.nullable => c
    --- End diff --

    Actually, I checked all the usages of `AssertNotNull`: we never use `AssertNotNull` to check a non-nullable column/field, so it seems the documentation of `AssertNotNull` is wrong. Can you double check?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
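To make the rule under discussion concrete, here is a minimal sketch of what a case shaped like `case AssertNotNull(c, _) if !c.nullable => c` does to an expression tree. The types `ToyExpr`, `ToyColumn`, `ToyAssertNotNull` and the object `ToyNullPropagation` are hypothetical stand-ins invented for illustration; they are not Catalyst's classes, and the real `AssertNotNull` takes a second argument that is omitted here.

```scala
// Toy expression tree. These types are illustrative assumptions, NOT Catalyst's classes.
sealed trait ToyExpr { def nullable: Boolean }

// A column reference that carries its declared nullability.
final case class ToyColumn(name: String, nullable: Boolean) extends ToyExpr

// A runtime null check wrapped around a child expression.
final case class ToyAssertNotNull(child: ToyExpr) extends ToyExpr {
  // If the assertion succeeds, the result is never null.
  val nullable: Boolean = false
}

object ToyNullPropagation {
  // Mirrors the shape of the proposed case: drop the assertion
  // when the child is declared as non-nullable, keep it otherwise.
  def simplify(e: ToyExpr): ToyExpr = e match {
    case ToyAssertNotNull(c) if !c.nullable => simplify(c)
    case ToyAssertNotNull(c)                => ToyAssertNotNull(simplify(c))
    case other                              => other
  }
}
```

In this toy model the rewrite is only safe if the declared `nullable` flag can be trusted at runtime, which appears to be the point the reviewers are questioning.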
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75850/ Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17568 Merged build finished. Test PASSed.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75850/testReport)** for PR 17568 at commit [`f695e50`](https://github.com/apache/spark/commit/f695e50e38bd329db3b75951dd7af52fea3b3dde). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17568#discussion_r111711655

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
           case EqualNullSafe(Literal(null, _), r) => IsNull(r)
           case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
    +      case AssertNotNull(c, _) if !c.nullable => c
    --- End diff --

    Ah, good catch! Sorry, it was my mistake. But then it seems we cannot remove `MapObjects`, as the null check has to be done.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the accumulator name to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17596 It seems this breaks the Python accumulator. Does anyone know how the Python accumulator works?
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17640 Based on my understanding, it does not directly solve 12360. This one just solves the serialization of a specific type, `bigint`, in a struct field.
[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/17640 For `Inf` case, I used a very large number: 1380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013 80742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240138074279341524013807427934152401380742793415240 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
[GitHub] spark issue #17620: [SPARK-20305][Spark Core]Master may keep in the state of...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17620 Excuse me, can this issue be closed, or are there some other problems? @jerryshao
[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17540 yea let's remove that test
[GitHub] spark issue #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIKE' patt...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15398 I re-checked the current change, and I think it is in good shape. Are there any unresolved issues or decisions on this? Ping @jodersky: would you like to update this with master? Thanks.
[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17568 **[Test build #75850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75850/testReport)** for PR 17568 at commit [`f695e50`](https://github.com/apache/spark/commit/f695e50e38bd329db3b75951dd7af52fea3b3dde).
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17568#discussion_r111704431

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
           case EqualNullSafe(Literal(null, _), r) => IsNull(r)
           case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
    +      case AssertNotNull(c, _) if !c.nullable => c
    --- End diff --

    I am not sure whether @cloud-fan's no-op `AssertNotNull` is the same as the case in `AssertNotNull`'s description.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75847/ Test PASSed.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17568#discussion_r111704129

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
    @@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
           }
         }
       }
    +
    +/**
    + * Removes MapObjects when the following conditions are satisfied
    + *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
    + *      are primitive types
    + *   2. no custom collection class specified
    + *      representation of data item. For example back to back map operations.
    + */
    +object EliminateMapObjects extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case _ @ DeserializeToObject(Invoke(
    +        MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
    +          inputData, None),
    +        funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
    +        outputObjAttr, child) if dataType == castDataType =>
    --- End diff --

    I see
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17655 Merged build finished. Test PASSed.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17655 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75849/ Test PASSed.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17568#discussion_r111704118

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
           case EqualNullSafe(Literal(null, _), r) => IsNull(r)
           case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
    +      case AssertNotNull(c, _) if !c.nullable => c
    --- End diff --

    I think that this is what @cloud-fan suggested in [his comment](https://github.com/apache/spark/pull/17568#discussion_r111521892). Is my interpretation wrong?
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #75847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75847/testReport)** for PR 15435 at commit [`053284d`](https://github.com/apache/spark/commit/053284da60d72a79eb1f94da6d2c7dda74a21af8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17655 **[Test build #75849 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75849/testReport)** for PR 17655 at commit [`65b0ff7`](https://github.com/apache/spark/commit/65b0ff76a2af83053e45948d1df60092fae118fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15398#discussion_r111704017

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ---
    @@ -68,7 +68,30 @@ trait StringRegexExpression extends ImplicitCastInputTypes {
      * Simple RegEx pattern matching function
      */
     @ExpressionDescription(
    -  usage = "str _FUNC_ pattern - Returns true if `str` matches `pattern`, or false otherwise.")
    +  usage = "str _FUNC_ pattern - Returns true if str matches pattern, " +
    +    "null if any arguments are null, false otherwise.",
    +  extended = """
    +    Arguments:
    +      str - a string expression
    +      pattern - a string expression. The pattern is a string which is matched literally, with
    +        exception to the following special symbols:
    +
    +          _ matches any one character in the input (similar to . in posix regular expressions)
    +
    +          % matches zero ore more characters in the input (similar to .* in posix regular
    --- End diff --

    ore -> or?
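The wildcard semantics quoted in the diff above (`_` for any one character, `%` for zero or more characters, everything else literal) can be sketched as a translation to a Java regex. This is only an illustration under stated assumptions: `LikeSketch`, `likeToRegex` and `like` are hypothetical names, backslash is assumed to be the escape character, and the null handling mentioned in the usage string is not modeled. It is not the implementation in `regexpExpressions.scala`.

```scala
import java.util.regex.Pattern

object LikeSketch {
  // Hypothetical helper: translate a LIKE pattern into a Java regex string.
  // Assumption: a backslash escapes the character that follows it.
  def likeToRegex(pattern: String): String = {
    val out = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        case '\\' if i + 1 < pattern.length =>
          // Escaped character: always matched literally.
          out.append(Pattern.quote(pattern.charAt(i + 1).toString))
          i += 1
        case '_' => out.append(".")   // any one character
        case '%' => out.append(".*")  // zero or more characters
        case c   => out.append(Pattern.quote(c.toString)) // literal match
      }
      i += 1
    }
    out.toString
  }

  // The whole input must match; DOTALL lets '.' also match line terminators.
  def like(str: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(str).matches()
}
```

Quoting every literal character with `Pattern.quote` keeps regex metacharacters in the input pattern from being interpreted, at the cost of a longer generated regex.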
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75846/ Test PASSed.
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17540#discussion_r111703865

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
    @@ -39,6 +39,32 @@ object SQLExecution {
         executionIdToQueryExecution.get(executionId)
       }
    +  private val testing = sys.props.contains("spark.testing")
    +
    +  private[sql] def checkSQLExecutionId(sparkSession: SparkSession): Unit = {
    --- End diff --

    This is only called in `FileFormatWriter`. Are there any other places we need to consider?
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Build finished. Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #75846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75846/testReport)** for PR 15435 at commit [`bd40098`](https://github.com/apache/spark/commit/bd40098912e28a42e2a9011c4a5d298ca737dc69). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17540#discussion_r111703744

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -180,9 +180,9 @@ class Dataset[T] private[sql](
         // to happen right away to let these side effects take place eagerly.
         queryExecution.analyzed match {
           case c: Command =>
    -        LocalRelation(c.output, queryExecution.executedPlan.executeCollect())
    +        LocalRelation(c.output, withAction("collect", queryExecution)(_.executeCollect()))
           case u @ Union(children) if children.forall(_.isInstanceOf[Command]) =>
    -        LocalRelation(u.output, queryExecution.executedPlan.executeCollect())
    +        LocalRelation(u.output, withAction("collect", queryExecution)(_.executeCollect()))
    --- End diff --

    Shall we only add an execution id for commands that will trigger execution? AFAIK there are 3 commands: `CreateDataSourceTableAsSelectCommand`, `CreateHiveTableAsSelectCommand` and `CacheTable`. We can call `SQLExecution.withNewExecutionId` inside these 3 commands. Then we don't need to worry about nested execution.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17655 Merged build finished. Test PASSed.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17655 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75848/ Test PASSed.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17655 **[Test build #75848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75848/testReport)** for PR 17655 at commit [`47771e1`](https://github.com/apache/spark/commit/47771e1ce11107b62057c7bc4e9909c008b3fe58). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17641: [SPARK-20329][SQL] Make timezone aware expression...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17641#discussion_r111703152

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
    @@ -99,12 +99,9 @@ case class ResolveInlineTables(conf: SQLConf) extends Rule[LogicalPlan] {
               val castedExpr = if (e.dataType.sameType(targetType)) {
                 e
               } else {
    -            Cast(e, targetType)
    +            Cast(e, targetType, Some(conf.sessionLocalTimeZone))
               }
    -          castedExpr.transform {
    -            case e: TimeZoneAwareExpression if e.timeZoneId.isEmpty =>
    -              e.withTimeZone(conf.sessionLocalTimeZone)
    -          }.eval()
    +          castedExpr.eval()
    --- End diff --

    Oh, right. I saw the changes to `TimeZoneAwareExpression`. :-)
[GitHub] spark pull request #17641: [SPARK-20329][SQL] Make timezone aware expression...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17641#discussion_r111702719

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
    @@ -99,12 +99,9 @@ case class ResolveInlineTables(conf: SQLConf) extends Rule[LogicalPlan] {
               val castedExpr = if (e.dataType.sameType(targetType)) {
                 e
               } else {
    -            Cast(e, targetType)
    +            Cast(e, targetType, Some(conf.sessionLocalTimeZone))
               }
    -          castedExpr.transform {
    -            case e: TimeZoneAwareExpression if e.timeZoneId.isEmpty =>
    -              e.withTimeZone(conf.sessionLocalTimeZone)
    -          }.eval()
    +          castedExpr.eval()
    --- End diff --

    I guess that a `TimeZoneAwareExpression` is now resolved if it has a `timeZoneId`, so we don't need to transform the child.
[GitHub] spark issue #17623: [SPARK-20292][SQL] Clean up string representation of Tre...
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17623

    > What are the external impacts of these changes? Which commands are impacted?

    This patch mainly cleans up the definition of two string representation methods: `simpleString` and `verboseString`. `simpleString` doesn't show argument info anymore. `verboseString` doesn't show children info anymore.

    Due to the above change, `Expression.treeString` is changed too. Previously we showed duplicate children information, like the example shown in the PR description. Now the children info is shown only once, as the tree representation.

    We don't have much of a similar mess in `QueryPlan`. Previously, `QueryPlan.verboseString` was an alias of `QueryPlan.simpleString`. Following the definition above, `QueryPlan.simpleString` now shows a simple string representation without argument info.

    After this patch, in order to know the arguments of an expression/query plan, a user should use `verboseString` instead of `simpleString`. In order to know the children of an expression/query plan, a user should use `treeString` instead of `verboseString`.

    I think the only command that uses those string representations is `explain`. This patch won't change its output.
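The division of labour described in this comment can be sketched with a toy node type. `ToyNode` and its fields are hypothetical and are not Spark's `TreeNode`; in particular, whether the real `treeString` renders each node via `verboseString` or `simpleString` is not stated in the comment, so the choice below is an assumption.

```scala
// Toy tree node, NOT Spark's TreeNode: a name, some arguments, and children.
final case class ToyNode(name: String, args: Seq[String], children: Seq[ToyNode]) {
  // Name only, no argument info.
  def simpleString: String = name

  // Name plus arguments, but no children info.
  def verboseString: String = s"$name(${args.mkString(", ")})"

  // Each node is printed exactly once, with children indented beneath their parent.
  // Assumption: nodes are rendered with verboseString.
  def treeString: String = {
    def render(n: ToyNode, depth: Int): Seq[String] =
      (("  " * depth) + n.verboseString) +: n.children.flatMap(render(_, depth + 1))
    render(this, 0).mkString("\n")
  }
}
```

Because `verboseString` never includes children in this sketch, nothing is duplicated when `treeString` walks the tree, which is the cleanup the comment describes.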
[GitHub] spark pull request #17641: [SPARK-20329][SQL] Make timezone aware expression...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17641#discussion_r111701221
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala ---
@@ -99,12 +99,9 @@ case class ResolveInlineTables(conf: SQLConf) extends Rule[LogicalPlan] {
           val castedExpr = if (e.dataType.sameType(targetType)) {
             e
           } else {
-            Cast(e, targetType)
+            Cast(e, targetType, Some(conf.sessionLocalTimeZone))
           }
-          castedExpr.transform {
-            case e: TimeZoneAwareExpression if e.timeZoneId.isEmpty =>
-              e.withTimeZone(conf.sessionLocalTimeZone)
-          }.eval()
+          castedExpr.eval()
--- End diff --
If there are nested expressions which are timezone aware, I think we still need to attach time zone to them?
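The concern about nested timezone-aware expressions can be illustrated with a toy expression tree. This is a sketch under stated assumptions: `Lit`, `TzCast`, `attachTimeZone`, and `unresolvedCount` are hypothetical names, not Catalyst classes or methods, and the model says nothing about how the real analyzer resolves children.

```scala
// Toy model only: these types are hypothetical, not Catalyst expressions.
object TimeZoneSketch {
  sealed trait Expr
  final case class Lit(value: String) extends Expr
  // A timezone-aware node; `None` means no zone has been attached yet.
  final case class TzCast(child: Expr, timeZoneId: Option[String] = None) extends Expr

  // Attach `tz` to every timezone-aware node that still lacks one, at any depth.
  def attachTimeZone(e: Expr, tz: String): Expr = e match {
    case TzCast(child, zone) => TzCast(attachTimeZone(child, tz), zone.orElse(Some(tz)))
    case lit: Lit            => lit
  }

  // Count timezone-aware nodes that have no zone attached.
  def unresolvedCount(e: Expr): Int = e match {
    case TzCast(child, zone) => (if (zone.isEmpty) 1 else 0) + unresolvedCount(child)
    case _: Lit              => 0
  }

  // Only the outer node was given a zone at construction time.
  val outerOnly: Expr = TzCast(TzCast(Lit("2017-04-17")), Some("UTC"))
}
```

In this toy model, setting the zone only on the outermost node leaves the inner node without one, while a recursive pass covers both.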
[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17149 @gatorsmile, Thanks for your pointer. There is a good discussion there.
[GitHub] spark issue #17149: [SPARK-19257][SQL]location for table/partition/database ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17149 Our parser might need a change regarding escape handling. We are having a related discussion in another PR: https://github.com/apache/spark/pull/15398
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17644 I'll review it after branch 2.2 is cut
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111699305
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
       }
     }
   }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied
+ *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
+ *      are primitive types
+ *   2. no custom collection class specified
+ * representation of data item. For example back to back map operations.
--- End diff --
Is this comment broken?
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111699178
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -368,6 +369,8 @@ case class NullPropagation(conf: SQLConf) extends Rule[LogicalPlan] {
       case EqualNullSafe(Literal(null, _), r) => IsNull(r)
       case EqualNullSafe(l, Literal(null, _)) => IsNull(l)
+      case AssertNotNull(c, _) if !c.nullable => c
--- End diff --
Is this safe to do? According to the description of `AssertNotNull`, even if `c` is non-nullable, we still need to add this assertion in some cases.
[GitHub] spark issue #17375: [SPARK-19019][PYTHON][BRANCH-1.6] Fix hijacked `collecti...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17375 gentle ping ...
[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17622 LGTM except for a doc comment.
[GitHub] spark pull request #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSMode...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/17622#discussion_r111698372
--- Diff: python/pyspark/ml/recommendation.py ---
@@ -384,6 +392,28 @@ def itemFactors(self):
         """
         return self._call_java("itemFactors")
+    @since("2.2.0")
+    def recommendForAllUsers(self, numItems):
+        """
+        Returns top `numItems` items recommended for each user, for all users.
+
+        :param numItems: max number of recommendations for each user
+        :return: a DataFrame of (userCol, recommendations), where recommendations are
+                 stored as an array of (itemCol, rating) Rows.
+        """
+        return self._call_java("recommendForAllUsers", numItems)
+
+    @since("2.2.0")
+    def recommendForAllItems(self, numUsers):
+        """
+        Returns top `numUsers` users recommended for each item, for all items.
+
+        :param numItems: max number of recommendations for each item
--- End diff --
numItems -> numUsers
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111697961
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
       }
     }
   }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied
+ *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
+ *      are primitive types
+ *   2. no custom collection class specified
+ * representation of data item. For example back to back map operations.
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
+          inputData, None),
+        funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child) if dataType == castDataType =>
+      DeserializeToObject(Invoke(
+        inputData, funcName, returnType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child)
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, LambdaVariable(_, _, dataType, _), inputData, None),
--- End diff --
Ok, for safety, we can keep it.
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111697946
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
       }
     }
   }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied
+ *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
+ *      are primitive types
+ *   2. no custom collection class specified
+ * representation of data item. For example back to back map operations.
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
+          inputData, None),
+        funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child) if dataType == castDataType =>
--- End diff --
The order does not matter. The batch will be run multiple times.
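Why rule order stops mattering when a batch is rerun until nothing changes can be sketched on a toy tree. All names here (`Attr`, `Lambda`, `NoOpCast`, `MapOver`, `runToFixedPoint`) are hypothetical stand-ins chosen for illustration; this is not Catalyst's `RuleExecutor`, only a minimal model of repeated application.

```scala
// Toy model only: hypothetical types and rules, not Catalyst's optimizer.
object FixedPointSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  case object Lambda extends Expr                               // the loop variable
  final case class NoOpCast(child: Expr) extends Expr           // cast to the same type
  final case class MapOver(body: Expr, input: Expr) extends Expr

  type Rule = PartialFunction[Expr, Expr]

  // Two independent rewrites; the second only fires once the cast is gone.
  val dropNoOpCast: Rule = { case NoOpCast(child) => child }
  val dropIdentityMap: Rule = { case MapOver(Lambda, input) => input }

  // Apply `rule` bottom-up to every node it is defined for.
  def transformUp(e: Expr)(rule: Rule): Expr = {
    val rewritten = e match {
      case NoOpCast(c)   => NoOpCast(transformUp(c)(rule))
      case MapOver(b, i) => MapOver(transformUp(b)(rule), transformUp(i)(rule))
      case leaf          => leaf
    }
    rule.applyOrElse(rewritten, (x: Expr) => x)
  }

  // Rerun the whole batch until the tree stops changing (or the budget runs out).
  @annotation.tailrec
  def runToFixedPoint(e: Expr, rules: Seq[Rule], maxIterations: Int = 100): Expr = {
    val next = rules.foldLeft(e)((acc, rule) => transformUp(acc)(rule))
    if (next == e || maxIterations <= 1) next
    else runToFixedPoint(next, rules, maxIterations - 1)
  }

  val plan: Expr = MapOver(NoOpCast(Lambda), Attr("data"))
}
```

With either ordering of the two rules, the sample `plan` ends up as `Attr("data")`; the unfavorable ordering simply needs one more pass.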
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17655 **[Test build #75849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75849/testReport)** for PR 17655 at commit [`65b0ff7`](https://github.com/apache/spark/commit/65b0ff76a2af83053e45948d1df60092fae118fd).
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17655 **[Test build #75848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75848/testReport)** for PR 17655 at commit [`47771e1`](https://github.com/apache/spark/commit/47771e1ce11107b62057c7bc4e9909c008b3fe58).
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111697079
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
       }
     }
   }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied
+ *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
+ *      are primitive types
+ *   2. no custom collection class specified
+ * representation of data item. For example back to back map operations.
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
+          inputData, None),
+        funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child) if dataType == castDataType =>
+      DeserializeToObject(Invoke(
+        inputData, funcName, returnType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child)
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, LambdaVariable(_, _, dataType, _), inputData, None),
--- End diff --
As @cloud-fan pointed out in [this comment](https://github.com/apache/spark/pull/17568#discussion_r110510575), it is necessary. `customCollectionCls` is introduced by #16541. This is not equal to `None` when `Seq()` is used.
[GitHub] spark issue #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17655 cc @srowen @HyukjinKwon @cloud-fan @nihavend
[GitHub] spark pull request #17655: [SPARK-20156] [SQL] [FOLLOW-UP] Java String toLow...
GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/17655
[SPARK-20156] [SQL] [FOLLOW-UP] Java String toLowerCase "Turkish locale bug" in Database and Table DDLs
### What changes were proposed in this pull request?
Database and table names conform to the Hive standard ("[a-zA-z_0-9]+"), i.e. a name contains only letters, digits, and `_`. When calling `toLowerCase` on the names, we should pass `Locale.ROOT` to `toLowerCase` to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
### How was this patch tested?
Added a test case.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/gatorsmile/spark locale
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/17655.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #17655
commit 47771e1ce11107b62057c7bc4e9909c008b3fe58
Author: Xiao Li
Date: 2017-04-17T01:33:54Z
    fix.
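The locale problem described above can be reproduced with plain JDK calls. This is a minimal sketch: `normalizeName` is a hypothetical helper written for illustration, not the method the PR touches; only `String.toLowerCase(Locale)`, `Locale.ROOT`, and `Locale.forLanguageTag` are real JDK APIs.

```scala
import java.util.Locale

object LocaleSketch {
  // Hypothetical helper, not Spark's API: lower-case an identifier
  // without depending on the JVM's default locale.
  def normalizeName(name: String): String = name.toLowerCase(Locale.ROOT)

  val turkish: Locale = Locale.forLanguageTag("tr-TR")

  def main(args: Array[String]): Unit = {
    val name = "TITLE"
    // Under Turkish casing rules, 'I' lower-cases to dotless 'ı' (U+0131),
    // so the result no longer matches an ASCII-only identifier pattern.
    println(name.toLowerCase(turkish))
    // Locale.ROOT gives the locale-neutral mapping 'I' -> 'i'.
    println(normalizeName(name))
  }
}
```

The no-argument `toLowerCase()` uses the JVM default locale, which is why the same DDL can behave differently on a machine configured for `tr-TR`.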
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #75847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75847/testReport)** for PR 15435 at commit [`053284d`](https://github.com/apache/spark/commit/053284da60d72a79eb1f94da6d2c7dda74a21af8).
[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/17568#discussion_r111696732
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala ---
@@ -96,3 +98,30 @@ object CombineTypedFilters extends Rule[LogicalPlan] {
       }
     }
   }
+
+/**
+ * Removes MapObjects when the following conditions are satisfied
+ *   1. Mapobject(e) where e is lambdavariable(), which means types for input output
+ *      are primitive types
+ *   2. no custom collection class specified
+ * representation of data item. For example back to back map operations.
+ */
+object EliminateMapObjects extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case _ @ DeserializeToObject(Invoke(
+        MapObjects(_, _, _, Cast(LambdaVariable(_, _, dataType, _), castDataType, _),
+          inputData, None),
+        funcName, returnType: ObjectType, arguments, propagateNull, returnNullable),
+        outputObjAttr, child) if dataType == castDataType =>
--- End diff --
For now, as you pointed out, `Cast` has been removed by `SimplifyCasts`. I leave this for robustness. In the future, this optimization will be executed before `SimplifyCasts` by reordering. What do you think? cc: @cloud-fan
[GitHub] spark pull request #17649: [SPARK-20023][SQL][follow up] Output table commen...
Github user sujith71955 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17649#discussion_r111696709
--- Diff: sql/core/src/test/resources/sql-tests/inputs/describe_tbleproperty_validation.sql ---
@@ -0,0 +1,24 @@
+CREATE TABLE table_with_comment (a STRING, b INT) COMMENT 'actual comment';
+
+DESC formatted table_with_comment;
+
+-- ALTER TABLE BY MODIFYING COMMENT
+ALTER TABLE table_with_comment set tblproperties(comment = "modified comment");
+
+DESC formatted table_with_comment;
+
+-- DROP TEST TABLE
+DROP TABLE table_with_comment;
+
+-- CREATE TABLE WITHOUT COMMENT
+CREATE TABLE table_comment (a STRING, b INT);
+
+DESC formatted table_comment;
+
+-- ALTER TABLE BY ADDING COMMENT
+ALTER TABLE table_comment set tblproperties(comment = "added comment");
+
+DESC formatted table_comment;
+
+-- DROP TEST TABLE
+DROP TABLE table_comment;
--- End diff --
Sure, I will file a new JIRA for this problem and update the test suite name as suggested. As you suggested, I will also verify ALTER TABLE UNSET.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 @sethah Thanks! I have merged your updates and fixed the MiMa file conflicts. @yanboliang has just come back from a trip and will help review and merge it into 2.2, so don't worry about it!
[GitHub] spark pull request #17649: [SPARK-20023][SQL][follow up] Output table commen...
Github user sujith71955 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17649#discussion_r111696418
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -232,7 +232,9 @@ case class AlterTableSetPropertiesCommand(
     val table = catalog.getTableMetadata(tableName)
     DDLUtils.verifyAlterTableType(catalog, table, isView)
     // This overrides old properties
-    val newTable = table.copy(properties = table.properties ++ properties)
+    val newTable = table.copy(
+      properties = table.properties ++ properties,
+      comment = properties.get("comment"))
--- End diff --
I will add a comment
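The `copy` call in the diff above relies on two standard Scala `Map` behaviors: with `++`, entries on the right-hand side win, and `get` returns an `Option`. A minimal sketch, assuming a hypothetical `Table` case class and `setProperties` helper that stand in for the real catalog metadata and command:

```scala
// Toy model only: `Table` and `setProperties` are hypothetical, not Spark's catalog API.
object TblPropertiesSketch {
  final case class Table(properties: Map[String, String], comment: Option[String])

  // Mirrors the shape of the diff: merge the new properties over the old ones
  // and take the comment field from the new properties.
  def setProperties(table: Table, updates: Map[String, String]): Table =
    table.copy(
      properties = table.properties ++ updates, // right-hand entries override
      comment = updates.get("comment"))         // Option: None if the key is absent

  val original: Table =
    Table(Map("comment" -> "actual comment", "owner" -> "spark"), Some("actual comment"))
}
```

Note that in this toy model an update that does not mention `comment` yields `None` for the comment field, because `Map.get` returns an `Option`; how the real command should treat that case is outside this sketch.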
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #75846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75846/testReport)** for PR 15435 at commit [`bd40098`](https://github.com/apache/spark/commit/bd40098912e28a42e2a9011c4a5d298ca737dc69).
[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17654 Merged build finished. Test PASSed.
[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17654 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75845/ Test PASSed.
[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17654 **[Test build #75845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75845/testReport)** for PR 17654 at commit [`3bca3b1`](https://github.com/apache/spark/commit/3bca3b1429fe6da01b17c74634952009250457da).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17651 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75843/ Test PASSed.
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17651 Merged build finished. Test PASSed.
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17651 **[Test build #75843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75843/testReport)** for PR 17651 at commit [`0031804`](https://github.com/apache/spark/commit/00318043d0a5c6d1eb1404402fc390904d2ba2dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary to repla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17654 **[Test build #75845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75845/testReport)** for PR 17654 at commit [`3bca3b1`](https://github.com/apache/spark/commit/3bca3b1429fe6da01b17c74634952009250457da).
[GitHub] spark pull request #17654: [SPARK-20351] [ML] Add trait hasTrainingSummary t...
GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/17654
[SPARK-20351] [ML] Add trait hasTrainingSummary to replace the duplicate code
## What changes were proposed in this pull request?
Add a trait HasTrainingSummary to avoid duplicated code related to the training summary.
## How was this patch tested?
Existing Java and Scala unit tests.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/hhbyyh/spark hassummary
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/17654.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #17654
commit 3bca3b1429fe6da01b17c74634952009250457da
Author: Yuhao Yang
Date: 2017-04-16T23:29:27Z
    has summary trait
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17527 Yes, the code has this bug. For example, when the locale is TR, users are unable to create a table whose name contains `I`. This does not make sense to me. I believe we have more issues like this. I can submit a PR to fix this one, but I do not think it is the only one.
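For readers unfamiliar with the TR locale problem, here is a minimal Scala sketch of the underlying JVM behaviour. It is only an illustration, not code from the PR: the object and method names are hypothetical, and it assumes the failure comes from lowercasing identifiers with a locale-sensitive `toLowerCase`. Under the Turkish locale, an upper-case `I` lowercases to the dotless `ı` (U+0131) rather than `i`, so a name that is lowercased in one place and compared against an ASCII spelling in another would no longer match.

```scala
import java.util.Locale

// Hypothetical demo object; none of these names come from Spark.
object TurkishLocaleDemo {
  private val turkish: Locale = new Locale("tr", "TR")

  // Locale-sensitive lowercasing, as happens when the JVM default locale is Turkish.
  def lowerTurkish(s: String): String = s.toLowerCase(turkish)

  // Locale-independent lowercasing, the usual fix for identifiers.
  def lowerRoot(s: String): String = s.toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val tableName = "ID_TABLE"
    println(lowerTurkish(tableName)) // contains the dotless i (U+0131)
    println(lowerRoot(tableName))    // plain ASCII lowercase
  }
}
```

Passing an explicit locale such as `Locale.ROOT` keeps identifier handling independent of the machine's default locale.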
[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16722 Btw, I've been working on this and just posted some thoughts about one design choice here: https://issues.apache.org/jira/browse/SPARK-9478
[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17653 Merged build finished. Test PASSed.
[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17653 **[Test build #75844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75844/testReport)** for PR 17653 at commit [`17d8190`](https://github.com/apache/spark/commit/17d819022de875777e158e94ad3ef1c8d6d2f3aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17653 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75844/ Test PASSed.
[GitHub] spark issue #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75842/ Test PASSed.
[GitHub] spark issue #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17652 Merged build finished. Test PASSed.
[GitHub] spark issue #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17652 **[Test build #75842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75842/testReport)** for PR 17652 at commit [`68d1e4d`](https://github.com/apache/spark/commit/68d1e4d47c6479e899e59777c7a6e86f2d6e75dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 @WeichenXu123 I made a PR to your branch. Can you check it? I think you'll still need to update the Mima file. Also, this may not make 2.2, so then you'd have to update the since tags.
[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17653 **[Test build #75844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75844/testReport)** for PR 17653 at commit [`17d8190`](https://github.com/apache/spark/commit/17d819022de875777e158e94ad3ef1c8d6d2f3aa).
[GitHub] spark issue #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17653 cc @felixcheung, this simply renames it to `as.json.array`.
[GitHub] spark pull request #17653: [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to ...
GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/17653

    [SPARK-19828][R][FOLLOWUP] Rename asJsonArray to as.json.array in from_json function in R

    ## What changes were proposed in this pull request?

    This was suggested to be `as.json.array` in the first place in the PR for SPARK-19828, but we could not do this because the lint check emitted an error for multiple dots in variable names. After SPARK-20278, we are now able to use `multiple.dots.in.names`. `asJsonArray` in the `from_json` function can still be changed because 2.2 is not released yet. So, this PR proposes to rename `asJsonArray` to `as.json.array`.

    ## How was this patch tested?

    Jenkins tests, local tests with `./R/run-tests.sh` and manual `./dev/lint-r`. Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19828-followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17653.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17653

commit 17d819022de875777e158e94ad3ef1c8d6d2f3aa
Author: hyukjinkwon
Date: 2017-04-16T21:39:28Z

    Rename asJsonArray to as.json.array in from_json function in R
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17651 **[Test build #75843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75843/testReport)** for PR 17651 at commit [`0031804`](https://github.com/apache/spark/commit/00318043d0a5c6d1eb1404402fc390904d2ba2dd).
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17651 cc @srowen, could you check if it makes sense to you?
[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17650#discussion_r111692337

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -153,6 +153,11 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case TrueLiteral Or _ => TrueLiteral
           case _ Or TrueLiteral => TrueLiteral
    +      case a And b if Not(a).semanticEquals(b) => FalseLiteral
    +      case a Or b if Not(a).semanticEquals(b) => TrueLiteral
    +      case a And b if a.semanticEquals(Not(b)) => FalseLiteral
    --- End diff --

    I meant something like this for `Not`:

    ```
    override def semanticEquals(other: Expression): Boolean = other match {
      case Not(otherChild) => child.semanticEquals(otherChild)
      case _ => child match {
        case Not(innerChild) =>
          // eliminate double negation
          innerChild.semanticEquals(other)
        case _ => super.semanticEquals(other)
      }
    }
    ```
[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17650#discussion_r111692327

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ---
    @@ -160,4 +166,12 @@ class BooleanSimplificationSuite extends PlanTest with PredicateHelper {
           testRelation.where('a > 2 || ('b > 3 && 'b < 5)))
         comparePlans(actual, expected)
       }
    +
    +  test("Complementation Laws") {
    --- End diff --

    How about double negation, i.e. `'a && !(!'a)`?
[GitHub] spark issue #17590: [SPARK-20278][R] Disable 'multiple_dots_linter' lint rul...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17590 Sure, thank you.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17527 Ah, sorry, it was only about fixing tests. I thought we had bugs in the main code.
[GitHub] spark pull request #17650: [SPARK-20350] Add optimization rules to apply Com...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17650#discussion_r111692175

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -153,6 +153,11 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
           case TrueLiteral Or _ => TrueLiteral
           case _ Or TrueLiteral => TrueLiteral
    +      case a And b if Not(a).semanticEquals(b) => FalseLiteral
    +      case a Or b if Not(a).semanticEquals(b) => TrueLiteral
    +      case a And b if a.semanticEquals(Not(b)) => FalseLiteral
    --- End diff --

    Logically this feels like a duplication of the code on line 156, but unfortunately `Not` is not smart enough to realise that. I think if you override `semanticEquals` in `Not`, you should be able to get rid of this line. The advantage is that the expression itself becomes smart enough to figure this out, rather than outside code having to handle it (possibly in several places). The same applies to line 159.
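To make the suggestion above concrete, here is a small self-contained Scala sketch. It is a toy model, not Catalyst: `Expr`, `Attr`, `BoolLit` and `ComplementSketch` are hypothetical names, the default `semanticEquals` is plain structural equality, and the model is two-valued, so it ignores SQL NULL semantics that a real optimizer rule would have to consider. It shows how a `Not` that understands double negation lets a single guard `Not(a).semanticEquals(b)` cover both `a AND NOT a` and `NOT a AND a`.

```scala
// Toy expression tree; all names here are illustrative, not Spark classes.
sealed trait Expr {
  // Default notion of semantic equality: structural equality of case classes.
  def semanticEquals(other: Expr): Boolean = this == other
}
case class Attr(name: String) extends Expr
case class BoolLit(value: Boolean) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case class Not(child: Expr) extends Expr {
  override def semanticEquals(other: Expr): Boolean = other match {
    // Not(x) vs Not(y): compare the children.
    case Not(otherChild) => child.semanticEquals(otherChild)
    case _ => child match {
      // Not(Not(x)) vs y: eliminate the double negation and compare x with y.
      case Not(innerChild) => innerChild.semanticEquals(other)
      case _ => super.semanticEquals(other)
    }
  }
}

object ComplementSketch {
  // One guard per operator is enough once Not handles double negation.
  def simplify(e: Expr): Expr = e match {
    case And(a, b) if Not(a).semanticEquals(b) => BoolLit(false)
    case Or(a, b) if Not(a).semanticEquals(b) => BoolLit(true)
    case other => other
  }
}
```

With this model, `ComplementSketch.simplify(And(Not(Attr("a")), Attr("a")))` takes the double-negation branch, because the guard evaluates `Not(Not(a)).semanticEquals(a)`.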
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17651 I left a useless comment and then removed it (I misunderstood). Yes, I will add a small comment.
[GitHub] spark issue #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17652 **[Test build #75842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75842/testReport)** for PR 17652 at commit [`68d1e4d`](https://github.com/apache/spark/commit/68d1e4d47c6479e899e59777c7a6e86f2d6e75dd).
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17651 Yea, the pom was the first try and it kind of failed. Please check out the discussion in https://github.com/apache/spark/pull/17642
[GitHub] spark pull request #17652: [SPARK-20335] [SQL] [BACKPORT-2.1] Children expre...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/17652

    [SPARK-20335] [SQL] [BACKPORT-2.1] Children expressions of Hive UDF impacts the determinism of Hive UDF

    ### What changes were proposed in this pull request?

    This PR is to backport https://github.com/apache/spark/pull/17635 to Spark 2.1

    ---
    ```JAVA
      /**
       * Certain optimizations should not be applied if UDF is not deterministic.
       * Deterministic UDF returns same result each time it is invoked with a
       * particular input. This determinism just needs to hold within the context of
       * a query.
       *
       * @return true if the UDF is deterministic
       */
      boolean deterministic() default true;
    ```

    Based on the definition of [UDFType](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java#L42-L50), when Hive UDF's children are non-deterministic, Hive UDF is also non-deterministic.

    ### How was this patch tested?

    Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark backport-17635

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17652.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17652

commit 68d1e4d47c6479e899e59777c7a6e86f2d6e75dd
Author: Xiao Li
Date: 2017-04-16T20:34:34Z

    fix.
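The rule stated in the description (a UDF is non-deterministic as soon as any child is) can be sketched in a few lines of Scala. This is a toy model under stated assumptions, not the Spark or Hive implementation: `Node`, `Const`, `RandomValue` and `HiveUdfCall` are hypothetical names, and `annotatedDeterministic` stands in for the value of the `deterministic()` annotation quoted above.

```scala
// Toy expression nodes; names are illustrative only.
sealed trait Node {
  def children: Seq[Node]
  // Whether this node, taken alone, always gives the same output for the same input.
  protected def selfDeterministic: Boolean
  // A node is deterministic only if it and every descendant are deterministic.
  lazy val deterministic: Boolean =
    selfDeterministic && children.forall(_.deterministic)
}

case class Const(value: Int) extends Node {
  def children: Seq[Node] = Nil
  protected def selfDeterministic: Boolean = true
}

// Stand-in for a non-deterministic child such as a random number generator.
case object RandomValue extends Node {
  def children: Seq[Node] = Nil
  protected def selfDeterministic: Boolean = false
}

// annotatedDeterministic models the UDF's own annotation value (an assumption of this sketch).
case class HiveUdfCall(name: String, annotatedDeterministic: Boolean, children: Seq[Node])
    extends Node {
  protected def selfDeterministic: Boolean = annotatedDeterministic
}
```

In this model `HiveUdfCall("f", annotatedDeterministic = true, Seq(RandomValue)).deterministic` is false even though the UDF is annotated as deterministic, which is the behaviour the backport describes.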
[GitHub] spark issue #17651: [SPARK-20343][BUILD] Force Avro 1.7.7 in sbt build to re...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17651 Perhaps add a reference to this in pom.xml, so that they both change together the next time?
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17633#discussion_r111691537

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
               col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
           .map(col => col.getName).toSet
    -    filters.collect {
    -      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
    -        s"${a.name} ${op.symbol} $v"
    -      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
    -        s"$v ${op.symbol} ${a.name}"
    -      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
    -          if !varcharKeys.contains(a.name) =>
    -        s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
    -      case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
    -          if !varcharKeys.contains(a.name) =>
    -        s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
    -    }.mkString(" and ")
    +    def isFoldable(expr: Expression): Boolean =
    +      (expr.dataType.isInstanceOf[IntegralType] || expr.dataType.isInstanceOf[StringType]) &&
    --- End diff --

    Can this support all `AtomicType`s? From my understanding these are partition columns, and they can have other types besides int and string.
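As background for the question above, the removed `filters.collect { ... }.mkString(" and ")` code turns the supported comparisons into a filter string and silently skips the rest. The following Scala sketch models that idea with toy types. It is not the `HiveShim` code: `Comparison`, `IntValue`, `StrValue`, `quote` and `convert` are hypothetical, and the quoting helper is a simplified stand-in for `quoteStringLiteral`.

```scala
// Toy model of pushed-down partition predicates; names are illustrative only.
sealed trait Value
case class IntValue(v: Long) extends Value
case class StrValue(v: String) extends Value

// column <symbol> literal, e.g. Comparison("hr", ">", IntValue(3))
case class Comparison(column: String, symbol: String, literal: Value)

object HiveFilterSketch {
  // Simplified quoting (an assumption of this sketch): use whichever quote
  // character the value does not contain, and give up if it contains both.
  def quote(s: String): Option[String] =
    if (!s.contains("\"")) Some("\"" + s + "\"")
    else if (!s.contains("'")) Some("'" + s + "'")
    else None

  // Keep only the predicates that can be expressed; drop everything else.
  def convert(filters: Seq[Comparison], varcharKeys: Set[String]): String =
    filters.flatMap {
      case Comparison(col, op, IntValue(v)) =>
        Some(s"$col $op $v")
      case Comparison(col, op, StrValue(v)) if !varcharKeys.contains(col) =>
        quote(v).map(q => s"$col $op $q")
      case _ =>
        None
    }.mkString(" and ")
}
```

Dropping an unsupported conjunct only widens the set of partitions returned, so the result stays correct as long as the remaining predicates are re-applied later. Supporting more literal types would amount to adding further cases of this kind.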
[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17524 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75841/ Test FAILed.
[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17524 Merged build finished. Test FAILed.
[GitHub] spark issue #17524: [SPARK-19235] [SQL] [TEST] [FOLLOW-UP] Enable Test Cases...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17524 **[Test build #75841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75841/testReport)** for PR 17524 at commit [`427741f`](https://github.com/apache/spark/commit/427741f548ff4469d62906546655f7ec96564ced). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaImputerExample `
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17527 Yes, you have a point. It is minor in that it is just a test that is now locale-sensitive, and supporting the locale in tests is much less important. However, ideally whatever fails should be fixed, as I suspect it is some trivial piece we missed.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17527 Sorry, my previous comment was addressed to @HyukjinKwon
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17644 cc @cloud-fan @hvanhovell @sameeragarwal for review
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user nihavend commented on the issue: https://github.com/apache/spark/pull/17527 maybe
[GitHub] spark issue #17650: [SPARK-20350] Add optimization rules to apply Complement...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75839/ Test PASSed.
[GitHub] spark issue #17650: [SPARK-20350] Add optimization rules to apply Complement...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17650 Merged build finished. Test PASSed.
[GitHub] spark issue #17650: [SPARK-20350] Add optimization rules to apply Complement...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17650 **[Test build #75839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75839/testReport)** for PR 17650 at commit [`688b2f0`](https://github.com/apache/spark/commit/688b2f0696f1d1d867e872f43506e52f95f46362). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17557#discussion_r111690507

    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -906,6 +910,37 @@
     predicted <- predict(model, df)
     head(predicted)
     ```
    + FP-growth
    +
    +`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`. `itemsCol` should be an array of values.
    +
    +```{r}
    +items <- selectExpr(createDataFrame(data.frame(items = c(
    +  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", "V,U,S",
    +  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
    +))), "split(items, ',') AS items")
    --- End diff --

Perhaps it's slightly less clear, since there are 3 references to "items" (really just the SparkDataFrame and its column name): which "items" is L923 referring to?
[GitHub] spark pull request #17557: [SPARK-20208][R][DOCS] Document R fpGrowth suppor...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17557#discussion_r111690515

    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -906,6 +910,37 @@
     predicted <- predict(model, df)
     head(predicted)
     ```
    + FP-growth
    +
    +`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on a `SparkDataFrame`. `itemsCol` should be an array of values.
    +
    +```{r}
    +items <- selectExpr(createDataFrame(data.frame(items = c(
    +  "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", "V,U,S",
    +  "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U"
    +))), "split(items, ',') AS items")
    --- End diff --

I like the approach you have there: https://github.com/apache/spark/pull/17557/files#diff-1d0d34d8ea18a9340f0a02c6befe6269R30
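For readers following the naming concern in this review thread: the quoted vignette snippet reuses the name "items" for the data frame, the input column, and the output column. Below is a minimal sketch in plain Python (not SparkR, and not the approach linked above) of what the `split(items, ',')` step produces, using distinct names. The names `raw_transactions`, `baskets`, `item_counts`, and `support` are hypothetical and chosen only for illustration; the per-item support calculation is an assumption about what a reader might check next, not part of the pull request.

```python
from collections import Counter

# The 20 comma-separated transactions quoted in the vignette diff.
raw_transactions = [
    "T,R,U", "T,S", "V,R", "R,U,T,V", "R,S", "V,S,U", "U,R", "S,T", "V,R", "V,U,S",
    "T,V,U", "R,V", "T,S", "T,S", "S,T", "S,U", "T,R", "V,R", "S,V", "T,S,U",
]

# Plain-Python counterpart of split(items, ','): one list of items per
# transaction, which is the "array of values" shape itemsCol expects.
baskets = [row.split(",") for row in raw_transactions]

# Count each item at most once per basket, then convert to a support fraction.
item_counts = Counter(item for basket in baskets for item in set(basket))
support = {item: count / len(baskets) for item, count in item_counts.items()}

for item in sorted(support):
    print(item, support[item])
```

Keeping the raw strings, the split baskets, and the derived counts under separate names makes it unambiguous which object a later line refers to, which is the readability point raised in the comment.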