[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17953#discussion_r116157223 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case ("decimal", precision :: Nil) => DecimalType(precision.getText.toInt, 0) case ("decimal", precision :: scale :: Nil) => DecimalType(precision.getText.toInt, scale.getText.toInt) + case ("void", Nil) => NullType --- End diff -- Hive 2.x disables it. Could you add some test cases by reading and writing the tables? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17959 merged to master/2.2, please send a follow-up PR to address @gatorsmile 's comments, thanks!
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116156736 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -32,3 +32,11 @@ select 1 - 2; select 2 * 5; select 5 % 3; select pmod(-7, 3); + +-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691) --- End diff -- ok!
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17711 @maropu The solution using `tailrec` looks more straightforward. Could you submit the PR based on that? Thanks!
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116156556 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -32,3 +32,11 @@ select 1 - 2; select 2 * 5; select 5 % 3; select pmod(-7, 3); + +-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691) --- End diff -- The link could be ineffective in the future. Could you also copy the table contents here? Thanks!
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17959
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17959 How about `HiveTableScanExec`?
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17959#discussion_r116156087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -519,8 +519,18 @@ case class FileSourceScanExec( relation, output.map(QueryPlan.normalizeExprId(_, output)), requiredSchema, - partitionFilters.map(QueryPlan.normalizeExprId(_, output)), - dataFilters.map(QueryPlan.normalizeExprId(_, output)), + canonicalizeFilters(partitionFilters, output), + canonicalizeFilters(dataFilters, output), None) } + + private def canonicalizeFilters(filters: Seq[Expression], output: Seq[Attribute]) --- End diff -- Add a function description?
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116155996 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1003,18 +1003,32 @@ class Analyzer( */ object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] { +// This is a strict check though, we put this to apply the rule only if the expression is not +// resolvable by child. +private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = { + !child.output.exists(a => resolver(a.name, attrName)) +} + override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { case agg @ Aggregate(groups, aggs, child) if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && -groups.exists(_.isInstanceOf[UnresolvedAttribute]) => -// This is a strict check though, we put this to apply the rule only in alias expressions -def notResolvableByChild(attrName: String): Boolean = - !child.output.exists(a => resolver(a.name, attrName)) -agg.copy(groupingExpressions = groups.map { - case u: UnresolvedAttribute if notResolvableByChild(u.name) => +groups.exists(!_.resolved) => +agg.copy(groupingExpressions = groups.map { _.transform { +case u: UnresolvedAttribute if notResolvableByChild(u.name, child) => + aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u) + } +}) + + case gs @ GroupingSets(selectedGroups, groups, child, aggs) + if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && +(selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) => +def mayResolveAttrByAggregateExprs(exprs: Seq[Expression]): Seq[Expression] = exprs.map { --- End diff -- I think we should do `exprs.map { _.transform { ...` like above.
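The difference between matching only top-level grouping expressions and using `exprs.map { _.transform { ...` can be illustrated with a minimal sketch. Everything below is hypothetical and independent of Catalyst: `Expr`, `Unresolved`, `Column`, `Func`, `Alias`, and `resolveAliases` are stand-in names, and the traversal is a simple bottom-up rewrite that may differ from the order Catalyst uses. It only shows why a tree rewrite is needed when an alias is nested inside another expression.

```scala
object AliasSketch {
  // Hypothetical miniature expression tree; not Catalyst's Expression hierarchy.
  sealed trait Expr {
    // Rewrite children first, then apply the rule to the rebuilt node if it matches.
    def transform(rule: PartialFunction[Expr, Expr]): Expr = {
      val rebuilt: Expr = this match {
        case Func(name, args) => Func(name, args.map(_.transform(rule)))
        case Alias(child, name) => Alias(child.transform(rule), name)
        case leaf => leaf
      }
      if (rule.isDefinedAt(rebuilt)) rule(rebuilt) else rebuilt
    }
  }
  final case class Unresolved(name: String) extends Expr
  final case class Column(name: String) extends Expr
  final case class Func(name: String, args: Seq[Expr]) extends Expr
  final case class Alias(child: Expr, name: String) extends Expr

  // Replace unresolved names that the child cannot resolve with a matching select-list alias.
  def resolveAliases(groups: Seq[Expr], aggs: Seq[Alias], childOutput: Set[String]): Seq[Expr] =
    groups.map(_.transform {
      case u @ Unresolved(n) if !childOutput.contains(n) =>
        aggs.find(_.name == n).getOrElse(u)
    })

  def main(args: Array[String]): Unit = {
    val aggs = Seq(Alias(Column("a"), "k"))
    // The alias `k` is nested inside a function call, so a top-level match would miss it.
    val groups = Seq(Func("upper", Seq(Unresolved("k"))))
    println(resolveAliases(groups, aggs, Set("a")))
  }
}
```

A plain `exprs.map { case u: Unresolved => ... }` would leave `upper(k)` untouched in this sketch, because the unresolved name is not the root of the grouping expression.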
[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14582 We could ask on the mailing list if you feel strongly about this. For example, the `from_json` function was also raised on the mailing list before it was merged. I think we should not add every variant just for consistency, which is why I asked whether there is more interest. There are many language-specific and application-specific variants, and I am usually against adding one when there is an easy workaround and it looks like just another variant. I would not be against it if there were more demand or interest for this.
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116155780 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1003,18 +1003,32 @@ class Analyzer( */ object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] { +// This is a strict check though, we put this to apply the rule only if the expression is not +// resolvable by child. +private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = { + !child.output.exists(a => resolver(a.name, attrName)) +} + override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { case agg @ Aggregate(groups, aggs, child) if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && -groups.exists(_.isInstanceOf[UnresolvedAttribute]) => -// This is a strict check though, we put this to apply the rule only in alias expressions -def notResolvableByChild(attrName: String): Boolean = - !child.output.exists(a => resolver(a.name, attrName)) -agg.copy(groupingExpressions = groups.map { - case u: UnresolvedAttribute if notResolvableByChild(u.name) => +groups.exists(!_.resolved) => +agg.copy(groupingExpressions = groups.map { _.transform { +case u: UnresolvedAttribute if notResolvableByChild(u.name, child) => + aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u) + } +}) + + case gs @ GroupingSets(selectedGroups, groups, child, aggs) + if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && +(selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) => --- End diff -- `groups` should cover `selectedGroups`. So we may not need to add `selectedGroups` here.
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17948 LGTM too. : )
[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116155652 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,20 @@ object Matrices { new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose) case sm: BSM[Double] => // There is no isTranspose flag for sparse matrices in Breeze -new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data) + +// Some Breeze CSCMatrices may have extra trailing zeros in +// .rowIndices and .data, which are added after some matrix +// operations for efficiency. +// +// Therefore the last element of sm.colPtrs would no longer be +// coherent with the size of sm.rowIndices and sm.data +// despite sm being a valid CSCMatrix. +// We need to truncate both arrays (rowIndices, data) +// to the real size of the vector sm.activeSize to allow valid conversion + +val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize) +val truncData = sm.data.slice(0, sm.activeSize) --- End diff -- I'm implementing both suggestions. However, wouldn't `sm.copy` be more expensive than just doing those two slices?
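The truncation described in the diff comment can be sketched without Breeze. The `Csc` case class and `compact` helper below are hypothetical stand-ins, not the Breeze `CSCMatrix` or Spark `SparseMatrix` API; the sketch assumes the number of stored entries equals the last column pointer, which is what the comment says the padded arrays no longer agree with.

```scala
// Hypothetical stand-in for a compressed-sparse-column matrix; not Breeze's CSCMatrix.
final case class Csc(
    rows: Int,
    cols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    data: Array[Double]) {
  // Number of stored entries, as recorded by the last column pointer.
  def activeSize: Int = colPtrs.last
}

object CscTruncate {
  // Drop padding beyond activeSize so both arrays agree with colPtrs.last.
  // Already-compact matrices are returned as-is, so no arrays are copied for them.
  def compact(m: Csc): Csc =
    if (m.rowIndices.length == m.activeSize && m.data.length == m.activeSize) m
    else m.copy(
      rowIndices = m.rowIndices.slice(0, m.activeSize),
      data = m.data.slice(0, m.activeSize))

  def main(args: Array[String]): Unit = {
    // 2x2 matrix with entries (0,0)=1.0 and (1,1)=2.0, followed by two padded slots.
    val padded = Csc(2, 2, Array(0, 1, 2), Array(0, 1, 0, 0), Array(1.0, 2.0, 0.0, 0.0))
    val c = compact(padded)
    println(c.rowIndices.mkString(","))
    println(c.data.mkString(","))
  }
}
```

Checking the lengths first is one possible answer to the cost question raised above: the two slices are only paid for when padding is actually present.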
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76843/ Test PASSed.
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17959 Merged build finished. Test PASSed.
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17959 **[Test build #76843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)** for PR 17959 at commit [`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait DataSourceScanExec extends LeafExecNode with CodegenSupport with PredicateHelper `
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17957 Merged build finished. Test PASSed.
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17957 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76842/ Test PASSed.
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17957 **[Test build #76842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)** for PR 17957 at commit [`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16985 @cloud-fan : I have made suggested change(s).
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17956 LGTM except for a minor comment.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116154314 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { assert(errMsg.startsWith("The field for corrupt records must be string type and nullable")) } } + + test("SPARK-18772: Parse special floats correctly") { +// positive cases +val jsons = Seq( + """{"a": "-INF"}""", + """{"a": "INF"}""", + """{"a": "-INF"}""", + """{"a": "NaN"}""", + """{"a": "Infinity"}""", + """{"a": "+Infinity"}""", + """{"a": "-Infinity"}""") + +val checks: Seq[Double => Boolean] = Seq( + _.isNegInfinity, + _.isPosInfinity, + _.isNegInfinity, + _.isNaN, + _.isPosInfinity, + _.isPosInfinity, + _.isNegInfinity) + +Seq(FloatType, DoubleType).foreach { dt => + jsons.zip(checks).foreach { case (json, check) => +val ds = spark.read + .schema(StructType(Seq(StructField("a", dt + .json(Seq(json).toDS()) + .select($"a".cast(DoubleType)).as[Double] +assert(check(ds.first())) + } +} + +// negative case +Seq(FloatType, DoubleType).foreach { dt => + val e = intercept[SparkException] { +spark.read + .option("mode", "FAILFAST") + .schema(StructType(Seq(StructField("a", dt + .json(Seq( """{"a": "nan"}""").toDS()) --- End diff -- Shall we also test other negative cases?
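The positive and negative cases in the test above amount to a case-sensitive mapping from a few string tokens to special floating-point values. The sketch below is a hypothetical helper (`SpecialFloats.parse` is not Spark's JSON parser) that accepts exactly the tokens listed in the test; the extra rejected inputs `inf` and `infinity` are assumptions about what further negative cases might look like, not behavior confirmed by the PR.

```scala
object SpecialFloats {
  // Hypothetical helper: accepts only the exact, case-sensitive tokens from the test above.
  def parse(token: String): Option[Double] = token match {
    case "NaN" => Some(Double.NaN)
    case "INF" | "Infinity" | "+Infinity" => Some(Double.PositiveInfinity)
    case "-INF" | "-Infinity" => Some(Double.NegativeInfinity)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // The last three tokens are candidate negative cases; only "nan" appears in the diff.
    val tokens = Seq("-INF", "INF", "NaN", "Infinity", "+Infinity", "-Infinity", "nan", "inf", "infinity")
    tokens.foreach(t => println(s"$t -> ${parse(t)}"))
  }
}
```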
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17052 Yea, I just wanted to check if it is in progress in any way. Thanks for your input.
[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...
Github user lalinsky commented on the issue: https://github.com/apache/spark/pull/14582 You were asking for more interest in the feature; there was no way I could answer that. :) Regarding the change itself, the system can already automatically cast an integer to a timestamp, but not a floating-point number. Floating-point timestamps are pretty common in the Python ecosystem, more so than integer ones. From my point of view, that is inconsistent and surprising behavior that I wanted to correct. I wouldn't have sent the patch if the cast didn't work for any number type, but supporting it for just one number type seemed wrong to me.
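The inconsistency described above can be made concrete with a small sketch. It assumes the numeric value means seconds since the Unix epoch (as Python's `time.time()` returns); `EpochSeconds`, `fromLong`, and `fromDouble` are hypothetical names and do not reflect how Spark's JSON reader is implemented.

```scala
import java.time.Instant

object EpochSeconds {
  // Integer seconds since the epoch map directly to an Instant.
  def fromLong(seconds: Long): Instant = Instant.ofEpochSecond(seconds)

  // Fractional seconds: split into whole seconds and a nanosecond adjustment.
  def fromDouble(seconds: Double): Instant = {
    val whole = math.floor(seconds).toLong
    val nanos = math.round((seconds - whole) * 1e9)
    Instant.ofEpochSecond(whole, nanos)
  }

  def main(args: Array[String]): Unit = {
    println(fromLong(1L))
    println(fromDouble(1.5))
  }
}
```

Using `floor` for the whole part keeps the nanosecond adjustment non-negative for timestamps before the epoch as well.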
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17711 Merged build finished. Test PASSed.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76840/ Test PASSed.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17711 **[Test build #76840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)** for PR 17711 at commit [`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052 @HyukjinKwon Sorry! I have been busy recently. Let me resolve this conflict.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/ Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Merged build finished. Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956 **[Test build #76841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)** for PR 17956 at commit [`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17935#discussion_r116153501 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with SharedSQLContext { Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil) } + test("SPARK-20690: Do not add missing attributes through subqueries") { +withTempView("onerow") { + Seq(1).toDF("c1").createOrReplaceTempView("onerow") + + val e = intercept[AnalysisException] { +sql( + """ +| select 1 +| from (select 1 from onerow t1 LIMIT 1) --- End diff -- I'm surprised we support this syntax; I think a subquery in the FROM clause must have an alias. I checked with Postgres, which throws the exception `subquery in FROM must have an alias`. Can you check with other databases? Thanks!
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17948 **[Test build #76849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76849/testReport)** for PR 17948 at commit [`a809274`](https://github.com/apache/spark/commit/a8092742b99c9d43b04b4a4941345f179996a50f).
[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17960#discussion_r116153263

    --- Diff: sql/core/src/test/resources/sql-tests/inputs/limit.sql ---
    @@ -1,23 +1,27 @@
     -- limit on various data types
    -select * from testdata limit 2;
    -select * from arraydata limit 2;
    -select * from mapdata limit 2;
    +SELECT * FROM testdata LIMIT 2;
    --- End diff --

Just out of curiosity, why should these be upper-cased? Is this style preferred?
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116153034

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1003,18 +1003,31 @@ class Analyzer(
        */
       object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
     
    +    // This is a strict check though, we put this to apply the rule only if the expression is not
    +    // resolvable by child.
    +    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean =
    +      !child.output.exists(a => resolver(a.name, attrName))
    --- End diff --

Thanks! Fixed.
[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17644#discussion_r116152814

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -871,6 +886,23 @@ private[hive] object HiveClientImpl {
           hiveTable.setViewOriginalText(t)
           hiveTable.setViewExpandedText(t)
         }
    +
    +    table.bucketSpec match {
    +      case Some(bucketSpec) =>
    +        hiveTable.setNumBuckets(bucketSpec.numBuckets)
    +        hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
    --- End diff --

For a data source table, which can be created by `CREATE TABLE src(...) USING parquet ...`, the bucketing information is stored in the table properties, and Hive will always read this table as a non-bucketed table. After your PR, Hive will read bucketed data source tables written by Spark as bucketed tables, which causes problems because the bucket hashing function is different.
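To illustrate why a differing hash function matters for bucketing, here is a minimal sketch. It does not use the actual hash functions of Spark or Hive; `String.hashCode` and Scala's `MurmurHash3.stringHash` are stand-ins chosen only because they are two different hashes, and `BucketIdSketch`, `pmod`, `bucketWithJavaHash` and `bucketWithMurmur` are hypothetical names. A writer that buckets with one function and a reader that assumes the other will generally disagree about which bucket a key belongs to.

```scala
import scala.util.hashing.MurmurHash3

object BucketIdSketch {
  // Non-negative modulus, so that negative hash codes still map into [0, numBuckets).
  def pmod(hash: Int, numBuckets: Int): Int = {
    val r = hash % numBuckets
    if (r < 0) r + numBuckets else r
  }

  // Stand-in for one engine's bucket hash (assumption): plain String.hashCode.
  def bucketWithJavaHash(key: String, numBuckets: Int): Int =
    pmod(key.hashCode, numBuckets)

  // Stand-in for another engine's bucket hash (assumption): Scala's MurmurHash3.
  def bucketWithMurmur(key: String, numBuckets: Int): Int =
    pmod(MurmurHash3.stringHash(key), numBuckets)

  def main(args: Array[String]): Unit = {
    // Print both bucket ids per key; any mismatch means the two engines
    // would look for the same row in different bucket files.
    Seq("a", "b", "c", "spark", "hive").foreach { k =>
      println(s"$k -> java=${bucketWithJavaHash(k, 8)} murmur=${bucketWithMurmur(k, 8)}")
    }
  }
}
```

Both functions always return a valid bucket id, which is exactly why the mismatch is silent: nothing fails at write time, and the problem only shows up when a reader relies on the bucket layout.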
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116152643

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1003,18 +1003,31 @@ class Analyzer(
        */
       object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
     
    +    // This is a strict check though, we put this to apply the rule only if the expression is not
    +    // resolvable by child.
    +    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean =
    +      !child.output.exists(a => resolver(a.name, attrName))
    --- End diff --

Nit: style
```Scala
private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
  !child.output.exists(a => resolver(a.name, attrName))
}
```
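The check under review can be tried outside the analyzer with a small stand-alone sketch. Everything here is simplified for illustration: `Plan` is a hypothetical stand-in for `LogicalPlan` that only carries output attribute names, and the resolver is passed in explicitly as a plain `(String, String) => Boolean` function instead of coming from the analyzer's configuration.

```scala
object ResolverSketch {
  // A resolver decides whether two attribute names refer to the same attribute.
  type Resolver = (String, String) => Boolean

  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitive: Resolver = (a, b) => a == b

  // Hypothetical stand-in for a logical plan: only its output names matter here.
  final case class Plan(output: Seq[String])

  // True when no output attribute of the child matches attrName under the resolver.
  def notResolvableByChild(resolver: Resolver)(attrName: String, child: Plan): Boolean = {
    !child.output.exists(name => resolver(name, attrName))
  }
}
```

With a child that outputs `k` and `v`, the name `K` is resolvable under the case-insensitive resolver but not under the case-sensitive one, so whether the alias rule applies depends on the resolver in use.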
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17959 LGTM
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17948 **[Test build #76848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76848/testReport)** for PR 17948 at commit [`0163656`](https://github.com/apache/spark/commit/0163656b8e5325cda7b80e0c0268c24608e9b871).
[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17960 **[Test build #76847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76847/testReport)** for PR 17960 at commit [`b4a4b0a`](https://github.com/apache/spark/commit/b4a4b0aee836cd6f8944716bc84323487527fc19).
[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17960

[SPARK-20719] [SQL] Support LIMIT ALL

### What changes were proposed in this pull request?

`LIMIT ALL` is the same as omitting the `LIMIT` clause. It is supported by both PostgreSQL and Presto. This PR adds support for it in the parser.

### How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark LimitAll

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17960.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17960

commit b4a4b0aee836cd6f8944716bc84323487527fc19
Author: Xiao Li
Date:   2017-05-12T04:34:37Z

    fix.
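The intended semantics can be sketched without Spark. The model below is an assumption for illustration only and is not how the parser in this PR represents the clause: `LimitClause`, `NoLimit`, `LimitAll`, `LimitN` and `applyLimit` are hypothetical names. The point is that `LIMIT ALL` and an omitted `LIMIT` clause are handled by the same branch.

```scala
object LimitAllSketch {
  // Hypothetical model of a parsed LIMIT clause.
  sealed trait LimitClause
  case object NoLimit extends LimitClause              // clause omitted
  case object LimitAll extends LimitClause             // LIMIT ALL
  final case class LimitN(n: Int) extends LimitClause  // LIMIT <n>

  // LIMIT ALL behaves exactly like an omitted clause: all rows are returned.
  def applyLimit[A](rows: Seq[A], clause: LimitClause): Seq[A] = clause match {
    case NoLimit | LimitAll => rows
    case LimitN(n) => rows.take(n)
  }
}
```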
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116151791

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1003,18 +1003,30 @@ class Analyzer(
        */
       object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
     
    +    // This is a strict check though, we put this to apply the rule only in alias expressions
    --- End diff --

ok
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17948 LGTM
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116150714

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1003,18 +1003,30 @@ class Analyzer(
        */
       object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
     
    +    // This is a strict check though, we put this to apply the rule only in alias expressions
    --- End diff --

`... only if the expression is not resolvable by child`
[GitHub] spark issue #17936: [SPARK-20638][Core][WIP]Optimize the CartesianRDD to red...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17936

Cluster test results. `RDD.cartesian` is used by the Spark MLlib ALS algorithm, so this branch was compared with the latest Spark master branch.

Environment: Spark on YARN with 9 executors (10 cores and 30 GB memory) on three nodes.

Test data: 480,000 users and 17,000 items.

Test case:
```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object TestNetflixlib {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test Netflix mlib")
    val sc = new SparkContext(conf)

    // Each input line has the form user::item::rate.
    val data = sc.textFile("hdfs://10.1.2.173:9000/nf_training_set.txt")
    val ratings = data.map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })

    val rank = 0
    val numIterations = 10

    // Time the ALS training step.
    val train_start = System.nanoTime()
    val model = ALS.train(ratings, rank, numIterations, 0.01)
    val training_time = (System.nanoTime() - train_start) / 1e9
    println(s"Training time(s): $training_time")

    // Time the recommendation step; count() forces evaluation.
    val rec_start = System.nanoTime()
    val userRec = model.recommendProductsForUsers(20)
    println(userRec.count())
    val rec_time = (System.nanoTime() - rec_start) / 1e9
    println(s"Recommend time(s): $rec_time")
  }
}
```

Test results:

| Master Branch | Improved Branch | Percentage increase |
| -- | -- | -- |
| 139.934s | 162.597s | 16% |
| 148.138s | 157.597s | 6% |
| 157.899s | 189.580s | 20% |
| 135.520s | 152.486s | 13% |
| 166.101s | 184.485s | 11% |
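The message does not say how the last column of the results table was computed. A minimal sketch, assuming it is the difference between the two timings relative to the master-branch timing, reproduces the reported figures; `TimingDiffSketch` and `percentDiff` are hypothetical names.

```scala
object TimingDiffSketch {
  // (master branch seconds, improved branch seconds), copied from the results table.
  val runs: Seq[(Double, Double)] = Seq(
    (139.934, 162.597),
    (148.138, 157.597),
    (157.899, 189.580),
    (135.520, 152.486),
    (166.101, 184.485)
  )

  // Assumed formula: difference relative to the master-branch timing, in percent.
  def percentDiff(master: Double, improved: Double): Double =
    (improved - master) / master * 100.0

  def main(args: Array[String]): Unit = {
    runs.foreach { case (m, i) =>
      println(f"$m%.3fs -> $i%.3fs: ${percentDiff(m, i)}%.1f%%")
    }
  }
}
```

Under this assumption the first row works out to (162.597 - 139.934) / 139.934, which is about 16.2% and rounds to the 16% shown in the table.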
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16697 Yes. If it is inside, you are right: only the first will be logged!
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/16697 you mean outside of https://github.com/apache/spark/pull/16697/files#diff-ca0fe05a42fd5edcab8a1bdaa8e58db9R210?
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116148995

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
               case t: Throwable =>
                 e.addSuppressed(t)
             }
    +        context.markTaskCompleted(Some(e))
             throw e
         } finally {
    -      // Call the task completion callbacks.
    -      context.markTaskCompleted()
           try {
    -        Utils.tryLogNonFatalError {
    -          // Release memory used by this thread for unrolling blocks
    -          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
    -          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
    -          // Notify any tasks waiting for execution memory to be freed to wake up and try to
    -          // acquire memory again. This makes impossible the scenario where a task sleeps forever
    -          // because there are no other tasks left to notify it. Since this is safe to do but may
    -          // not be strictly necessary, we should revisit whether we can remove this in the future.
    -          val memoryManager = SparkEnv.get.memoryManager
    -          memoryManager.synchronized { memoryManager.notifyAll() }
    -        }
    +        // Call the task completion callbacks. If "markTaskCompleted" is called twice, the second
    +        // one is no-op.
    --- End diff --

Missed this comment. LGTM. Thanks for clarifying @zsxwing
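The "second call is a no-op" behaviour that the new comment in the diff relies on can be sketched as follows. This is a simplified illustration, not Spark's `TaskContextImpl`: `SketchTaskContext`, `SketchTask` and the listener signature are hypothetical, and the idempotence is implemented here with an `AtomicBoolean` purely as an assumption.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a task context.
class SketchTaskContext {
  private val completed = new AtomicBoolean(false)
  private val listeners = ArrayBuffer.empty[Option[Throwable] => Unit]

  def addCompletionListener(f: Option[Throwable] => Unit): Unit = listeners += f

  // Runs the listeners at most once; any later call does nothing.
  def markTaskCompleted(error: Option[Throwable]): Unit = {
    if (completed.compareAndSet(false, true)) {
      listeners.foreach(f => f(error))
    }
  }
}

object SketchTask {
  // Same shape as the diff: report the error in the catch block,
  // then call markTaskCompleted again unconditionally in finally.
  def run(context: SketchTaskContext)(body: => Unit): Unit = {
    try {
      body
    } catch {
      case e: Throwable =>
        context.markTaskCompleted(Some(e))
        throw e
    } finally {
      context.markTaskCompleted(None)
    }
  }
}
```

Because the first call wins, listeners see the original failure when the body throws, and the call in `finally` only takes effect on the success path.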
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116148841

    --- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
    @@ -55,14 +55,16 @@ class TaskCompletionListenerException(
       extends RuntimeException {
     
       override def getMessage: String = {
    -    if (errorMessages.size == 1) {
    --- End diff --

Thx for clarifying!
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17956 LGTM
[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17906 thanks, merging to master/2.2/2.1/2.0!
[GitHub] spark pull request #17906: [SPARK-20665][SQL]"Bround" and "Round" function r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17906
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17959 also cc @gatorsmile
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956 **[Test build #76846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)** for PR 17956 at commit [`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17958 **[Test build #76845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)** for PR 17958 at commit [`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/17958 @zsxwing @marmbrus
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17958 **[Test build #76844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76844/testReport)** for PR 17958 at commit [`e10101e`](https://github.com/apache/spark/commit/e10101eafe2329031d079977ab3f3e0aaee98908).
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17959 **[Test build #76843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)** for PR 17959 at commit [`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660).
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17959 cc @cloud-fan @hvanhovell
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/17959

[SPARK-20718][SQL] FileSourceScanExec with different filter orders should be the same after canonicalization

## What changes were proposed in this pull request?

Since `constraints` in `QueryPlan` is a set, the order of filters can differ. Usually this is fine because of canonicalization. However, in `FileSourceScanExec`, the data filters and partition filters are sequences, and their order is not canonicalized. So `def sameResult` returns different results for different orders of data/partition filters. This leads to, e.g., different decisions for `ReuseExchange`, and thus results in unstable performance.

## How was this patch tested?

Added a new test for `FileSourceScanExec.sameResult`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark canonicalizeFileSourceScanExec

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17959.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17959

commit 9ec86ec1941bf0c329f4c6a1fb75271e91e51660
Author: wangzhenhua
Date:   2017-05-12T03:11:34Z

    same result for FileSourceScanExec with different filter orders
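The problem described above can be reproduced in miniature without Spark. In this sketch `Scan` is a hypothetical stand-in for a scan node that keeps its filters in a `Seq`, filters are plain strings, and sorting them is only one possible canonical order chosen for illustration; the patch itself may order them differently.

```scala
object FilterOrderSketch {
  // Hypothetical stand-in for a scan node whose filters are kept in a Seq.
  final case class Scan(table: String, dataFilters: Seq[String])

  // Put the filters into a canonical order so that the original order is irrelevant.
  // Sorting by the string itself is an assumption made for this sketch.
  def canonicalized(scan: Scan): Scan =
    scan.copy(dataFilters = scan.dataFilters.sorted)

  // Two scans produce the same result if their canonical forms are equal.
  def sameResult(a: Scan, b: Scan): Boolean =
    canonicalized(a) == canonicalized(b)
}
```

Without the canonical ordering, `Scan("t", Seq("a > 1", "b < 2"))` and `Scan("t", Seq("b < 2", "a > 1"))` compare as different even though they describe the same scan, which is the kind of mismatch that prevents reuse.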
[GitHub] spark pull request #17958: [SPARK-20716][SS] StateStore.abort() should not t...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/17958

[SPARK-20716][SS] StateStore.abort() should not throw exceptions

## What changes were proposed in this pull request?

StateStore.abort() should make a best-effort attempt to clean up temporary resources. It should not throw errors, especially because it's called in a TaskCompletionListener, where such an error could hide previous real errors in the task.

## How was this patch tested?

No unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-20716

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17958.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17958

commit e10101eafe2329031d079977ab3f3e0aaee98908
Author: Tathagata Das
Date:   2017-05-12T03:21:54Z

    Ignored exceptions
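A best-effort cleanup of this kind is commonly written as a try/catch around the cleanup action. The sketch below shows that general pattern under stated assumptions and is not the code in this PR: `BestEffortAbortSketch` and `abortQuietly` are hypothetical names, and returning the swallowed error as an `Option` stands in for logging a warning.

```scala
import scala.util.control.NonFatal

object BestEffortAbortSketch {
  // Runs a cleanup action and swallows non-fatal errors instead of rethrowing,
  // so that a failing cleanup cannot mask an earlier, real task failure.
  def abortQuietly(cleanup: () => Unit): Option[Throwable] = {
    try {
      cleanup()
      None
    } catch {
      case NonFatal(e) =>
        // Real code would log a warning here; the sketch just returns the error.
        Some(e)
    }
  }
}
```

Fatal errors such as `OutOfMemoryError` are deliberately not matched by `NonFatal`, so they still propagate.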
[GitHub] spark pull request #17887: [SPARK-20399][SQL] Add a config to fallback strin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17887
[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17887 Thanks @cloud-fan @gatorsmile
[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17887 thanks, merging to master/2.2!
[GitHub] spark issue #17954: [SPARK-20714] [SS] Fix match error when watermark is set...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/17954 @marmbrus @zsxwing
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116145813

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -127,13 +126,15 @@ class JacksonParser(
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
               val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    +          if (value.equals("NaN") ||
    +            value.equals("Infinity") ||
    +            value.equals("+Infinity") ||
    +            value.equals("-Infinity")) {
                 value.toFloat
    +          } else if (value.equals("+INF") || value.equals("INF")) {
    --- End diff --

how about
```
parser.getText match {
  case "NaN" => Float.NaN
  case "+INF" | "INF" | "+Infinity" | "Infinity" => Float.PositiveInfinity
  case "-INF" | "-Infinity" => Float.NegativeInfinity
  case other => throw new RuntimeException(s"Cannot parse $other as FloatType.")
}
```
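The suggested match can be exercised on its own by taking the text as a parameter instead of reading it from the JSON parser. `SpecialFloatSketch` and `parseSpecialFloat` are hypothetical names for this sketch. One reason an explicit match is useful: `"INF".toFloat` delegates to Java's float parsing, which accepts `NaN` and `Infinity` (with an optional sign) but not the `INF` spellings.

```scala
object SpecialFloatSketch {
  // Case-sensitive mapping of the special spellings to Float values.
  // Anything else, including lower-case variants such as "nan", is rejected.
  def parseSpecialFloat(text: String): Float = text match {
    case "NaN" => Float.NaN
    case "+INF" | "INF" | "+Infinity" | "Infinity" => Float.PositiveInfinity
    case "-INF" | "-Infinity" => Float.NegativeInfinity
    case other => throw new RuntimeException(s"Cannot parse $other as FloatType.")
  }
}
```

Compared with the removed lower-casing code in the diff, this version no longer accepts arbitrary capitalisation, so inputs like `nan` or `infinity` now raise an error instead of being converted.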
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17957 **[Test build #76842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)** for PR 17957 at commit [`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9).
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/17957 @marmbrus
[GitHub] spark pull request #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWi...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/17957

[SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

## What changes were proposed in this pull request?

Timeout and state data are two independent entities and should be settable independently. Therefore, in the same call of the user-defined function, one should be able to set the timeout before initializing the state and also after removing the state. Whether timeouts can be set should not depend on the current state, and vice versa.

However, a limitation of the current implementation is that state cannot be null while timeout is set. This is checked lazily after the function call has completed.

## How was this patch tested?

- Updated existing unit tests that test the behavior of GroupState.setTimeout*** with respect to the current state
- Added new tests that verify the disallowed cases where state is undefined but timeout is set.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-20717

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17957.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17957

commit 3cfd446b0a926aad115acdf5bd8dc8cd03ee743a
Author: Tathagata Das
Date: 2017-05-12T02:31:15Z

    Tweaks to the mapGroupsWithState behavior

commit 40329404299ece70aef7ef245704978fb9d1e6f9
Author: Tathagata Das
Date: 2017-05-12T03:06:52Z

    More tweaks
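The rule described in this pull request (timeout and state can be set in any order, with the "timeout set but state undefined" case rejected only after the user function returns) can be modelled in a few lines. This is a toy sketch under stated assumptions: `ToyGroupState` and `validateAfterCall` are hypothetical names, and none of this is Spark's actual `GroupState` implementation.

```scala
// Hypothetical toy model for illustration; not Spark's GroupState.
final class ToyGroupState[S] {
  private var state: Option[S] = None
  private var timeoutMs: Option[Long] = None

  def update(s: S): Unit = { state = Some(s) }
  def remove(): Unit = { state = None }

  // Allowed whether or not state is currently defined.
  def setTimeoutDuration(ms: Long): Unit = { timeoutMs = Some(ms) }

  // Deferred check, meant to run once the user function has returned.
  def validateAfterCall(): Unit = {
    if (timeoutMs.isDefined && state.isEmpty) {
      throw new IllegalStateException("Timeout is set but state is undefined")
    }
  }
}
```

Because the check is deferred, setting the timeout first and the state afterwards passes, while removing the state and leaving a timeout in place fails only when `validateAfterCall()` runs.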
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76839/ Test PASSed.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17858 Merged build finished. Test PASSed.
[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17858 **[Test build #76839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76839/testReport)** for PR 17858 at commit [`639d63a`](https://github.com/apache/spark/commit/639d63a20e94523f3443bc83b272fc60c1f5627a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r116145107

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala ---
    @@ -20,16 +20,19 @@ package org.apache.spark.sql.hive.execution
     import scala.collection.JavaConverters._
     import scala.util.Random

    +import _root_.test.org.apache.spark.sql.MyDoubleAvg
    +import _root_.test.org.apache.spark.sql.MyDoubleSum
    --- End diff --

`_root_`?
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76838/ Test FAILed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Merged build finished. Test FAILed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17955 **[Test build #76838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76838/testReport)** for PR 17955 at commit [`e3da298`](https://github.com/apache/spark/commit/e3da298d59c764388ec6ca93ec23ba3eb8de96d3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17222#discussion_r116144809

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
    @@ -491,20 +491,42 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
             case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
             case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
             case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
    -        case n => logError(s"UDF class with ${n} type arguments is not supported ")
    +        case n =>
    +          throw new IOException(s"UDF class with ${n} type arguments is not supported.")
    --- End diff --

why IOException?
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116144531

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -127,13 +126,15 @@ class JacksonParser(
           case VALUE_STRING =>
             // Special case handling for NaN and Infinity.
             val value = parser.getText
    -        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -        if (lowerCaseValue.equals("nan") ||
    -          lowerCaseValue.equals("infinity") ||
    -          lowerCaseValue.equals("-infinity") ||
    -          lowerCaseValue.equals("inf") ||
    -          lowerCaseValue.equals("-inf")) {
    +        if (value.equals("NaN") ||
    --- End diff --

https://github.com/apache/spark/pull/9759#r63321521

> "infinity".toDouble, "inf".toDouble are not legal. These non-numeric numbers are case-sensitive, both for Jackson and Scala.
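The case-sensitivity mentioned in the quote is easy to check on the JVM, where Scala's `toDouble` delegates to `java.lang.Double.parseDouble`. A small sketch follows; the object name and the two lists are illustrative assumptions, not code from the PR.

```scala
import scala.util.Try

// Illustrative check of which spellings the JVM double parser accepts.
object SpecialDoubleParsing {
  // Exact, case-sensitive spellings that parse successfully.
  val accepted: Seq[String] = Seq("NaN", "Infinity", "+Infinity", "-Infinity")

  // Lower-case or abbreviated spellings that raise NumberFormatException.
  val rejected: Seq[String] = Seq("nan", "infinity", "inf", "INF", "-inf")

  // True when `text.toDouble` succeeds without throwing.
  def parses(text: String): Boolean = Try(text.toDouble).isSuccess
}
```

This is why lower-casing the input and then calling `toFloat` or `toDouble` on the original value could only ever succeed for the exact spellings in the first list.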
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17956 cc @NathanHowell, @cloud-fan and @viirya. (I just want to note that this does not change any input/output, only the exception type, and avoids an additional conversion attempt.)
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956 **[Test build #76841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)** for PR 17956 at commit [`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17956

[SPARK-18772][SQL] Avoid unnecessary conversion try for special floats in JSON and add related tests

## What changes were proposed in this pull request?

This PR is based on https://github.com/apache/spark/pull/16199 and extracts the valid change from https://github.com/apache/spark/pull/9759 to resolve SPARK-18772.

It hardly changes input/output at all; it just avoids an additional conversion attempt with `toFloat` and `toDouble`.

**Before**

```scala
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
17/05/12 11:30:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NumberFormatException: For input string: "nan"
...
```

**After**

```scala
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
17/05/12 11:44:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: Cannot parse nan as DoubleType.
...
```

## How was this patch tested?

Unit tests added in `JsonSuite`.
Closes #16199

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-18772

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17956.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17956

commit f83eec7a4c8fce223c5fd28b411fe2d1ae1da8dd
Author: Nathan Howell
Date: 2016-12-07T23:32:14Z

    [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is inconsistent

commit 660a2843050d99b15ca7676ead8b8be4117267f1
Author: hyukjinkwon
Date: 2017-05-12T02:22:27Z

    Avoid unnecessary cast try for special floats in JSON and add related tests
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116143769

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
               case t: Throwable =>
                 e.addSuppressed(t)
             }
    +        context.markTaskCompleted(Some(e))
             throw e
         } finally {
    -      // Call the task completion callbacks.
    -      context.markTaskCompleted()
           try {
    -        Utils.tryLogNonFatalError {
    -          // Release memory used by this thread for unrolling blocks
    -          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
    -          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
    -          // Notify any tasks waiting for execution memory to be freed to wake up and try to
    -          // acquire memory again. This makes impossible the scenario where a task sleeps forever
    -          // because there are no other tasks left to notify it. Since this is safe to do but may
    -          // not be strictly necessary, we should revisit whether we can remove this in the future.
    -          val memoryManager = SparkEnv.get.memoryManager
    -          memoryManager.synchronized { memoryManager.notifyAll() }
    -        }
    +        // Call the task completion callbacks. If "markTaskCompleted" is called twice, the second
    +        // one is no-op.
    +        context.markTaskCompleted(None)
    --- End diff --

@mridulm there is a `completed` flag in `markTaskCompleted`.
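The effect of a `completed` flag can be sketched independently of Spark. The class below is a hypothetical stand-in (`CompletionTracker` and `addListener` are made-up names, and this is not `TaskContextImpl`); it only illustrates why calling `markTaskCompleted` once in the `catch` block and again in `finally` runs the callbacks a single time.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context; not Spark's TaskContextImpl.
class CompletionTracker {
  private var completed = false
  private val listeners = ArrayBuffer.empty[Option[Throwable] => Unit]

  def addListener(f: Option[Throwable] => Unit): Unit = synchronized {
    listeners += f
  }

  // Runs the listeners on the first call only; later calls are no-ops.
  def markTaskCompleted(error: Option[Throwable]): Unit = synchronized {
    if (!completed) {
      completed = true
      listeners.foreach(listener => listener(error))
    }
  }
}
```

With this shape, the first call decides which error (if any) the listeners see, and the later call with `None` is ignored.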
[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 Hey guys. Just a quick update. I made good progress on implementing multi-version testing today, however it's not quite ready. I'm going to be on leave from tomorrow through the rest of next week, so I'm kind of doubtful I'll push anything new until May 22nd.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17711 I quickly brushed up the Optimizer code based on your advice: Using `Stack`: https://github.com/apache/spark/commit/a17d933aaad5c38a9d9ff20f978c1bad6c774fb1#diff-a1acb054bc376603ef510e6d0ee0R551 Using `tailrec`: https://github.com/apache/spark/compare/master...maropu:SPARK-19951-3#diff-a1acb054bc376603ef510e6d0ee0R552 I checked the Spark style guide, and I think we should use the more readable one. So, is `tailrec` better? I'll submit the `tailrec` one after this is merged.
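The linked commits are not reproduced here. As a generic illustration of the `tailrec` style, assuming the optimizer change collapses nested concatenations, a sketch over a hypothetical expression tree (`Expr`, `Lit`, `Concat` and `FlattenConcat` are made up for this example and are not Catalyst classes) could look like this:

```scala
import scala.annotation.tailrec

// Hypothetical expression tree for illustration; not Catalyst's Expression.
sealed trait Expr
final case class Lit(value: String) extends Expr
final case class Concat(children: Seq[Expr]) extends Expr

object FlattenConcat {
  // Collects the non-Concat leaves of a nested Concat tree, left to right.
  def flatten(root: Expr): Seq[Expr] = {
    @tailrec
    def loop(pending: List[Expr], acc: Vector[Expr]): Vector[Expr] = pending match {
      case Nil => acc
      // Push the children in front so that left-to-right order is preserved.
      case Concat(children) :: rest => loop(children.toList ::: rest, acc)
      case leaf :: rest => loop(rest, acc :+ leaf)
    }
    loop(List(root), Vector.empty)
  }
}
```

The `pending` list plays the role an explicit mutable stack would play in the other variant, and `@tailrec` makes the compiler verify that the recursion is compiled to a loop.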
[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17887 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76837/ Test PASSed.
[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17887 Merged build finished. Test PASSed.
[GitHub] spark issue #17887: [SPARK-20399][SQL] Add a config to fallback string liter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17887 **[Test build #76837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76837/testReport)** for PR 17887 at commit [`375eb9c`](https://github.com/apache/spark/commit/375eb9cd747cc75d2f51da1dabe824dbbce52790). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17711 **[Test build #76840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)** for PR 17711 at commit [`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7).
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116143097

    --- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
    @@ -55,14 +55,16 @@ class TaskCompletionListenerException(
         extends RuntimeException {

       override def getMessage: String = {
    -    if (errorMessages.size == 1) {
    --- End diff --

It's a common pattern in Scala.
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116142572

    --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
    @@ -32,3 +32,11 @@ select 1 - 2;
     select 2 * 5;
     select 5 % 3;
     select pmod(-7, 3);
    +
    +-- check operator precedence
    --- End diff --

ok
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user sarutak commented on the issue: https://github.com/apache/spark/pull/14719 @HyukjinKwon Thanks for pinging me! I still think this issue should be fixed but I didn't notice @nsyca's last comment. I'll consider the problem which he mentioned soon.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16697 `onDropEvent` is invoked for every dropped event, not just the first. If all you need is a way to find out what the dropped events were, simply enable trace logging for the class after adding the log line. With everything else being the same, you will see TRACE log messages for all the dropped events.
[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/17906 @cloud-fan Spark 2.0 and Spark 2.1 have the same issue. I have updated the affected versions in the JIRA. Thanks!
[GitHub] spark issue #13621: [SPARK-10408] [ML] Implement stacked autoencoder
Github user JeremyNixon commented on the issue: https://github.com/apache/spark/pull/13621 I ran the Keras experiment with code up at [GitHub link](https://github.com/JeremyNixon/autoencoder) if anyone wants to build on this or replicate it. Running Seth's example on the training data set, I was able to get the results below. ![screen shot 2017-05-11 at 10 08 37 pm](https://cloud.githubusercontent.com/assets/4738024/25979615/9567a8bc-3697-11e7-81ed-be3fd073f4c5.png) I agree that we should add modern activation functions. More importantly, we should add improved optimizers and a modular API to make this valuable to real users. I'm going to do a code review here and at scalable-deeplearning in the next few days regardless of the decision we make around this. I think that these improvements (activation functions, optimizers) should be a part of a flexible modular library if we want to give users a modern experience.
[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/17869 @HyukjinKwon I would like to suggest modifying ALSCleanerSuite in a separate PR.
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17948 @cloud-fan Could you check this? Thanks!
[GitHub] spark issue #17952: [SPARK-20705][WEB-UI]The sort function can not be used i...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/17952 What you said is clearly right. I have made the requested changes and tested them manually; it works.
[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139174

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
    @@ -992,7 +992,20 @@ object Matrices {
             new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
           case sm: BSM[Double] =>
             // There is no isTranspose flag for sparse matrices in Breeze
    -        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
    +
    +        // Some Breeze CSCMatrices may have extra trailing zeros in
    +        // .rowIndices and .data, which are added after some matrix
    +        // operations for efficiency.
    +        //
    +        // Therefore the last element of sm.colPtrs would no longer be
    +        // coherent with the size of sm.rowIndices and sm.data
    +        // despite sm being a valid CSCMatrix.
    +        // We need to truncate both arrays (rowIndices, data)
    +        // to the real size of the vector sm.activeSize to allow valid conversion
    +
    +        val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
    +        val truncData = sm.data.slice(0, sm.activeSize)
    --- End diff --

This is the same as calling compact(). But the good thing is that it won't impact the original matrix.
[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139610
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala ---
@@ -46,6 +46,26 @@ class MatricesSuite extends SparkFunSuite {
 }
 }
+ test("Test Breeze Conversion Bug - SPARK-20687") {
--- End diff --
Please use a more specific name, e.g. "Test FromBreeze when Breeze.CSCMatrix.rowIndices has trailing zeros", and move the test so that it follows the existing unit test "fromBreeze with sparse matrix".
[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116139038
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala ---
@@ -992,7 +992,20 @@ object Matrices {
 new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
 case sm: BSM[Double] =>
 // There is no isTranspose flag for sparse matrices in Breeze
-new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+
+// Some Breeze CSCMatrices may have extra trailing zeros in
+// .rowIndices and .data, which are added after some matrix
+// operations for efficiency.
+//
+// Therefore the last element of sm.colPtrs would no longer be
+// coherent with the size of sm.rowIndices and sm.data
+// despite sm being a valid CSCMatrix.
+// We need to truncate both arrays (rowIndices, data)
+// to the real size of the vector sm.activeSize to allow valid conversion
+
--- End diff --
Maybe we can add an if/else here, since slice copies the array and that copy is often unnecessary. Please refer to https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/linalg/CSCMatrix.scala#L130
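One way to read the if/else suggestion is sketched below in plain Scala: copy only when the backing array is longer than the number of used entries. The object and helper names (`ConditionalTruncate`, `trimInts`, `trimDoubles`) are hypothetical and introduced only for this illustration; this is not the code that was merged, and the `used` argument stands in for `sm.activeSize`.

```scala
object ConditionalTruncate {
  // Returns the same array when it already has exactly `used` elements,
  // otherwise a truncated copy. Avoids the copy in the common, unpadded case.
  def trimInts(a: Array[Int], used: Int): Array[Int] =
    if (a.length == used) a else a.slice(0, used)

  def trimDoubles(a: Array[Double], used: Int): Array[Double] =
    if (a.length == used) a else a.slice(0, used)

  def main(args: Array[String]): Unit = {
    val exact = Array(0, 1)        // no padding: should be returned as-is
    val padded = Array(0, 1, 0, 0) // two unused trailing slots
    println(trimInts(exact, 2) eq exact)  // true (same reference, no copy)
    println(trimInts(padded, 2).length)   // 2
  }
}
```

The trade-off is that the unpadded branch shares the array with the source matrix, so later mutation of the source would be visible through the result; whether that is acceptable depends on how the converted matrix is used.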