[GitHub] spark issue #17931: [SPARK-12837][SPARK-20666][CORE][FOLLOWUP] getting name ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17931 By the time we receive `SQLMetrics` in `SQLListener` with the task end event, the registered accumulator may already have been GCed, so there is no way to retrieve the accumulator names — unless we send the accumulator names to the executor side, so that the executors can send the accumulators back to the driver side together with their names. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
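The round trip cloud-fan describes can be illustrated with a minimal, hypothetical sketch. The class and field names below (`NamedAccumulator`, `numOutputRows`) are illustrative only, not Spark's actual classes; the point is that if the name travels with the value, the driver no longer needs a registry lookup that may fail after GC:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: if the executor only sent back (id, value) pairs, a
// driver-side registry whose entries have been GCed could not recover the
// accumulator name. Shipping the name alongside the value avoids the lookup.
class NamedAccumulator {
    final long id;
    final String name;   // carried to the executor and back
    long value;

    NamedAccumulator(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

public class AccumulatorRoundTrip {
    public static void main(String[] args) {
        // Executor side: update the accumulator received with its name.
        NamedAccumulator acc = new NamedAccumulator(42L, "numOutputRows");
        acc.value += 100;

        // Driver side: the task-end event carries name + value together,
        // so no (possibly GCed) registry lookup is needed.
        Map<String, Long> metrics = new HashMap<>();
        metrics.put(acc.name, acc.value);
        System.out.println(metrics.get("numOutputRows")); // prints 100
    }
}
```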
[GitHub] spark issue #17961: [SPARK-20720][WEB-UI]'Executor Summary' should show the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17961 Can one of the admins verify this patch?
[GitHub] spark pull request #17961: [SPARK-20720][WEB-UI]'Executor Summary' should sh...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/17961 [SPARK-20720][WEB-UI] 'Executor Summary' should show the exact number, 'Removed Executors' should display the specific number, in the Application Page ## What changes were proposed in this pull request? When the number of Spark worker executors is large, displaying the specific number in the Spark UI makes it easier to analyze and observe. Although this is a small improvement, it is genuinely valuable. ## How was this patch tested? Manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-20720 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17961
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17959 @gatorsmile Right, thanks for pointing this out!
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/17959#discussion_r116162790

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -519,8 +519,18 @@ case class FileSourceScanExec(
       relation,
       output.map(QueryPlan.normalizeExprId(_, output)),
       requiredSchema,
-      partitionFilters.map(QueryPlan.normalizeExprId(_, output)),
-      dataFilters.map(QueryPlan.normalizeExprId(_, output)),
+      canonicalizeFilters(partitionFilters, output),
+      canonicalizeFilters(dataFilters, output),
       None)
   }
+
+  private def canonicalizeFilters(filters: Seq[Expression], output: Seq[Attribute])
--- End diff --

OK
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17956 This PR description is misleading. This PR is actually a bug fix, I think
[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16985#discussion_r116162386

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -315,8 +317,14 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
       assert(
         joinOperator.right.find(_.isInstanceOf[SortExec]).isDefined == sortRight,
         s"expected sort in the right child to be $sortRight but found\n${joinOperator.right}")
+
+      if (expectedResult.isDefined) {
+        checkAnswer(joined, expectedResult.get)
--- End diff --

This will let us validate against some predefined expected output. L296 will recompute the result with the exact same `joinCondition`... but for my test case I don't want to reuse the same `joinCondition`
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17948 LGTM
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116161956

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
    */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {

+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
             aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Oh. nvm. It is of course.
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116161811

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
    */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {

+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
             aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Are we sure that grouping expressions are all pure attributes? If not, this check might fail.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116161668

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

This PR changes the behavior, right? Does your test case pass without the code changes?
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116161266

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

Let me check, and open a JIRA/PR if there are some cases we should handle, when I have some time. Let's leave that out here. It does not block this PR, and I don't want extra changes to hold off this PR like https://github.com/apache/spark/pull/17901.
[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17644 **[Test build #76852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76852/testReport)** for PR 17644 at commit [`239beee`](https://github.com/apache/spark/commit/239beee4f74488445c4e60c5aa17f54004ce9658).
[GitHub] spark pull request #17901: [SPARK-20639][SQL] Add single argument support fo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17901#discussion_r116160628

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2683,13 +2683,12 @@ object functions {
   def unix_timestamp(s: Column, p: String): Column = withExpr { UnixTimestamp(s.expr, Literal(p)) }

   /**
-   * Convert time string to a Unix timestamp (in seconds).
-   * Uses the pattern "yyyy-MM-dd HH:mm:ss" and will return null on failure.
+   * Convert time string to a Unix timestamp (in seconds) by casting rules to `TimestampType`.
    * @group datetime_funcs
    * @since 2.2.0
    */
   def to_timestamp(s: Column): Column = withExpr {
-    new ParseToTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))
--- End diff --

@rxin and @cloud-fan, I would rather take this change out if it holds off this PR. It is essentially orthogonal to this PR.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116160516

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
      case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

If possible, we want to be consistent with the others. Could you check Pandas?
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116160371

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

Is the JSON format itself related to CSV and Pandas? Please see the discussion in https://github.com/apache/spark/pull/9759#r63321521. Also, this does not change the existing behaviour.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116160121

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

Not sure whether we should follow it. How about our CSV parsers? How about Pandas?
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116159902

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
    */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {

+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
             aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Aha, I see. It looks reasonable. I'll update. Thanks!
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956 **[Test build #76851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76851/testReport)** for PR 17956 at commit [`e858789`](https://github.com/apache/spark/commit/e85878917d6ff1fe107efc59bb1d403401a2441f).
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116159720

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

For input/output, it is not a bug. Please see https://github.com/apache/spark/pull/17956#discussion_r116144531 and the PR description.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116159435

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
       case VALUE_STRING =>
         // Special case handling for NaN and Infinity.
-        val value = parser.getText
-        val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-        if (lowerCaseValue.equals("nan") ||
-          lowerCaseValue.equals("infinity") ||
-          lowerCaseValue.equals("-infinity") ||
-          lowerCaseValue.equals("inf") ||
-          lowerCaseValue.equals("-inf")) {
-          value.toFloat
--- End diff --

This sounds like a bug fix, because `toFloat` is case-sensitive?
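The case-sensitivity question can be checked directly against the JDK: Scala's `String.toFloat` delegates to `java.lang.Float.parseFloat`, which accepts only the exact spellings "Infinity", "-Infinity" and "NaN", and rejects the short forms "Inf"/"INF" entirely — which is why the removed code lower-cased the input and special-cased those strings:

```java
public class SpecialFloatParsing {
    public static void main(String[] args) {
        // Float.parseFloat accepts the canonical spellings, case-sensitively.
        System.out.println(Float.parseFloat("Infinity"));   // Infinity
        System.out.println(Float.parseFloat("-Infinity"));  // -Infinity
        System.out.println(Float.parseFloat("NaN"));        // NaN

        // The short form "INF" (and lower-cased variants like "nan")
        // are not part of the grammar and throw.
        try {
            Float.parseFloat("INF");
        } catch (NumberFormatException e) {
            System.out.println("INF is rejected");
        }
    }
}
```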
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116159393

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
     }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+    // positive cases
+    val jsons = Seq(
+      """{"a": "-INF"}""",
+      """{"a": "INF"}""",
+      """{"a": "-INF"}""",
--- End diff --

Yes ...
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17711 **[Test build #76850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76850/testReport)** for PR 17711 at commit [`de89791`](https://github.com/apache/spark/commit/de89791d2f531c966571ce8a37049f159856e38d).
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116159338

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
    */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {

+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
             aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

oh, thanks!
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116159278

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
     }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+    // positive cases
+    val jsons = Seq(
+      """{"a": "-INF"}""",
+      """{"a": "INF"}""",
+      """{"a": "-INF"}""",
--- End diff --

Do we want to test `+INF`?
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116159220

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
       assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
     }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+    // positive cases
+    val jsons = Seq(
+      """{"a": "-INF"}""",
+      """{"a": "INF"}""",
+      """{"a": "-INF"}""",
--- End diff --

What is the difference between line 1995 and line 1997?
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116159196

--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
```
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
```
--- End diff --

done.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17711 @rxin ok, thanks for the suggestion!
[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r116158469

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala ---
```
@@ -30,14 +30,19 @@ private[kafka010] class KafkaSink(
   @volatile private var latestBatchId = -1L

   override def toString(): String = "KafkaSink"

+  private val kafkaParams = new ju.HashMap[String, Object](executorKafkaParams)
+
   override def addBatch(batchId: Long, data: DataFrame): Unit = {
     if (batchId <= latestBatchId) {
       logInfo(s"Skipping already committed batch $batchId")
     } else {
       KafkaWriter.write(sqlContext.sparkSession,
-        data.queryExecution, executorKafkaParams, topic)
+        data.queryExecution, kafkaParams, topic)
       latestBatchId = batchId
     }
   }
+
+  override def stop(): Unit = {
+    CachedKafkaProducer.close(kafkaParams)
```
--- End diff --

That's correct, I understand. `close` requires a bit of rethinking, and I don't see a straightforward way of doing it. If you agree, the close-related implementation can be taken out of this PR and taken up in a separate JIRA and PR?
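The pattern being reviewed — one cached producer per distinct Kafka parameter map, closed by key — can be sketched with a toy cache. The names below are illustrative stand-ins, not the actual `CachedKafkaProducer` API:

```scala
import scala.collection.mutable

// Toy stand-in for a producer; the real code caches Kafka client instances.
final class FakeProducer {
  var closed = false
  def close(): Unit = closed = true
}

// Hypothetical cache keyed by the config map, mirroring the
// get-or-create / close-by-key pattern under review.
object ProducerCache {
  private val cache = mutable.Map.empty[Map[String, String], FakeProducer]

  def getOrCreate(params: Map[String, String]): FakeProducer =
    synchronized { cache.getOrElseUpdate(params, new FakeProducer) }

  // Closing by key is what makes the lifecycle tricky: another query may still
  // hold the same cached instance, which is why the thread suggests moving
  // close semantics to a separate JIRA.
  def close(params: Map[String, String]): Unit =
    synchronized { cache.remove(params).foreach(_.close()) }
}
```

Two sinks created with the same parameter map share one producer, so a naive `close` on one sink can pull the instance out from under the other.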
[GitHub] spark pull request #17945: [SPARK-20704][SPARKR] change CRAN test to run sin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17945
[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16985#discussion_r116158389

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ReorderJoinPredicates.scala ---
```
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * When the physical operators are created for JOIN, the ordering of join keys is based on order
+ * in which the join keys appear in the user query. That might not match with the output
+ * partitioning of the join node's children (thus leading to extra sort / shuffle being
+ * introduced). This rule will change the ordering of the join keys to match with the
+ * partitioning of the join nodes' children.
+ */
+class ReorderJoinPredicates extends Rule[SparkPlan] {
+  private def reorderJoinKeys(
+      leftKeys: Seq[Expression],
+      rightKeys: Seq[Expression],
+      leftPartitioning: Partitioning,
+      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+    def reorder(expectedOrderOfKeys: Seq[Expression],
+        currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
+      val leftKeysBuffer = ArrayBuffer[Expression]()
+      val rightKeysBuffer = ArrayBuffer[Expression]()
+
+      expectedOrderOfKeys.foreach(expression => {
+        val index = currentOrderOfKeys.indexWhere(e => e.semanticEquals(expression))
+        leftKeysBuffer.append(leftKeys(index))
+        rightKeysBuffer.append(rightKeys(index))
+      })
+      (leftKeysBuffer, rightKeysBuffer)
+    }
+
+    if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) {
+      leftPartitioning match {
+        case HashPartitioning(leftExpressions, _)
+          if leftExpressions.length == leftKeys.length &&
```
--- End diff --

why do we need the same length? Let's say the child partitioning is `a, b, c, d` and the join key is `b, a`, we can reorder the join key to avoid shuffle, right?
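The reordering itself is easy to see with plain strings standing in for Catalyst expressions. This is an illustrative sketch of the `reorder` helper in the diff above, not the actual Spark code; it also falls back to the original order when the partitioning does not cover every key, which is the situation the length check in the PR guards against:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch: reorder (leftKeys, rightKeys) in lockstep so that leftKeys follows
// the order of the child's partitioning expressions. Strings stand in for
// Expression; real code compares with semanticEquals, not ==.
def reorderJoinKeys(
    leftKeys: Seq[String],
    rightKeys: Seq[String],
    partitioning: Seq[String]): (Seq[String], Seq[String]) = {
  val leftBuf = ArrayBuffer[String]()
  val rightBuf = ArrayBuffer[String]()
  partitioning.foreach { expr =>
    val i = leftKeys.indexOf(expr)
    if (i >= 0) {
      // Keep the left/right pairs together while changing their order.
      leftBuf += leftKeys(i)
      rightBuf += rightKeys(i)
    }
  }
  // If the partitioning didn't cover all join keys, keep the original order.
  if (leftBuf.length == leftKeys.length) (leftBuf.toSeq, rightBuf.toSeq)
  else (leftKeys, rightKeys)
}
```

With join keys `(b, a)` paired to `(y, x)` and a child partitioned on `(a, b)`, this yields `(a, b)` / `(x, y)`, the reordering that avoids the extra shuffle.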
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17958 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76844/ Test FAILed.
[GitHub] spark issue #17945: [SPARK-20704][SPARKR] change CRAN test to run single thr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17945 merged to master/2.2
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17958 Build finished. Test FAILed.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17711

I feel both are pretty complicated. Can we just do something similar to CombineUnion:
```
/**
 * Combines all adjacent [[Union]] operators into a single [[Union]].
 */
object CombineUnions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case u: Union => flattenUnion(u, false)
    case Distinct(u: Union) => Distinct(flattenUnion(u, true))
  }

  private def flattenUnion(union: Union, flattenDistinct: Boolean): Union = {
    val stack = mutable.Stack[LogicalPlan](union)
    val flattened = mutable.ArrayBuffer.empty[LogicalPlan]
    while (stack.nonEmpty) {
      stack.pop() match {
        case Distinct(Union(children)) if flattenDistinct =>
          stack.pushAll(children.reverse)
        case Union(children) =>
          stack.pushAll(children.reverse)
        case child => flattened += child
      }
    }
    Union(flattened)
  }
}
```
It's going to be simpler because you don't need to handle distinct here.
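The stack-based flattening pattern in `CombineUnions` can be demonstrated standalone on a toy tree. This is an illustrative sketch over a minimal ADT, not the Catalyst classes:

```scala
import scala.collection.mutable

// Minimal ADT standing in for LogicalPlan.
sealed trait Plan
case class Leaf(name: String) extends Plan
case class Union(children: Seq[Plan]) extends Plan

// Same iterative traversal as CombineUnions: children are pushed in reverse so
// that popping preserves left-to-right order in the flattened result.
def flattenUnion(u: Union): Union = {
  val stack = mutable.Stack[Plan](u)
  val flat = mutable.ArrayBuffer.empty[Plan]
  while (stack.nonEmpty) {
    stack.pop() match {
      case Union(children) => stack.pushAll(children.reverse)
      case leaf            => flat += leaf
    }
  }
  Union(flat.toSeq)
}
```

The explicit stack avoids deep recursion on heavily nested unions, which is the reason the real rule is written this way rather than as a recursive transform.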
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17958

**[Test build #76844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76844/testReport)** for PR 17958 at commit [`e10101e`](https://github.com/apache/spark/commit/e10101eafe2329031d079977ab3f3e0aaee98908).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16985#discussion_r116158180

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ReorderJoinPredicates.scala ---
```
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * When the physical operators are created for JOIN, the ordering of join keys is based on order
+ * in which the join keys appear in the user query. That might not match with the output
+ * partitioning of the join node's children (thus leading to extra sort / shuffle being
+ * introduced). This rule will change the ordering of the join keys to match with the
+ * partitioning of the join nodes' children.
+ */
+class ReorderJoinPredicates extends Rule[SparkPlan] {
+  private def reorderJoinKeys(
+      leftKeys: Seq[Expression],
+      rightKeys: Seq[Expression],
+      leftPartitioning: Partitioning,
+      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+    def reorder(expectedOrderOfKeys: Seq[Expression],
```
--- End diff --

nit:
```
def reorder(
    expectedOrderOfKeys: Seq[Expression],
    currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression])
```
[GitHub] spark issue #17945: [SPARK-20704][SPARKR] change CRAN test to run single thr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17945 AppVeyor was failing because of the `SparkListenerBus` issue.
[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16985#discussion_r116158076

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
```
@@ -315,8 +317,14 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
       assert(
         joinOperator.right.find(_.isInstanceOf[SortExec]).isDefined == sortRight,
         s"expected sort in the right child to be $sortRight but found\n${joinOperator.right}")
+
+      if (expectedResult.isDefined) {
+        checkAnswer(joined, expectedResult.get)
```
--- End diff --

In L296 we already have an `assert` to check the result, do we really need this?
[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/17644#discussion_r116157920

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
```
@@ -871,6 +886,23 @@ private[hive] object HiveClientImpl {
       hiveTable.setViewOriginalText(t)
       hiveTable.setViewExpandedText(t)
     }
+
+    table.bucketSpec match {
+      case Some(bucketSpec) =>
+        hiveTable.setNumBuckets(bucketSpec.numBuckets)
+        hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
```
--- End diff --

Got it. Adding a check that the table is a Hive table and not a data source table.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Merged build finished. Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76846/ Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956

**[Test build #76846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)** for PR 17956 at commit [`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17958 Merged build finished. Test PASSed.
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76845/ Test PASSed.
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17958

**[Test build #76845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)** for PR 17958 at commit [`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17953#discussion_r116157223

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
```
@@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
       case ("decimal", precision :: Nil) => DecimalType(precision.getText.toInt, 0)
       case ("decimal", precision :: scale :: Nil) =>
         DecimalType(precision.getText.toInt, scale.getText.toInt)
+      case ("void", Nil) => NullType
```
--- End diff --

Hive 2.x disables it. Could you add some test cases by reading and writing the tables? Thanks!
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17959 merged to master/2.2, please send a follow-up PR to address @gatorsmile 's comments, thanks!
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116156736

--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
```
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
```
--- End diff --

ok!
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17711 @maropu The solution using `tailrec` looks more straightforward. Could you submit the PR based on that? Thanks!
[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r116156556

--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
```
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
```
--- End diff --

The link could become dead in the future. Could you also copy the table contents here? Thanks!
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17959
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17959 How about `HiveTableScanExec`?
[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17959#discussion_r116156087

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
```
@@ -519,8 +519,18 @@ case class FileSourceScanExec(
       relation,
       output.map(QueryPlan.normalizeExprId(_, output)),
       requiredSchema,
-      partitionFilters.map(QueryPlan.normalizeExprId(_, output)),
-      dataFilters.map(QueryPlan.normalizeExprId(_, output)),
+      canonicalizeFilters(partitionFilters, output),
+      canonicalizeFilters(dataFilters, output),
       None)
   }
+
+  private def canonicalizeFilters(filters: Seq[Expression], output: Seq[Attribute])
```
--- End diff --

Add a function description?
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116155996

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
```
@@ -1003,18 +1003,32 @@ class Analyzer(
    */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {

+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
             aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
+        def mayResolveAttrByAggregateExprs(exprs: Seq[Expression]): Seq[Expression] = exprs.map {
```
--- End diff --

I think we should do `exprs.map { _.transform { ...` like above.
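At its core, the resolution being reviewed substitutes each unresolved GROUP BY name with the matching aggregate-list alias, but only when the child cannot resolve it. A hypothetical standalone sketch with plain strings standing in for Catalyst expressions:

```scala
// Sketch mirroring ResolveAggAliasInGroupBy (not the actual Analyzer code):
// `aliases` maps a SELECT-list alias to its underlying expression, and names
// resolvable from the child's output take precedence over aliases.
def resolveGroupBy(
    groups: Seq[String],
    childOutput: Set[String],
    aliases: Map[String, String]): Seq[String] =
  groups.map { g =>
    if (childOutput.contains(g)) g  // resolvable by child: leave untouched
    else aliases.getOrElse(g, g)    // otherwise try the aggregate aliases
  }
```

For `SELECT a + 1 AS k, a FROM t GROUP BY k, a`, `k` is not in the child's output, so it resolves to `a + 1`, while `a` stays as-is; the review comments are about applying the same substitution inside CUBE/ROLLUP/GROUPING SETS via `transform` rather than only at the top level.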
[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14582 We could ask the mailing list if you feel strongly about this. For example, the `from_json` function was also raised on the mailing list before getting merged. I think we should not add every variant just for consistency, which is why I asked about wider interest. There are many language-specific and application-specific variants, and I usually push back when there is an easy workaround and the change looks like just another variant. I wouldn't object if there were more demand or interest in this.
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116155780 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1003,18 +1003,32 @@ class Analyzer( */ object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] { +// This is a strict check though, we put this to apply the rule only if the expression is not +// resolvable by child. +private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = { + !child.output.exists(a => resolver(a.name, attrName)) +} + override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { case agg @ Aggregate(groups, aggs, child) if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && -groups.exists(_.isInstanceOf[UnresolvedAttribute]) => -// This is a strict check though, we put this to apply the rule only in alias expressions -def notResolvableByChild(attrName: String): Boolean = - !child.output.exists(a => resolver(a.name, attrName)) -agg.copy(groupingExpressions = groups.map { - case u: UnresolvedAttribute if notResolvableByChild(u.name) => +groups.exists(!_.resolved) => +agg.copy(groupingExpressions = groups.map { _.transform { +case u: UnresolvedAttribute if notResolvableByChild(u.name, child) => + aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u) + } +}) + + case gs @ GroupingSets(selectedGroups, groups, child, aggs) + if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) && +(selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) => --- End diff -- `groups` should cover `selectedGroups`. So we may not need to add `selectedGroups` here.
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17948 LGTM too. : )
[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...
Github user ghoto commented on a diff in the pull request: https://github.com/apache/spark/pull/17940#discussion_r116155652 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala --- @@ -992,7 +992,20 @@ object Matrices { new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose) case sm: BSM[Double] => // There is no isTranspose flag for sparse matrices in Breeze -new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data) + +// Some Breeze CSCMatrices may have extra trailing zeros in +// .rowIndices and .data, which are added after some matrix +// operations for efficiency. +// +// Therefore the last element of sm.colPtrs would no longer be +// coherent with the size of sm.rowIndices and sm.data +// despite sm being a valid CSCMatrix. +// We need to truncate both arrays (rowIndices, data) +// to the real size of the vector sm.activeSize to allow valid conversion + +val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize) +val truncData = sm.data.slice(0, sm.activeSize) --- End diff -- I'm implementing both suggestions; however, wouldn't the sm.copy be more expensive than just doing those two slices?
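The cost question in this thread can be sketched outside of Breeze. Below is a minimal sketch, assuming a hypothetical `CscLike` container that mirrors the relevant `CSCMatrix` fields (`colPtrs`, `rowIndices`, `data`); `activeSize` is modeled as the last column pointer, and the truncation is exactly the two `slice` calls from the diff, which touch only `activeSize` elements per array rather than rebuilding the whole matrix structure.

```scala
// Hypothetical stand-in for the Breeze CSCMatrix fields used in the diff.
// After some in-place Breeze operations, rowIndices and data can carry
// trailing padding beyond the number of genuinely stored entries.
case class CscLike(
    rows: Int,
    cols: Int,
    colPtrs: Array[Int],    // length cols + 1; last entry = active entries
    rowIndices: Array[Int], // may be longer than activeSize
    data: Array[Double]) {  // may be longer than activeSize
  def activeSize: Int = colPtrs.last
}

// Truncate only the two over-allocated arrays, as the PR does; a full
// matrix copy would rebuild every structure instead of just these slices.
def truncateCsc(sm: CscLike): (Array[Int], Array[Double]) =
  (sm.rowIndices.slice(0, sm.activeSize), sm.data.slice(0, sm.activeSize))
```

For a 2x2 matrix with a single stored entry but padded arrays of length 3, `truncateCsc` returns two arrays of length 1.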
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76843/ Test PASSed.
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17959 Merged build finished. Test PASSed.
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17959 **[Test build #76843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)** for PR 17959 at commit [`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait DataSourceScanExec extends LeafExecNode with CodegenSupport with PredicateHelper `
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17957 Merged build finished. Test PASSed.
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17957 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76842/ Test PASSed.
[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17957 **[Test build #76842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)** for PR 17957 at commit [`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/16985 @cloud-fan : I have made the suggested change(s).
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17956 LGTM except for a minor comment.
[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17956#discussion_r116154314 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { assert(errMsg.startsWith("The field for corrupt records must be string type and nullable")) } } + + test("SPARK-18772: Parse special floats correctly") { +// positive cases +val jsons = Seq( + """{"a": "-INF"}""", + """{"a": "INF"}""", + """{"a": "-INF"}""", + """{"a": "NaN"}""", + """{"a": "Infinity"}""", + """{"a": "+Infinity"}""", + """{"a": "-Infinity"}""") + +val checks: Seq[Double => Boolean] = Seq( + _.isNegInfinity, + _.isPosInfinity, + _.isNegInfinity, + _.isNaN, + _.isPosInfinity, + _.isPosInfinity, + _.isNegInfinity) + +Seq(FloatType, DoubleType).foreach { dt => + jsons.zip(checks).foreach { case (json, check) => +val ds = spark.read + .schema(StructType(Seq(StructField("a", dt + .json(Seq(json).toDS()) + .select($"a".cast(DoubleType)).as[Double] +assert(check(ds.first())) + } +} + +// negative case +Seq(FloatType, DoubleType).foreach { dt => + val e = intercept[SparkException] { +spark.read + .option("mode", "FAILFAST") + .schema(StructType(Seq(StructField("a", dt + .json(Seq( """{"a": "nan"}""").toDS()) --- End diff -- Shall we also test other negative cases?
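The accepted and rejected tokens in the test above form a small, case-sensitive table. A minimal sketch of that mapping (the token set is taken from the test's inputs here, not from Spark's actual JSON parser code):

```scala
// Case-sensitive special-float tokens from the test's positive cases.
// Unrecognized spellings (e.g. "nan", "inf") return None, which the test's
// FAILFAST mode would surface as a SparkException.
def parseSpecialFloat(s: String): Option[Double] = s match {
  case "NaN"                            => Some(Double.NaN)
  case "INF" | "Infinity" | "+Infinity" => Some(Double.PositiveInfinity)
  case "-INF" | "-Infinity"             => Some(Double.NegativeInfinity)
  case _                                => None // negative cases
}
```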
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17052 Yea, I just wanted to check if it is in progress in any way. Thanks for your input.
[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...
Github user lalinsky commented on the issue: https://github.com/apache/spark/pull/14582 You were asking for more interest in the feature, and there was no way I could answer that. :) Regarding the change itself, the system can already auto-cast an integer to a timestamp, but not a floating-point number. Floating-point timestamps are pretty common in the Python ecosystem, more so than integer ones. From my point of view, that's inconsistent and surprising behavior that I wanted to correct. I wouldn't have sent the patch if it didn't work for any numeric type; supporting just one number type seemed wrong to me.
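The precision argument is easy to demonstrate outside Spark. A minimal sketch (plain JVM conversion, not Spark's actual cast implementation): a fractional epoch-seconds value, as Python's `time.time()` produces, keeps its sub-second part only if the cast goes through a floating-point path.

```scala
import java.sql.Timestamp

// Integer epoch seconds: whole-second resolution only.
def intSecondsToTimestamp(seconds: Long): Timestamp =
  new Timestamp(seconds * 1000L)

// Floating-point epoch seconds: the fractional part survives as millis;
// truncating to Long before converting would silently drop it.
def doubleSecondsToTimestamp(seconds: Double): Timestamp =
  new Timestamp((seconds * 1000.0).toLong)
```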
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17711 Merged build finished. Test PASSed.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76840/ Test PASSed.
[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17711 **[Test build #76840 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)** for PR 17711 at commit [`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052 @HyukjinKwon Sorry! I have been busy for a while. Let me resolve this conflict.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/ Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17956 Merged build finished. Test PASSed.
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17956 **[Test build #76841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)** for PR 17956 at commit [`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17935#discussion_r116153501 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with SharedSQLContext { Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil) } + test("SPARK-20690: Do not add missing attributes through subqueries") { +withTempView("onerow") { + Seq(1).toDF("c1").createOrReplaceTempView("onerow") + + val e = intercept[AnalysisException] { +sql( + """ +| select 1 +| from (select 1 from onerow t1 LIMIT 1) --- End diff -- I'm surprised we support this syntax; I think a subquery in the FROM clause must have an alias. I checked with Postgres, and it throws the exception `subquery in FROM must have an alias`. Can you check with other databases? Thanks!
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17948 **[Test build #76849 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76849/testReport)** for PR 17948 at commit [`a809274`](https://github.com/apache/spark/commit/a8092742b99c9d43b04b4a4941345f179996a50f).
[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17960#discussion_r116153263 --- Diff: sql/core/src/test/resources/sql-tests/inputs/limit.sql --- @@ -1,23 +1,27 @@ -- limit on various data types -select * from testdata limit 2; -select * from arraydata limit 2; -select * from mapdata limit 2; +SELECT * FROM testdata LIMIT 2; --- End diff -- Just out of curiosity, I wonder why these should be upper-cased. Is this way preferred?
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116153034 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1003,18 +1003,31 @@ class Analyzer( */ object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] { +// This is a strict check though, we put this to apply the rule only if the expression is not +// resolvable by child. +private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = + !child.output.exists(a => resolver(a.name, attrName)) --- End diff -- Thanks! Fixed.
[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17644#discussion_r116152814 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -871,6 +886,23 @@ private[hive] object HiveClientImpl { hiveTable.setViewOriginalText(t) hiveTable.setViewExpandedText(t) } + +table.bucketSpec match { + case Some(bucketSpec) => +hiveTable.setNumBuckets(bucketSpec.numBuckets) +hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava) --- End diff -- For a data source table, which can be created by `CREATE TABLE src(...) USING parquet ...`, the bucketing information is stored in table properties, and Hive will always read such a table as non-bucketed. After your PR, Hive will read bucketed data source tables written by Spark as bucketed tables, which causes problems because the two systems use different bucket hashing functions.
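The hashing mismatch called out above can be illustrated with simplified stand-ins (these are not the real Spark or Hive implementations — Spark hashes internal row bytes with Murmur3 while Hive uses its own hashCode-based scheme): both sides compute `hash(key) mod numBuckets`, but because the hash functions differ, a reader that assumes the writer's layout can probe the wrong bucket file.

```scala
import scala.util.hashing.MurmurHash3

// Non-negative modulo shared by both schemes.
def bucket(hash: Int, numBuckets: Int): Int =
  ((hash % numBuckets) + numBuckets) % numBuckets

// Simplified stand-in for Spark-style bucketing (Murmur3-based).
def sparkLikeBucket(key: String, numBuckets: Int): Int =
  bucket(MurmurHash3.stringHash(key), numBuckets)

// Simplified stand-in for Hive-style bucketing (hashCode-based).
def hiveLikeBucket(key: String, numBuckets: Int): Int =
  bucket(key.hashCode, numBuckets)
```

Because the two functions generally assign the same key to different bucket indices, metadata must record which scheme wrote the table.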
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116152643 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1003,18 +1003,31 @@ class Analyzer( */ object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] { +// This is a strict check though, we put this to apply the rule only if the expression is not +// resolvable by child. +private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = + !child.output.exists(a => resolver(a.name, attrName)) --- End diff -- Nit: style ```Scala private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = { !child.output.exists(a => resolver(a.name, attrName)) } ```
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17959 LGTM
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17948 **[Test build #76848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76848/testReport)** for PR 17948 at commit [`0163656`](https://github.com/apache/spark/commit/0163656b8e5325cda7b80e0c0268c24608e9b871).
[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17960 **[Test build #76847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76847/testReport)** for PR 17960 at commit [`b4a4b0a`](https://github.com/apache/spark/commit/b4a4b0aee836cd6f8944716bc84323487527fc19).
[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17960 [SPARK-20719] [SQL] Support LIMIT ALL ### What changes were proposed in this pull request? `LIMIT ALL` is the same as omitting the `LIMIT` clause. It is supported by both PostgreSQL and Presto. This PR supports it by adding it to the parser. ### How was this patch tested? Added a test case You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark LimitAll Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17960.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17960 commit b4a4b0aee836cd6f8944716bc84323487527fc19 Author: Xiao Li Date: 2017-05-12T04:34:37Z fix.
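Since `LIMIT ALL` is described above as equivalent to omitting `LIMIT`, the parser change amounts to normalizing both forms to "no limit". A hypothetical model of that normalization (not the actual Spark grammar rule):

```scala
// Hypothetical parsed LIMIT clause: ALL and an absent clause both
// normalize to None, i.e. no row cap is applied to the query.
sealed trait LimitClause
case object LimitAll extends LimitClause
final case class LimitN(n: Int) extends LimitClause

def effectiveLimit(clause: Option[LimitClause]): Option[Int] = clause match {
  case None | Some(LimitAll) => None     // no limit
  case Some(LimitN(n))       => Some(n)  // explicit limit
}
```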
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116151791

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,30 @@ class Analyzer(
     */
    object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
+   // This is a strict check though, we put this to apply the rule only in alias expressions
--- End diff --

ok
[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17948

LGTM
[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17948#discussion_r116150714

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,30 @@ class Analyzer(
     */
    object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
+   // This is a strict check though, we put this to apply the rule only in alias expressions
--- End diff --

`... only if the expression is not resolvable by child`
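The rule being discussed — resolve a GROUP BY expression against the SELECT-list aliases only when the child plan cannot resolve it itself — can be sketched like this. `GroupByAliasResolution` and its string-based representation are invented for illustration; this is not Spark's Analyzer code.

```scala
// Hypothetical sketch: a GROUP BY key is rewritten via the SELECT-list alias
// map only if the child's own output columns cannot resolve it.
object GroupByAliasResolution {
  def resolveKey(key: String,
                 childColumns: Set[String],
                 selectAliases: Map[String, String]): String = {
    if (childColumns.contains(key)) key        // resolvable by child: leave as-is
    else selectAliases.getOrElse(key, key)     // otherwise try the alias map
  }
}
```

For example, with `SELECT a + 1 AS x ... GROUP BY x`, the key `x` is not a child column, so it resolves to the aliased expression; `GROUP BY a` is left untouched because the child resolves `a` directly.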
[GitHub] spark issue #17936: [SPARK-20638][Core][WIP]Optimize the CartesianRDD to red...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17936

The cluster test results. `RDD.cartesian` is used in Spark MLlib's ALS algorithm, and the improved branch was compared against the latest Spark master branch.

Environment: Spark on YARN with 9 executors (10 cores & 30 GB memory each) on three nodes.

Test data: 480,000 users and 17,000 items.

Test case:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object TestNetflixlib {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test Netflix mlib")
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs://10.1.2.173:9000/nf_training_set.txt")
    val ratings = data.map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })

    val rank = 0
    val numIterations = 10

    val train_start = System.nanoTime()
    val model = ALS.train(ratings, rank, numIterations, 0.01)
    val training_time = (System.nanoTime() - train_start) / 1e9
    println(s"Training time(s): $training_time")

    val rec_start = System.nanoTime()
    val userRec = model.recommendProductsForUsers(20)
    println(userRec.count())
    val rec_time = (System.nanoTime() - rec_start) / 1e9
    println(s"Recommend time(s): $rec_time")
  }
}
```

Test results (time increase of the improved branch over master):

| Master Branch | Improved Branch | Increase (%) |
| -- | -- | -- |
| 139.934s | 162.597s | 16% |
| 148.138s | 157.597s | 6% |
| 157.899s | 189.580s | 20% |
| 135.520s | 152.486s | 13% |
| 166.101s | 184.485s | 11% |
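As a sanity check on the percentage column above, the figures can be reproduced as the relative increase of the improved branch's time over the master branch's time:

```scala
// Reproduces the percentage column: relative increase of the improved
// branch's wall-clock time over master, rounded to the nearest percent.
object Benchmark {
  def pctIncrease(master: Double, improved: Double): Long =
    math.round((improved - master) / master * 100)
}
```

For instance, the first row gives (162.597 - 139.934) / 139.934 ≈ 0.162, i.e. 16%.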
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16697

Yes. If inside, you are right - only the first will be logged !
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/16697

you mean outside of https://github.com/apache/spark/pull/16697/files#diff-ca0fe05a42fd5edcab8a1bdaa8e58db9R210?
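The behavior the two comments above circle around — only the first dropped event producing a log line — is commonly implemented with a compare-and-set guard. A minimal sketch under that assumption, not the actual LiveListenerBus code:

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicLong}

// Log-once pattern: a CAS flag ensures only the first drop is reported,
// while a counter still records every dropped event for later inspection.
class DropLogger {
  private val logged = new AtomicBoolean(false)
  private val dropped = new AtomicLong(0)
  var logLines = 0 // stands in for actual log output in this sketch

  def onDrop(): Unit = {
    dropped.incrementAndGet()
    if (logged.compareAndSet(false, true)) {
      logLines += 1 // only the first drop produces a log line
    }
  }

  def droppedCount: Long = dropped.get
}
```

Whether the guard sits inside or outside the logging call is exactly the question being raised: if the flag is set inside, subsequent drops are counted but never logged.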
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116148995

--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
         case t: Throwable => e.addSuppressed(t)
       }
+      context.markTaskCompleted(Some(e))
       throw e
     } finally {
-      // Call the task completion callbacks.
-      context.markTaskCompleted()
       try {
-        Utils.tryLogNonFatalError {
-          // Release memory used by this thread for unrolling blocks
-          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
-          SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
-          // Notify any tasks waiting for execution memory to be freed to wake up and try to
-          // acquire memory again. This makes impossible the scenario where a task sleeps forever
-          // because there are no other tasks left to notify it. Since this is safe to do but may
-          // not be strictly necessary, we should revisit whether we can remove this in the future.
-          val memoryManager = SparkEnv.get.memoryManager
-          memoryManager.synchronized { memoryManager.notifyAll() }
-        }
+        // Call the task completion callbacks. If "markTaskCompleted" is called twice, the second
+        // one is no-op.
--- End diff --

Missed this comment. LGTM. Thanks for clarifying @zsxwing
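The idempotency noted in the diff comment ("if `markTaskCompleted` is called twice, the second one is no-op") can be sketched with an atomic flag. This is a hypothetical illustration, not `TaskContextImpl` itself:

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Idempotent completion hook: the callbacks run at most once, no matter
// how many times markTaskCompleted is invoked (e.g. once from the error
// path and once from the finally block).
class CompletionHook(callback: () => Unit) {
  private val completed = new AtomicBoolean(false)

  def markTaskCompleted(): Unit = {
    // compareAndSet wins exactly once; later calls are no-ops.
    if (completed.compareAndSet(false, true)) {
      callback()
    }
  }
}
```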
[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116148841

--- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
@@ -55,14 +55,16 @@ class TaskCompletionListenerException(
     extends RuntimeException {

   override def getMessage: String = {
-    if (errorMessages.size == 1) {
--- End diff --

Thx for clarifying !
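The `getMessage` logic under review distinguishes the single-error case from the multi-error one. A minimal sketch of that shape — the exact wording of Spark's combined message may differ, and `TaskCompletionListenerErrors` is an invented stand-in:

```scala
// Sketch: one listener failure is reported verbatim; several failures are
// numbered and joined so none of them is silently dropped.
class TaskCompletionListenerErrors(errorMessages: Seq[String]) {
  def message: String =
    if (errorMessages.size == 1) errorMessages.head
    else errorMessages.zipWithIndex
      .map { case (msg, i) => s"Exception $i: $msg" }
      .mkString("\n")
}
```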
[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17956

LGTM
[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17906

thanks, merging to master/2.2/2.1/2.0!
[GitHub] spark pull request #17906: [SPARK-20665][SQL]"Bround" and "Round" function r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17906
[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17959

also cc @gatorsmile
[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17958 **[Test build #76845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)** for PR 17958 at commit [`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).