[GitHub] spark issue #17931: [SPARK-12837][SPARK-20666][CORE][FOLLOWUP] getting name ...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17931
  
By the time `SQLListener` receives the `SQLMetrics` with the task-end event, 
the registered accumulator may already have been GCed, and then there is no way 
to retrieve the accumulator names. The only way around this is to send the 
accumulator names to the executor side, so that the executor can send the 
accumulators back to the driver side together with their names.
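
A minimal sketch of the failure mode, with hypothetical names (the driver-side 
registry holds only weak references, so once the original accumulator object is 
collected, nothing on the driver remembers its name):

```scala
import java.lang.ref.WeakReference
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for the driver-side accumulator registry.
object AccumulatorRegistry {
  private val originals = TrieMap.empty[Long, WeakReference[AnyRef]]

  def register(id: Long, acc: AnyRef): Unit =
    originals(id) = new WeakReference[AnyRef](acc)

  // Returns None once the accumulator has been GCed on the driver,
  // which is exactly when the name lookup at task end comes up empty.
  def lookup(id: Long): Option[AnyRef] =
    originals.get(id).flatMap(ref => Option(ref.get))
}
```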


[GitHub] spark issue #17961: [SPARK-20720][WEB-UI]'Executor Summary' should show the ...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17961
  
Can one of the admins verify this patch?


[GitHub] spark pull request #17961: [SPARK-20720][WEB-UI]'Executor Summary' should sh...

2017-05-11 Thread guoxiaolongzte
GitHub user guoxiaolongzte opened a pull request:

https://github.com/apache/spark/pull/17961

[SPARK-20720][WEB-UI] 'Executor Summary' should show the exact number, and 
'Removed Executors' should display the specific number, in the Application Page

## What changes were proposed in this pull request?

When the number of Spark worker executors is large, displaying the specific 
number makes it much easier to analyze and observe the application through the Spark UI.

Although this is a small improvement, it is genuinely valuable.

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guoxiaolongzte/spark SPARK-20720

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17961






[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17959
  
@gatorsmile Right, thanks for pointing this out!


[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17959#discussion_r116162790
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ---
@@ -519,8 +519,18 @@ case class FileSourceScanExec(
   relation,
   output.map(QueryPlan.normalizeExprId(_, output)),
   requiredSchema,
-  partitionFilters.map(QueryPlan.normalizeExprId(_, output)),
-  dataFilters.map(QueryPlan.normalizeExprId(_, output)),
+  canonicalizeFilters(partitionFilters, output),
+  canonicalizeFilters(dataFilters, output),
   None)
   }
+
+  private def canonicalizeFilters(filters: Seq[Expression], output: Seq[Attribute])
--- End diff --

OK


[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17956
  
This PR description is misleading. This PR is actually a bug fix, I think.


[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-05-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r116162386
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -315,8 +317,14 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
 assert(
   joinOperator.right.find(_.isInstanceOf[SortExec]).isDefined == sortRight,
   s"expected sort in the right child to be $sortRight but found\n${joinOperator.right}")
+
+if (expectedResult.isDefined) {
+  checkAnswer(joined, expectedResult.get)
--- End diff --

This will let us validate against some predefined expected output. 
L296 will recompute the result with the exact same `joinCondition`, but 
for my test case I don't want to reuse the same `joinCondition`.
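
A sketch of the shape this takes (hypothetical helper, not the actual suite 
code): when `expectedResult` is supplied, the test validates against it and only 
falls back to recomputing the join otherwise.

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper: prefer a caller-supplied expected result over
// recomputing the join with the same joinCondition.
def checkJoinResult(
    joined: DataFrame,
    expectedResult: Option[Seq[Row]],
    recompute: () => Seq[Row]): Unit = {
  val expected = expectedResult.getOrElse(recompute())
  assert(joined.collect().toSeq.sortBy(_.toString) == expected.sortBy(_.toString))
}
```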


[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17948
  
LGTM


[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116161956
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
 */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
+            aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Oh, never mind. Of course it is.


[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116161811
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
 */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
+            aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Are we sure that grouping expressions are all pure attributes? If not, this 
check might fail.
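
A concrete shape of the worry, as a hypothetical illustration (whether this 
should resolve is exactly the open question):

```scala
// Hypothetical example: the alias `k` is buried inside `k + 1`, which is an
// Add expression rather than a bare UnresolvedAttribute, so a top-level
// isInstanceOf[UnresolvedAttribute] check over the grouping expressions
// never sees the unresolved attribute nested inside it.
val df = spark.sql(
  """
    |SELECT a + b AS k, count(*) AS cnt
    |FROM VALUES (1, 2), (1, 2) AS t(a, b)
    |GROUP BY k + 1
  """.stripMargin)
```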


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116161668
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

This PR changes the behavior, right? Does your test case pass without the 
code changes?


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116161266
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

Let me check and open a JIRA/PR when I have some time, if there are cases we 
should handle. Let's leave that out here. It does not block this PR, and I don't 
want extra changes to hold off this PR like 
https://github.com/apache/spark/pull/17901.


[GitHub] spark issue #17644: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17644
  
**[Test build #76852 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76852/testReport)** for PR 17644 at commit [`239beee`](https://github.com/apache/spark/commit/239beee4f74488445c4e60c5aa17f54004ce9658).


[GitHub] spark pull request #17901: [SPARK-20639][SQL] Add single argument support fo...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17901#discussion_r116160628
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2683,13 +2683,12 @@ object functions {
   def unix_timestamp(s: Column, p: String): Column = withExpr { UnixTimestamp(s.expr, Literal(p)) }
 
   /**
-   * Convert time string to a Unix timestamp (in seconds).
-   * Uses the pattern "yyyy-MM-dd HH:mm:ss" and will return null on failure.
+   * Convert time string to a Unix timestamp (in seconds) by casting rules to `TimestampType`.
 * @group datetime_funcs
 * @since 2.2.0
 */
   def to_timestamp(s: Column): Column = withExpr {
-new ParseToTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))
--- End diff --

@rxin and @cloud-fan, I would rather take the change out here if it holds off 
this PR. It is essentially orthogonal to this PR.
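
For context, an illustration of the behavioral difference at stake, using the 
single-argument variant this PR adds (the specific input shape is an assumption 
about what cast rules accept beyond the fixed pattern):

```scala
import org.apache.spark.sql.functions.{lit, to_timestamp}

// Under casting rules to TimestampType this ISO-style input parses; with the
// fixed "yyyy-MM-dd HH:mm:ss" pattern the 'T' separator would not match.
spark.range(1)
  .select(to_timestamp(lit("2017-05-11T10:00:00")).as("ts"))
  .show()
```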


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116160516
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

If possible, we want to be consistent with the others. Could you check 
Pandas? 


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116160371
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

Is the JSON format itself related to CSV and Pandas? Please see the 
discussion in https://github.com/apache/spark/pull/9759#r63321521.

Also, this does not change the existing behaviour.


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116160121
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

Not sure whether we should follow it. How about our CSV parser? How about Pandas?


[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116159902
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
 */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
+            aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

Aha, I see. It looks reasonable. I'll update. Thanks!


[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76851/testReport)** for PR 17956 at commit [`e858789`](https://github.com/apache/spark/commit/e85878917d6ff1fe107efc59bb1d403401a2441f).


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116159720
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

For input/output, it is not a bug. Please see 
https://github.com/apache/spark/pull/17956#discussion_r116144531 and the PR 
description.


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116159435
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -126,16 +125,11 @@ class JacksonParser(
 
 case VALUE_STRING =>
   // Special case handling for NaN and Infinity.
-  val value = parser.getText
-  val lowerCaseValue = value.toLowerCase(Locale.ROOT)
-  if (lowerCaseValue.equals("nan") ||
-lowerCaseValue.equals("infinity") ||
-lowerCaseValue.equals("-infinity") ||
-lowerCaseValue.equals("inf") ||
-lowerCaseValue.equals("-inf")) {
-value.toFloat
--- End diff --

This sounds like a bug fix, because `toFloat` is case-sensitive?
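
That case sensitivity is easy to check directly; a quick illustration in plain 
Scala, with no Spark involved:

```scala
// String.toFloat delegates to java.lang.Float.parseFloat, which accepts
// only the exact spellings "NaN", "Infinity", and "-Infinity".
object FloatParsingDemo extends App {
  println("NaN".toFloat)       // NaN
  println("Infinity".toFloat)  // Infinity
  // Each of these throws java.lang.NumberFormatException:
  // "nan".toFloat
  // "INF".toFloat
  // "-inf".toFloat
}
```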


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116159393
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
 }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+// positive cases
+val jsons = Seq(
+  """{"a": "-INF"}""",
+  """{"a": "INF"}""",
+  """{"a": "-INF"}""",
--- End diff --

Yes ... 


[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76850/testReport)** for PR 17711 at commit [`de89791`](https://github.com/apache/spark/commit/de89791d2f531c966571ce8a37049f159856e38d).


[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116159338
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1003,18 +1003,32 @@ class Analyzer(
 */
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+    // This is a strict check though, we put this to apply the rule only if the expression is not
+    // resolvable by child.
+    private def notResolvableByChild(attrName: String, child: LogicalPlan): Boolean = {
+      !child.output.exists(a => resolver(a.name, attrName))
+    }
+
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
       case agg @ Aggregate(groups, aggs, child)
           if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
-            groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-        // This is a strict check though, we put this to apply the rule only in alias expressions
-        def notResolvableByChild(attrName: String): Boolean =
-          !child.output.exists(a => resolver(a.name, attrName))
-        agg.copy(groupingExpressions = groups.map {
-          case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+            groups.exists(!_.resolved) =>
+        agg.copy(groupingExpressions = groups.map { _.transform {
+          case u: UnresolvedAttribute if notResolvableByChild(u.name, child) =>
+            aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+          }
+        })
+
+      case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
+            (selectedGroups :+ groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

oh, thanks!


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116159278
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
 }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+// positive cases
+val jsons = Seq(
+  """{"a": "-INF"}""",
+  """{"a": "INF"}""",
+  """{"a": "-INF"}""",
--- End diff --

Do we want to test `+INF`?


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116159220
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
 }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+// positive cases
+val jsons = Seq(
+  """{"a": "-INF"}""",
+  """{"a": "INF"}""",
+  """{"a": "-INF"}""",
--- End diff --

What is the difference between line 1995 and line 1997?


[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116159196
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
--- End diff --

done.


[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17711
  
@rxin ok, thanks for the suggestion!


[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...

2017-05-11 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/17308#discussion_r116158469
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSink.scala ---
@@ -30,14 +30,19 @@ private[kafka010] class KafkaSink(
   @volatile private var latestBatchId = -1L
 
   override def toString(): String = "KafkaSink"
+  private val kafkaParams = new ju.HashMap[String, Object](executorKafkaParams)
 
   override def addBatch(batchId: Long, data: DataFrame): Unit = {
 if (batchId <= latestBatchId) {
   logInfo(s"Skipping already committed batch $batchId")
 } else {
   KafkaWriter.write(sqlContext.sparkSession,
-data.queryExecution, executorKafkaParams, topic)
+data.queryExecution, kafkaParams, topic)
   latestBatchId = batchId
 }
   }
+
+  override def stop(): Unit = {
+CachedKafkaProducer.close(kafkaParams)
--- End diff --

That's correct; I have understood. `close` requires a bit of rethinking, and I 
don't see a straightforward way of doing it. If you agree, the close-related 
implementation can be taken out of this PR and taken up in a separate JIRA 
and PR?
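
The hazard being worked around, as a minimal sketch with hypothetical names (a 
cached producer is shared across queries, so one sink's `stop()` can close a 
producer that another active query still holds):

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a process-wide producer cache keyed by config.
object ProducerCache {
  private val cache = TrieMap.empty[Map[String, String], AutoCloseable]

  def getOrCreate(params: Map[String, String], create: () => AutoCloseable): AutoCloseable =
    cache.getOrElseUpdate(params, create())

  // Unsafe as-is: another query using the same params may still hold this
  // producer, which is why close() needs rethinking (e.g. reference counting).
  def close(params: Map[String, String]): Unit =
    cache.remove(params).foreach(_.close())
}
```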


[GitHub] spark pull request #17945: [SPARK-20704][SPARKR] change CRAN test to run sin...

2017-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17945


[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r116158389
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ReorderJoinPredicates.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * When the physical operators are created for JOIN, the ordering of join keys is based on order
+ * in which the join keys appear in the user query. That might not match with the output
+ * partitioning of the join node's children (thus leading to extra sort / shuffle being
+ * introduced). This rule will change the ordering of the join keys to match with the
+ * partitioning of the join nodes' children.
+ */
+class ReorderJoinPredicates extends Rule[SparkPlan] {
+  private def reorderJoinKeys(
+      leftKeys: Seq[Expression],
+      rightKeys: Seq[Expression],
+      leftPartitioning: Partitioning,
+      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+    def reorder(expectedOrderOfKeys: Seq[Expression],
+        currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
+      val leftKeysBuffer = ArrayBuffer[Expression]()
+      val rightKeysBuffer = ArrayBuffer[Expression]()
+
+      expectedOrderOfKeys.foreach(expression => {
+        val index = currentOrderOfKeys.indexWhere(e => e.semanticEquals(expression))
+        leftKeysBuffer.append(leftKeys(index))
+        rightKeysBuffer.append(rightKeys(index))
+      })
+      (leftKeysBuffer, rightKeysBuffer)
+    }
+
+    if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) {
+      leftPartitioning match {
+        case HashPartitioning(leftExpressions, _)
+          if leftExpressions.length == leftKeys.length &&

Why do we need the same length? Let's say the child partitioning is `a, b, c, d` 
and the join keys are `b, a`; we can reorder the join keys to avoid a shuffle, 
right?
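
The reordering itself is mechanical; a toy illustration with strings standing in 
for expressions (hypothetical, just to show the index mapping):

```scala
// Reorder join keys so they line up with the expected (partitioning) order.
def reorderKeys[A](
    expectedOrder: Seq[A],
    currentLeft: Seq[A],
    currentRight: Seq[A]): (Seq[A], Seq[A]) = {
  val pairs = expectedOrder.map { e =>
    val i = currentLeft.indexOf(e) // semanticEquals in the real rule
    (currentLeft(i), currentRight(i))
  }
  pairs.unzip
}

// Child partitioned by (a, b); join keys written as (b, a):
// reorderKeys(Seq("a", "b"), Seq("b", "a"), Seq("y", "x"))
// == (Seq("a", "b"), Seq("x", "y")), so no extra shuffle is needed.
```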


[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17958
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76844/
Test FAILed.


[GitHub] spark issue #17945: [SPARK-20704][SPARKR] change CRAN test to run single thr...

2017-05-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17945
  
merged to master/2.2


[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17958
  
Build finished. Test FAILed.


[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17711
  
I feel both are pretty complicated. Can we just do something similar to 
`CombineUnions`?

```
/**
 * Combines all adjacent [[Union]] operators into a single [[Union]].
 */
object CombineUnions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case u: Union => flattenUnion(u, false)
    case Distinct(u: Union) => Distinct(flattenUnion(u, true))
  }

  private def flattenUnion(union: Union, flattenDistinct: Boolean): Union = {
    val stack = mutable.Stack[LogicalPlan](union)
    val flattened = mutable.ArrayBuffer.empty[LogicalPlan]
    while (stack.nonEmpty) {
      stack.pop() match {
        case Distinct(Union(children)) if flattenDistinct =>
          stack.pushAll(children.reverse)
        case Union(children) =>
          stack.pushAll(children.reverse)
        case child =>
          flattened += child
      }
    }
    Union(flattened)
  }
}
```

It's going to be simpler because you don't need to handle distinct here.



[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17958
  
**[Test build #76844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76844/testReport)** for PR 17958 at commit [`e10101e`](https://github.com/apache/spark/commit/e10101eafe2329031d079977ab3f3e0aaee98908).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r116158180
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ReorderJoinPredicates.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * When the physical operators are created for JOIN, the ordering of join keys is based on order
+ * in which the join keys appear in the user query. That might not match with the output
+ * partitioning of the join node's children (thus leading to extra sort / shuffle being
+ * introduced). This rule will change the ordering of the join keys to match with the
+ * partitioning of the join nodes' children.
+ */
+class ReorderJoinPredicates extends Rule[SparkPlan] {
+  private def reorderJoinKeys(
+      leftKeys: Seq[Expression],
+      rightKeys: Seq[Expression],
+      leftPartitioning: Partitioning,
+      rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = {
+
+    def reorder(expectedOrderOfKeys: Seq[Expression],
--- End diff --

nit:
```
def reorder(
expectedOrderOfKeys: Seq[Expression],
currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression])
```


[GitHub] spark issue #17945: [SPARK-20704][SPARKR] change CRAN test to run single thr...

2017-05-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17945
  
AppVeyor was failing because of the `SparkListenerBus` issue.



[GitHub] spark pull request #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16985#discussion_r116158076
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -315,8 +317,14 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
 assert(
   joinOperator.right.find(_.isInstanceOf[SortExec]).isDefined == sortRight,
   s"expected sort in the right child to be $sortRight but found\n${joinOperator.right}")
+
+if (expectedResult.isDefined) {
+  checkAnswer(joined, expectedResult.get)
--- End diff --

In L296 we already have an `assert` to check the result; do we really need this?


[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-05-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17644#discussion_r116157920
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -871,6 +886,23 @@ private[hive] object HiveClientImpl {
   hiveTable.setViewOriginalText(t)
   hiveTable.setViewExpandedText(t)
 }
+
+table.bucketSpec match {
+  case Some(bucketSpec) =>
+hiveTable.setNumBuckets(bucketSpec.numBuckets)
+hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
--- End diff --

Got it. I'll add a check that the table is a Hive table and not a data source table.
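
A sketch of the guard, under the assumption that the surrounding `table` 
(`CatalogTable`) and `hiveTable` from the diff are in scope and that an empty or 
`"hive"` provider marks a Hive table:

```scala
// Only apply the bucket spec when converting a genuine Hive table;
// data source tables keep their bucketing in table properties instead.
// Assumes scala.collection.JavaConverters._ is imported, as in the diff.
val isHiveTable = table.provider.isEmpty || table.provider.contains("hive")
if (isHiveTable) {
  table.bucketSpec.foreach { bucketSpec =>
    hiveTable.setNumBuckets(bucketSpec.numBuckets)
    hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
  }
}
```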


[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76846/
Test PASSed.


[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)** for PR 17956 at commit [`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17958
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17958
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76845/
Test PASSed.


[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17958
  
**[Test build #76845 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)**
 for PR 17958 at commit 
[`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17953#discussion_r116157223
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case ("decimal", precision :: Nil) => 
DecimalType(precision.getText.toInt, 0)
   case ("decimal", precision :: scale :: Nil) =>
 DecimalType(precision.getText.toInt, scale.getText.toInt)
+  case ("void", Nil) => NullType
--- End diff --

Hive 2.x disables it. Could you add some test cases by reading and writing 
the tables? Thanks!





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17959
  
Merged to master/2.2. Please send a follow-up PR to address @gatorsmile's comments, thanks!





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116156736
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: 
https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
--- End diff --

ok!





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17711
  
@maropu The solution using `tailrec` looks more straightforward. Could you 
submit the PR based on that? Thanks!
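
For illustration, a minimal self-contained sketch of the `tailrec` approach (toy types, not the PR's actual Catalyst code): a worklist accumulator flattens the nested `Concat` tree that left-associative parsing of `a || b || c` produces, without growing the call stack.

```scala
import scala.annotation.tailrec

// Toy expression tree standing in for Catalyst expressions.
sealed trait Expr
case class Concat(left: Expr, right: Expr) extends Expr
case class Lit(s: String) extends Expr

// Worklist-based flattening: both recursive calls are in tail position,
// so deeply nested Concat chains cannot overflow the stack.
@tailrec
def flatten(pending: List[Expr], acc: List[Expr] = Nil): List[Expr] =
  pending match {
    case Nil                  => acc.reverse
    case Concat(l, r) :: rest => flatten(l :: r :: rest, acc)
    case e :: rest            => flatten(rest, e :: acc)
  }

// "a" || "b" || "c" parsed left-associatively:
val tree = Concat(Concat(Lit("a"), Lit("b")), Lit("c"))
assert(flatten(List(tree)) == List(Lit("a"), Lit("b"), Lit("c")))
```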





[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17711#discussion_r116156556
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -32,3 +32,11 @@ select 1 - 2;
 select 2 * 5;
 select 5 % 3;
 select pmod(-7, 3);
+
+-- check operator precedence (We follow Oracle operator precedence: 
https://docs.oracle.com/cd/A87860_01/doc/server.817/a85397/operator.htm#997691)
--- End diff --

The link could become dead in the future. Could you also copy the table contents here? Thanks!





[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17959





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17959
  
How about `HiveTableScanExec`?





[GitHub] spark pull request #17959: [SPARK-20718][SQL] FileSourceScanExec with differ...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17959#discussion_r116156087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -519,8 +519,18 @@ case class FileSourceScanExec(
   relation,
   output.map(QueryPlan.normalizeExprId(_, output)),
   requiredSchema,
-  partitionFilters.map(QueryPlan.normalizeExprId(_, output)),
-  dataFilters.map(QueryPlan.normalizeExprId(_, output)),
+  canonicalizeFilters(partitionFilters, output),
+  canonicalizeFilters(dataFilters, output),
   None)
   }
+
+  private def canonicalizeFilters(filters: Seq[Expression], output: 
Seq[Attribute])
--- End diff --

Add a function description?
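
For instance, something along these lines might work (a sketch of a possible description, not the author's wording; the exact semantics should come from the PR):

```scala
/**
 * Canonicalizes `filters` against `output`: expression ids are normalized,
 * and, per this PR's goal, the result should not depend on the original
 * filter order, so that semantically equal scans canonicalize to the same
 * plan and can be recognized as equal (e.g. for reuse).
 */
```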





[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116155996
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,32 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean = {
+  !child.output.exists(a => resolver(a.name, attrName))
+}
+
 override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperators {
   case agg @ Aggregate(groups, aggs, child)
   if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-// This is a strict check though, we put this to apply the rule 
only in alias expressions
-def notResolvableByChild(attrName: String): Boolean =
-  !child.output.exists(a => resolver(a.name, attrName))
-agg.copy(groupingExpressions = groups.map {
-  case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+groups.exists(!_.resolved) =>
+agg.copy(groupingExpressions = groups.map { _.transform {
+case u: UnresolvedAttribute if notResolvableByChild(u.name, 
child) =>
+  aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+  }
+})
+
+  case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+  if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
+(selectedGroups :+ 
groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
+def mayResolveAttrByAggregateExprs(exprs: Seq[Expression]): 
Seq[Expression] = exprs.map {
--- End diff --

I think we should do `exprs.map { _.transform { ...` like above.
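
To see why the `transform` shape matters here, a self-contained toy (not Catalyst types): CUBE/ROLLUP grouping expressions can wrap attributes inside other nodes, so a top-level `case u: UnresolvedAttribute` match misses nested occurrences, while `map { _.transform { ... } }` rewrites them at any depth.

```scala
// Toy expression tree with a Catalyst-like top-down transform.
sealed trait Expr {
  def transform(rule: PartialFunction[Expr, Expr]): Expr = {
    val applied = if (rule.isDefinedAt(this)) rule(this) else this
    applied match {
      case Wrap(e) => Wrap(e.transform(rule)) // recurse into children
      case leaf    => leaf
    }
  }
}
case class Unresolved(name: String) extends Expr
case class Resolved(name: String) extends Expr
case class Wrap(e: Expr) extends Expr // stands in for a grouping wrapper

val groups: Seq[Expr] = Seq(Wrap(Unresolved("k")), Unresolved("k"))
// A top-level match would only fix the second element; transform fixes both:
val fixed = groups.map(_.transform { case Unresolved(n) => Resolved(n) })
assert(fixed == Seq(Wrap(Resolved("k")), Resolved("k")))
```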





[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14582
  
We could ask on the mailing list if you feel strongly about this. For example, the `from_json` function was also raised on the mailing list before it got merged.

I think we should not add every variant just for consistency, which is why I asked about broader interest. There are many language-specific and application-specific variants, and I usually lean against adding one when there is an easy workaround and it looks like a niche variant.

I wouldn't object if there were more demand or interest in this.





[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116155780
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,32 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean = {
+  !child.output.exists(a => resolver(a.name, attrName))
+}
+
 override def apply(plan: LogicalPlan): LogicalPlan = 
plan.resolveOperators {
   case agg @ Aggregate(groups, aggs, child)
   if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedAttribute]) =>
-// This is a strict check though, we put this to apply the rule 
only in alias expressions
-def notResolvableByChild(attrName: String): Boolean =
-  !child.output.exists(a => resolver(a.name, attrName))
-agg.copy(groupingExpressions = groups.map {
-  case u: UnresolvedAttribute if notResolvableByChild(u.name) =>
+groups.exists(!_.resolved) =>
+agg.copy(groupingExpressions = groups.map { _.transform {
+case u: UnresolvedAttribute if notResolvableByChild(u.name, 
child) =>
+  aggs.find(ne => resolver(ne.name, u.name)).getOrElse(u)
+  }
+})
+
+  case gs @ GroupingSets(selectedGroups, groups, child, aggs)
+  if conf.groupByAliases && child.resolved && 
aggs.forall(_.resolved) &&
+(selectedGroups :+ 
groups).exists(_.exists(_.isInstanceOf[UnresolvedAttribute])) =>
--- End diff --

`groups` should cover `selectedGroups`. So we may not need to add 
`selectedGroups` here.





[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17948
  
LGTM too.  : )





[GitHub] spark pull request #17940: [SPARK-20687][MLLIB] mllib.Matrices.fromBreeze ma...

2017-05-11 Thread ghoto
Github user ghoto commented on a diff in the pull request:

https://github.com/apache/spark/pull/17940#discussion_r116155652
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
---
@@ -992,7 +992,20 @@ object Matrices {
 new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
   case sm: BSM[Double] =>
 // There is no isTranspose flag for sparse matrices in Breeze
-new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, 
sm.data)
+
+// Some Breeze CSCMatrices may have extra trailing zeros in
+// .rowIndices and .data, which are added after some matrix
+// operations for efficiency.
+//
+// Therefore the last element of sm.colPtrs would no longer be
+// coherent with the size of sm.rowIndices and sm.data
+// despite sm being a valid CSCMatrix.
+// We need to truncate both arrays (rowIndices, data)
+// to the real size of the vector sm.activeSize to allow valid 
conversion
+
+val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
+val truncData = sm.data.slice(0, sm.activeSize)
--- End diff --

I'm implementing both suggestions; however, wouldn't `sm.copy` be more expensive than just doing those two slices?
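
For concreteness, a minimal sketch of the truncation under discussion (assumes Breeze on the classpath; illustrative, not the PR's exact code):

```scala
import breeze.linalg.CSCMatrix

// After in-place updates, a Breeze CSCMatrix may keep spare trailing
// capacity in rowIndices/data; only the first `activeSize` entries are
// meaningful.
val sm = CSCMatrix.zeros[Double](3, 3)
sm(0, 0) = 1.0
sm(2, 1) = 2.0

val truncRowIndices = sm.rowIndices.slice(0, sm.activeSize)
val truncData = sm.data.slice(0, sm.activeSize)

// The truncated arrays agree with colPtrs again: the last column pointer
// equals the number of stored entries.
assert(sm.colPtrs.last == truncData.length)
```

On the copy-versus-slice question: if `sm.copy` duplicates the full backing buffers (including spare capacity), then the two slices, which copy only the active prefix of two arrays, would indeed be the cheaper option.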





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17959
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76843/
Test PASSed.





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17959
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17959
  
**[Test build #76843 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76843/testReport)**
 for PR 17959 at commit 
[`9ec86ec`](https://github.com/apache/spark/commit/9ec86ec1941bf0c329f4c6a1fb75271e91e51660).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait DataSourceScanExec extends LeafExecNode with CodegenSupport with 
PredicateHelper `





[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17957
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76842/
Test PASSed.





[GitHub] spark issue #17957: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17957
  
**[Test build #76842 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76842/testReport)**
 for PR 17957 at commit 
[`4032940`](https://github.com/apache/spark/commit/40329404299ece70aef7ef245704978fb9d1e6f9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-05-11 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16985
  
@cloud-fan: I have made the suggested change(s).





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17956
  
LGTM except for a minor comment.





[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

2017-05-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17956#discussion_r116154314
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   assert(errMsg.startsWith("The field for corrupt records must be 
string type and nullable"))
 }
   }
+
+  test("SPARK-18772: Parse special floats correctly") {
+// positive cases
+val jsons = Seq(
+  """{"a": "-INF"}""",
+  """{"a": "INF"}""",
+  """{"a": "-INF"}""",
+  """{"a": "NaN"}""",
+  """{"a": "Infinity"}""",
+  """{"a": "+Infinity"}""",
+  """{"a": "-Infinity"}""")
+
+val checks: Seq[Double => Boolean] = Seq(
+  _.isNegInfinity,
+  _.isPosInfinity,
+  _.isNegInfinity,
+  _.isNaN,
+  _.isPosInfinity,
+  _.isPosInfinity,
+  _.isNegInfinity)
+
+Seq(FloatType, DoubleType).foreach { dt =>
+  jsons.zip(checks).foreach { case (json, check) =>
+val ds = spark.read
+  .schema(StructType(Seq(StructField("a", dt
+  .json(Seq(json).toDS())
+  .select($"a".cast(DoubleType)).as[Double]
+assert(check(ds.first()))
+  }
+}
+
+// negative case
+Seq(FloatType, DoubleType).foreach { dt =>
+  val e = intercept[SparkException] {
+spark.read
+  .option("mode", "FAILFAST")
+  .schema(StructType(Seq(StructField("a", dt
+  .json(Seq( """{"a": "nan"}""").toDS())
--- End diff --

Shall we also test other negative cases?





[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17052
  
Yea, I just wanted to check if it is in progress in any way. Thanks for 
your input.





[GitHub] spark issue #14582: [SPARK-16997][SQL] Allow loading of JSON float values as...

2017-05-11 Thread lalinsky
Github user lalinsky commented on the issue:

https://github.com/apache/spark/pull/14582
  
You were asking for more interest in the feature; there was no way I could answer that. :)

Regarding the change itself, the system can already auto-cast an integer to a timestamp, but not a floating-point number. Floating-point timestamps are pretty common in the Python ecosystem, more so than integer ones. From my point of view, that is inconsistent and surprising behavior that I wanted to correct. I wouldn't have sent the patch if it didn't work for every number; doing it for just one number type seemed wrong to me.
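
For concreteness, a small sketch of the behavior being described (assumes a `SparkSession` named `spark`; the values are made up):

```scala
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}
import spark.implicits._

val schema = StructType(Seq(StructField("ts", TimestampType)))

// Integer epoch seconds already load as a timestamp today:
spark.read.schema(schema)
  .json(Seq("""{"ts": 1494979200}""").toDS())
  .show()

// Float epoch seconds, common when the JSON was produced from Python's
// time.time(), are what this change would additionally accept:
spark.read.schema(schema)
  .json(Seq("""{"ts": 1494979200.123}""").toDS())
  .show()
```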





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17711
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76840/
Test PASSed.





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17711
  
**[Test build #76840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76840/testReport)**
 for PR 17711 at commit 
[`089db30`](https://github.com/apache/spark/commit/089db30958d2d78b131ed10eea0b733a18056bf7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-05-11 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17052
  
@HyukjinKwon Sorry! I have been busy recently. Let me resolve the conflict.





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/
Test PASSed.





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17956
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17956
  
**[Test build #76841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)**
 for PR 17956 at commit 
[`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17935#discussion_r116153501
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
   Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil)
   }
 
+  test("SPARK-20690: Do not add missing attributes through subqueries") {
+withTempView("onerow") {
+  Seq(1).toDF("c1").createOrReplaceTempView("onerow")
+
+  val e = intercept[AnalysisException] {
+sql(
+  """
+| select 1
+| from   (select 1 from onerow t1 LIMIT 1)
--- End diff --

I'm surprised we support this syntax; I think the FROM clause must have an alias.

I checked with Postgres: it throws the exception `subquery in FROM must have an alias`. Can you check with other databases? Thanks!





[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17948
  
**[Test build #76849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76849/testReport)**
 for PR 17948 at commit 
[`a809274`](https://github.com/apache/spark/commit/a8092742b99c9d43b04b4a4941345f179996a50f).





[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17960#discussion_r116153263
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/limit.sql ---
@@ -1,23 +1,27 @@
 
 -- limit on various data types
-select * from testdata limit 2;
-select * from arraydata limit 2;
-select * from mapdata limit 2;
+SELECT * FROM testdata LIMIT 2;
--- End diff --

Just out of curiosity, I wonder why these should be upper-cased. Is this style preferred?





[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116153034
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,31 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean =
+  !child.output.exists(a => resolver(a.name, attrName))
--- End diff --

Thanks! Fixed.





[GitHub] spark pull request #17644: [SPARK-17729] [SQL] Enable creating hive bucketed...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17644#discussion_r116152814
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -871,6 +886,23 @@ private[hive] object HiveClientImpl {
   hiveTable.setViewOriginalText(t)
   hiveTable.setViewExpandedText(t)
 }
+
+table.bucketSpec match {
+  case Some(bucketSpec) =>
+hiveTable.setNumBuckets(bucketSpec.numBuckets)
+hiveTable.setBucketCols(bucketSpec.bucketColumnNames.toList.asJava)
--- End diff --

For a data source table, which can be created by `CREATE TABLE src(...) USING parquet ...`, the bucketing information is stored in table properties, and Hive will always read such a table as a non-bucketed table.

After your PR, for bucketed data source tables written by Spark, Hive will read them as bucketed tables and run into problems, because the bucket hashing function is different.
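
For context, a sketch of the kind of table in question (hypothetical table name; assumes a `SparkSession` named `spark`):

```scala
// A bucketed data source table: Spark stores the bucketing spec in table
// properties today, so Hive treats it as a plain, non-bucketed table.
spark.sql("""
  CREATE TABLE src(id INT, name STRING)
  USING parquet
  CLUSTERED BY (id) INTO 8 BUCKETS
""")
// If the Hive-side numBuckets/bucketCols were also populated, Hive would
// see a bucketed table, but Spark's bucket hashing differs from Hive's,
// so Hive could read or prune buckets incorrectly.
```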





[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116152643
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,31 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
if the expression is not
+// resolvable by child.
+private def notResolvableByChild(attrName: String, child: 
LogicalPlan): Boolean =
+  !child.output.exists(a => resolver(a.name, attrName))
--- End diff --

Nit: style
```Scala
private def notResolvableByChild(attrName: String, child: LogicalPlan): 
Boolean = {
  !child.output.exists(a => resolver(a.name, attrName))
}
```





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17959
  
LGTM





[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17948
  
**[Test build #76848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76848/testReport)**
 for PR 17948 at commit 
[`0163656`](https://github.com/apache/spark/commit/0163656b8e5325cda7b80e0c0268c24608e9b871).





[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17960
  
**[Test build #76847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76847/testReport)**
 for PR 17960 at commit 
[`b4a4b0a`](https://github.com/apache/spark/commit/b4a4b0aee836cd6f8944716bc84323487527fc19).





[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-11 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17960

[SPARK-20719] [SQL] Support LIMIT ALL

### What changes were proposed in this pull request?
`LIMIT ALL` is the same as omitting the `LIMIT` clause. It is supported by both PostgreSQL and Presto. This PR supports it by adding it to the parser.
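
To illustrate the equivalence (assumes a `SparkSession` named `spark` and a registered table `testdata`; not part of the patch itself):

```scala
// After this change, both statements parse and should yield the same rows:
val withLimitAll = spark.sql("SELECT * FROM testdata LIMIT ALL")
val withoutLimit = spark.sql("SELECT * FROM testdata")
assert(withLimitAll.collect().sameElements(withoutLimit.collect()))
```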

### How was this patch tested?
Added a test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark LimitAll

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17960


commit b4a4b0aee836cd6f8944716bc84323487527fc19
Author: Xiao Li 
Date:   2017-05-12T04:34:37Z

fix.







[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116151791
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,30 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
in alias expressions
--- End diff --

ok





[GitHub] spark issue #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP/GROUPI...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17948
  
LGTM





[GitHub] spark pull request #17948: [SPARK-20710][SQL] Support aliases in CUBE/ROLLUP...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17948#discussion_r116150714
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1003,18 +1003,30 @@ class Analyzer(
*/
   object ResolveAggAliasInGroupBy extends Rule[LogicalPlan] {
 
+// This is a strict check though, we put this to apply the rule only 
in alias expressions
--- End diff --

`... only if the expression is not resolvable by child`





[GitHub] spark issue #17936: [SPARK-20638][Core][WIP]Optimize the CartesianRDD to red...

2017-05-11 Thread ConeyLiu
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/17936
  
The cluster test results: `RDD.cartesian` is used in the Spark MLlib ALS algorithm, and the patched branch is compared against the latest Spark master branch.

Environment: Spark on YARN with 9 executors (10 cores & 30 GB memory each) on three nodes.
Test data: 480,000 users and 17,000 items.

Test Case:
```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object TestNetflixlib {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Test Netflix mlib")
    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs://10.1.2.173:9000/nf_training_set.txt")

    val ratings = data.map(_.split("::") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })

    val rank = 0
    val numIterations = 10
    val train_start = System.nanoTime()
    val model = ALS.train(ratings, rank, numIterations, 0.01)
    val training_time = (System.nanoTime() - train_start) / 1e9
    println(s"Training time(s): $training_time")

    val rec_start = System.nanoTime()
    val userRec = model.recommendProductsForUsers(20)
    println(userRec.count())
    val rec_time = (System.nanoTime() - rec_start) / 1e9
    println(s"Recommend time(s): $rec_time")
  }
}
```

Test Results:
| Master Branch | Improved Branch | Difference |
| -- | -- | -- |
| 139.934s | 162.597s | 16 % |
| 148.138s | 157.597s | 6% |
| 157.899s | 189.580s | 20% |
| 135.520s | 152.486s | 13% |
| 166.101s | 184.485s | 11 % |





[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-05-11 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16697
  
Yes. If it is inside, you are right: only the first will be logged!





[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...

2017-05-11 Thread CodingCat
Github user CodingCat commented on the issue:

https://github.com/apache/spark/pull/16697
  
you mean outside of 
https://github.com/apache/spark/pull/16697/files#diff-ca0fe05a42fd5edcab8a1bdaa8e58db9R210?





[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116148995
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -115,26 +115,33 @@ private[spark] abstract class Task[T](
   case t: Throwable =>
 e.addSuppressed(t)
 }
+context.markTaskCompleted(Some(e))
 throw e
 } finally {
-  // Call the task completion callbacks.
-  context.markTaskCompleted()
   try {
-Utils.tryLogNonFatalError {
-  // Release memory used by this thread for unrolling blocks
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
-  
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.OFF_HEAP)
-  // Notify any tasks waiting for execution memory to be freed to 
wake up and try to
-  // acquire memory again. This makes impossible the scenario 
where a task sleeps forever
-  // because there are no other tasks left to notify it. Since 
this is safe to do but may
-  // not be strictly necessary, we should revisit whether we can 
remove this in the future.
-  val memoryManager = SparkEnv.get.memoryManager
-  memoryManager.synchronized { memoryManager.notifyAll() }
-}
+// Call the task completion callbacks. If "markTaskCompleted" is 
called twice, the second
+// one is no-op.
--- End diff --

Missed this comment.
LGTM. Thanks for clarifying @zsxwing 
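
As a self-contained illustration of the idempotent-completion pattern discussed here (a toy class, not Spark's actual `TaskContextImpl`):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Toy model: the second call to markTaskCompleted is a no-op, so it is
// safe to invoke it both on the error path and again in `finally`.
class ToyTaskContext {
  private val completed = new AtomicBoolean(false)
  private var listeners = List.empty[Option[Throwable] => Unit]

  def addListener(f: Option[Throwable] => Unit): Unit = listeners ::= f

  def markTaskCompleted(error: Option[Throwable]): Unit =
    if (completed.compareAndSet(false, true)) listeners.foreach(_(error))
}

val ctx = new ToyTaskContext
ctx.addListener(e => println(s"completed with $e"))
ctx.markTaskCompleted(Some(new RuntimeException("boom"))) // runs listeners once
ctx.markTaskCompleted(None)                               // no-op
```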





[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17942#discussion_r116148841
  
--- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
@@ -55,14 +55,16 @@ class TaskCompletionListenerException(
   extends RuntimeException {
 
   override def getMessage: String = {
-if (errorMessages.size == 1) {
--- End diff --

Thx for clarifying !





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17956
  
LGTM





[GitHub] spark issue #17906: [SPARK-20665][SQL]"Bround" and "Round" function return N...

2017-05-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17906
  
thanks, merging to master/2.2/2.1/2.0!





[GitHub] spark pull request #17906: [SPARK-20665][SQL]"Bround" and "Round" function r...

2017-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17906





[GitHub] spark issue #17959: [SPARK-20718][SQL] FileSourceScanExec with different fil...

2017-05-11 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17959
  
also cc @gatorsmile 





[GitHub] spark issue #17958: [SPARK-20716][SS] StateStore.abort() should not throw ex...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17958
  
**[Test build #76845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76845/testReport)**
 for PR 17958 at commit 
[`1e36134`](https://github.com/apache/spark/commit/1e361344101ccaf3d8a9ddebe6767b610f0916ed).




