[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
The current grammar is unable to support multiple hints. For example, 
```SQL
SELECT /*+ LEADING(e2 e1) USE_NL(e1) INDEX(e1 emp_emp_id_pk) 
   USE_MERGE(j) FULL(j) */
e1.first_name, e1.last_name, j.job_id, sum(e2.salary) total_sal
  FROM employees e1, employees e2, job_history j
  WHERE e1.employee_id = e2.manager_id
AND e1.employee_id = j.employee_id
AND e1.hire_date = j.start_date
  GROUP BY e1.first_name, e1.last_name, j.job_id
  ORDER BY total_sal;
```


[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14215
  
I see; yes, I will think of a better way to fix the message. Yes, it is still 
happening across other data sources, and this implementation is very specific to 
Parquet.

I just wonder whether we can implement them step by step. Actually, I put the 
things that could potentially be generalized in `ParquetSchemaCompatibilit`. For example, ORC does 
something very similar to Parquet: 
[HiveInspectors.scala#L630-L649](https://github.com/apache/spark/blob/4f869f88ee96fa57be79f972f218111b6feac67f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala#L630-L649).

I just want to do this bit by bit rather than changing a bunch of code.

BTW, does that approach look okay anyway (I mean converting the value before 
setting it on the row)?


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
It sounds like we want to follow Oracle's syntax style, although I am not 
sure if it is OK. Below is the link to the documentation: 
https://docs.oracle.com/cd/B12037_01/server.101/b10752/hintsref.htm


[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62370/consoleFull)** for PR 14045 at commit [`3b8c3ce`](https://github.com/apache/spark/commit/3b8c3ce36c1fcb2f201fe3588e2cbd3a4b32e7b4).


[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...

2016-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14215
  
For handling messages, I will open a separate PR soon!


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14132
  
It's OK to not support multiple hints for now since we have really only one 
hint.



[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70930600
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
--- End diff --

can you add a test suite for this?



[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14132
  
Would be great to add a unit test suite for the analyzer rule. Other than 
that this looks good to me.
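
For concreteness, here is a minimal sketch of the kind of analyzer-rule test being 
requested. `Hint`, `BroadcastHint`, and `SubstituteHint` come from this PR's diff; the 
suite wiring (reaching the nested rule through `SimpleAnalyzer`) and the table name are 
illustrative assumptions, not the PR's actual test.

```scala
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.{SimpleAnalyzer, UnresolvedRelation}
import org.apache.spark.sql.catalyst.plans.logical.{BroadcastHint, Hint}

class SubstituteHintSuite extends SparkFunSuite {
  test("BROADCAST hint wraps the named table in BroadcastHint") {
    val relation = UnresolvedRelation(TableIdentifier("emp"), None)
    val plan = Hint("BROADCAST", Seq("emp"), relation)
    // SubstituteHint is nested inside Analyzer, so it is invoked through an Analyzer instance.
    val substituted = SimpleAnalyzer.SubstituteHint(plan)
    assert(substituted === BroadcastHint(relation))
  }
}
```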



[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70931050
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
--- End diff --

Oh, I missed that. Sure.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70930995
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
+          }
+          resolvedChild
+
+        // Remove unrecognized hint
+        case Hint(name, _, child) =>
+          logDebug(s"Ignore Unknown Hint: $name")
--- End diff --

Sure. No problem. It was just added by @hvanhovell 's request.


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Thank you, @gatorsmile and @rxin.
I'll remove `logDebug` and add a unit test for the Analyzer soon.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70931220
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -354,6 +354,14 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
   override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true)
 }
 
+/**
+ * A general hint for the child.
+ * a pair of (name, parameters).
+ */
+case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode {
+  override def output: Seq[Attribute] = child.output
--- End diff --

Do we need to override `resolved`? Set it to `false`?
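
For reference, a minimal sketch of what the suggested override could look like on the node 
from this diff; whether it is actually needed is exactly the open question here.

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

// Hypothetical variant of this PR's Hint node with the suggested override.
case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode {
  override def output: Seq[Attribute] = child.output

  // Keeping the node marked as unresolved would guarantee that analysis cannot be
  // considered complete while a Hint still remains in the plan.
  override lazy val resolved: Boolean = false
}
```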


[GitHub] spark pull request #14210: [SPARK-16556] [SPARK-16559] [SQL] Fix Two Bugs in...

2016-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14210#discussion_r70931496
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -75,7 +75,14 @@ private[sql] case class InsertIntoHadoopFsRelationCommand(
        case (x, ys) if ys.length > 1 => "\"" + x + "\""
      }.mkString(", ")
      throw new AnalysisException(s"Duplicate column(s) : $duplicateColumns found, " +
-       s"cannot save to file.")
+       "cannot save to file.")
+    }
+
+    bucketSpec.foreach { spec =>
--- End diff --

```
if (bucketSpec.exists(_.numBuckets <= 0)) {
  ...
}
```


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
All the test cases cover positive scenarios; we should also add more negative 
cases.


[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14214
  
**[Test build #62368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62368/consoleFull)** for PR 14214 at commit [`9334105`](https://github.com/apache/spark/commit/933410558429ea82f063f21236b8c5c645650a78).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62368/
Test PASSed.


[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14214
  
Merged build finished. Test PASSed.


[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate unnecessary rounds of physi...

2016-07-15 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14214
  
@marmbrus @zsxwing could you take a look and share some ideas? Thanks!


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Sure, @gatorsmile !


[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Sure, @gatorsmile. But could you give more specific examples of what you mean? Currently,
- PlanParserSuite has an exception test case.
- BroadcastJoinSuite matches all nodes of the plan, which means it also tests the unmatched table.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70934669
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
--- End diff --

In Hive/Oracle, what happens if we are unable to find any 
`UnresolvedRelation` with the same name?


[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

2016-07-15 Thread wuxianxingkong
Github user wuxianxingkong commented on a diff in the pull request:

https://github.com/apache/spark/pull/14191#discussion_r70935031
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
        (RECORDREADER recordReader=STRING)?
        fromClause?
        (WHERE where=booleanExpression)?)
-    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
--- End diff --

```sql
SELECT 1 
INTO newtable
```
This won't work because we need oldtable's info to create newtable, so the SQL should be
```sql
SELECT 1
INTO newtable 
FROM oldtable
```
In my test, a new table called newtable was created with one column called 1; it has 
oldtable.rows.length rows and all elements are 1.
Did you mean the case where there is no _FROM_?


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935210
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
--- End diff --

Always case sensitive? 


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935292
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
--- End diff --

Hint is ignored.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935434
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
--- End diff --

Sure, that sounds better. Currently we have turned off case sensitivity, but that 
is not a reason not to do it.
Thank you!


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935634
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
--- End diff --

I think we should add a test case for it.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935836
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
--- End diff --

Hints are a kind of comment. All the exceptional cases are ignored silently; that 
is the basic policy.


[GitHub] spark pull request #14191: [SPARK-16217][SQL] Support SELECT INTO statement

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14191#discussion_r70935956
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
        (RECORDREADER recordReader=STRING)?
        fromClause?
        (WHERE where=booleanExpression)?)
-    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+    | ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
--- End diff --

In the Spark Shell, please run the followings.
```
sql("select 1")
```


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935879
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
--- End diff --

Yeah, `Analyzer` always consider case sensitivity issues. Please check how 
to use `resolver`: 
https://github.com/apache/spark/blob/8b5a8b25b9d29b7d0949d5663c7394b26154a836/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L68-L74
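
For illustration, a short Spark-shell sketch of the point: the Analyzer's `resolver` is a 
`(String, String) => Boolean` chosen from the two functions below based on 
`spark.sql.caseSensitive`, so the hint rule could compare table names with 
`resolver(t.table, table)` instead of `==`. The table names here are illustrative.

```scala
// The two resolution functions the Analyzer chooses between (runnable in the Spark shell).
import org.apache.spark.sql.catalyst.analysis.{caseInsensitiveResolution, caseSensitiveResolution}

val hintParameter = "Employees"
assert(caseInsensitiveResolution("employees", hintParameter))  // matches regardless of case
assert(!caseSensitiveResolution("employees", hintParameter))   // requires an exact match
```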


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70936038
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
--- End diff --

Yeah, but we can add a test case to verify that future changes will not break 
this policy.
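
A sketch of the regression test being suggested, reusing the illustrative names and suite 
wiring assumed in the earlier test sketch (not the PR's actual code):

```scala
test("BROADCAST hint naming an unknown table is silently dropped") {
  val relation = UnresolvedRelation(TableIdentifier("emp"), None)
  val unmatched = Hint("BROADCAST", Seq("no_such_table"), relation)
  // Under this rule, an unmatched hint leaves the child plan untouched.
  assert(SimpleAnalyzer.SubstituteHint(unmatched) === relation)
}
```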


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70935529
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
+          }
+          resolvedChild
+
+        // Remove unrecognized hint
+        case Hint(name, _, child) =>
+          logDebug(s"Ignore Unknown Hint: $name")
--- End diff --

How about adding one more case in `BasicOperators` strategy?
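
For context, a hedged sketch of what that suggestion could amount to; the real change would 
be one more case inside `BasicOperators.apply`, and the standalone strategy below exists only 
to make the fragment compile on its own.

```scala
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical: any Hint that survives analysis is simply planned as its child,
// instead of falling through as an unknown logical node.
object PlanLeftoverHints extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case logical.Hint(_, _, child) => planLater(child) :: Nil
    case _ => Nil
  }
}
```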


[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14150#discussion_r70936983
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/feature/JavaPCASuite.java ---
@@ -107,7 +107,11 @@ public VectorPair call(Tuple2 pair) {
       .fit(df);
     List result = pca.transform(df).select("pca_features", "expected").toJavaRDD().collect();
     for (Row r : result) {
-      Assert.assertEquals(r.get(1), r.get(0));
+      Vector calculatedVector = (Vector)r.get(0);
--- End diff --

Not sure why these don't fail the style checker, but: space after casts, i++ 
instead of i ++, and 1.0e-8 for clarity.


[GitHub] spark pull request #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14150#discussion_r70937028
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -42,7 +43,9 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
     val pca_transform = pca.transform(dataRDD).collect()
     val mat_multiply = mat.multiply(pc).rows.collect()
 
-    assert(pca_transform.toSet === mat_multiply.toSet)
-    assert(pca.explainedVariance === explainedVariance)
+    pca_transform.zip(mat_multiply).foreach { case (calculated: Vector, expected: Vector) =>
--- End diff --

Not a big deal but do you really need the explicit types?
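
For reference, a sketch of the same loop with the pattern types left to inference; the 
element-wise comparison and tolerance are illustrative rather than the PR's exact code.

```scala
// Same zip/foreach without the explicit (calculated: Vector, expected: Vector) annotations;
// the element types are inferred from pca_transform and mat_multiply.
pca_transform.zip(mat_multiply).foreach { case (calculated, expected) =>
  calculated.toArray.zip(expected.toArray).foreach { case (a, b) =>
    assert(math.abs(a - b) < 1e-8)
  }
}
```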


[GitHub] spark issue #14150: [SPARK-16494] [ML] Upgrade breeze version to 0.12

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14150
  
Any other changes that might cause compatibility problems? I doubt it, but it's 
worth skimming the release notes. This seems OK for 2.1.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70937097
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
        (RECORDREADER recordReader=STRING)?
        fromClause?
        (WHERE where=booleanExpression)?)
-    | ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+    | ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
     | fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
--- End diff --

How about this `SELECT`? Should we also allow users use `Hint` here?


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70937541
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ---
@@ -354,6 +354,14 @@ case class BroadcastHint(child: LogicalPlan) extends UnaryNode {
   override lazy val statistics: Statistics = super.statistics.copy(isBroadcastable = true)
 }
 
+/**
+ * A general hint for the child.
+ * a pair of (name, parameters).
--- End diff --

`a` -> `A`. I think we can still disclose more info about this logical node.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70937920
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

Based on the implementation, we also need to emphasize that this rule has to be 
executed before the rule `ResolveRelations`.


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70938003
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
+          }
+          resolvedChild
+
+        // Remove unrecognized hint
+        case Hint(name, _, child) =>
+          logDebug(s"Ignore Unknown Hint: $name")
--- End diff --

Sounds good!


[GitHub] spark pull request #14218: [SPARK-16563][SQL] fix spark sql thrift server Fe...

2016-07-15 Thread alicegugu
GitHub user alicegugu opened a pull request:

https://github.com/apache/spark/pull/14218

[SPARK-16563][SQL] fix spark sql thrift server FetchResults bug

## What changes were proposed in this pull request?

Add a constant iterator that points to the head of the result. It will be used to 
reset the iterator when results are fetched from the first row repeatedly.

## How was this patch tested?

This bug was found when using Cloudera HUE to connect to the Spark SQL Thrift 
Server: currently, a SQL statement's result can only be fetched once. The fix was 
tested manually with Cloudera HUE; with this fix, HUE can fetch Spark SQL results 
repeatedly through the Thrift Server.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alicegugu/spark SparkSQLFetchResultsBug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14218


commit a93954179a49fd5c31600062e03f71831466c4ff
Author: Alice 
Date:   2016-07-15T06:13:52Z

[SPARK-16563][SQL] fix spark sql thrift server ExecuteStatementOperation 
FetchResults bug




[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70938037
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -84,7 +84,8 @@ class Analyzer(
     Batch("Substitution", fixedPoint,
       CTESubstitution,
       WindowsSubstitution,
-      EliminateUnions),
+      EliminateUnions,
+      SubstituteHint),
--- End diff --

Like `EliminateUnions`, please rename it to `SubstituteHints`


[GitHub] spark issue #12951: [SPARK-15176][Core] Add maxShares setting to Pools

2016-07-15 Thread njwhite
Github user njwhite commented on the issue:

https://github.com/apache/spark/pull/12951
  
ping?


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70938251
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
+          }
+          resolvedChild
+
+        // Remove unrecognized hint
--- End diff --

`hint` -> `hints`


[GitHub] spark issue #14218: [SPARK-16563][SQL] fix spark sql thrift server FetchResu...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14218
  
Can one of the admins verify this patch?


[GitHub] spark issue #14218: [SPARK-16563][SQL] fix spark sql thrift server FetchResu...

2016-07-15 Thread alicegugu
Github user alicegugu commented on the issue:

https://github.com/apache/spark/pull/14218
  
@liancheng hey, please help review (BTW are you the lian cheng from MSTC 
zju?)


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70938777
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -84,7 +84,8 @@ class Analyzer(
     Batch("Substitution", fixedPoint,
      CTESubstitution,
      WindowsSubstitution,
-      EliminateUnions),
+      EliminateUnions,
+      SubstituteHint),
--- End diff --

Great!


[GitHub] spark pull request #14219: sort sparseVector's indices before doing multipli...

2016-07-15 Thread wilson-lauw
GitHub user wilson-lauw opened a pull request:

https://github.com/apache/spark/pull/14219

sort sparseVector's indices before doing multiplication

## What changes were proposed in this pull request?

Sort the SparseVector's indices before doing multiplication to make sure the 
result is returned correctly.


## How was this patch tested?

manual and existing tests





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wilson-lauw/spark SPARK-16566

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14219


commit 2aaf742dac7e8d01387d0f615e3b951a2bf5b1d4
Author: Wilson Lauw 
Date:   2016-07-15T08:34:31Z

sort sparseVector's indices before doing multiplication




[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70938951
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
  /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

For this one, we have this rule in `Batch("Substitution")`. I think that's enough 
information.


[GitHub] spark issue #14219: [SPARK-16566][MLLib] sort sparseVector's indices before ...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14219
  
Can one of the admins verify this patch?


[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70939090
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   */
+  object SubstituteHint extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = child.transformDown {
+              case r @ BroadcastHint(UnresolvedRelation(_, _)) => r
+              case r @ UnresolvedRelation(t, _) if !stop && t.table == table =>
+                stop = true
+                BroadcastHint(r)
+            }
+          }
+          resolvedChild
+
+        // Remove unrecognized hint
--- End diff --

Yep.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
Finished the first pass. : ) Great job!

Based on the PR description, I am not very sure about the scope, especially 
regarding the nested cases. What about tables defined in a temp view? Have those 
tables already been resolved?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
It is very late. : ) Will review it again after your code changes. Thanks!





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14216
  
Yeah, I think that's what has to happen to keep this efficient. There's 
another problem, in that `numNonzeros` returns the wrong values now. It doesn't 
return counts. `nnz` is really like a "weightSum" and `weightSum` is really 
like a "totalWeightSum". Your `cnnz` is really "nnz". It may be worth fixing 
this while we're here.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Thank you so much for the in-depth interview, @gatorsmile.
I've learned a lot. Yes. It is very late. I took a nap during the evening. 
:)
See you tomorrow on GitHub.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62370/consoleFull)**
 for PR 14045 at commit 
[`3b8c3ce`](https://github.com/apache/spark/commit/3b8c3ce36c1fcb2f201fe3588e2cbd3a4b32e7b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62370/
Test PASSed.





[GitHub] spark issue #14137: SPARK-16478 graphX (added graph caching in strongly conn...

2016-07-15 Thread wesolowskim
Github user wesolowskim commented on the issue:

https://github.com/apache/spark/pull/14137
  
I removed the counts at the end of the outer loop. I had added them before 
because without them I still encountered problems, but I guess something else 
must have been wrong. My reasoning was that even though sccGraph is persisted, 
it can be evicted before it is used again. It was quite strange, because not 
much happens between the sccGraph reuse and those additional counts. I tested it 
once again and it works with the current version.
I also added a change that saves some computing time in some cases: if the while 
loop exits because iter > maxIter, the final Pregel runs are useless.






[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14216
  
@srowen OK. I'll fix the var names first:
nnz => weightSum
weightSum => totalWeightSum
cnnz => nnz

Is that right?
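
To make the proposed renaming concrete, here is a self-contained toy sketch (not the actual `MultivariateOnlineSummarizer` code) computing the three quantities under the new names for a small weighted data set:

```scala
// Toy data: (weight, feature vector) pairs; names follow the proposed renaming.
val data = Seq(
  (1.0, Array(0.0, 2.0)),
  (2.0, Array(3.0, 0.0))
)
val numFeatures = 2

val totalWeightSum = data.map(_._1).sum                       // was `weightSum`
val weightSum = Array.tabulate(numFeatures) { i =>            // was `nnz`
  data.collect { case (w, v) if v(i) != 0.0 => w }.sum
}
val nnz = Array.tabulate(numFeatures) { i =>                  // was `cnnz`
  data.count { case (_, v) => v(i) != 0.0 }.toLong
}

println(s"totalWeightSum=$totalWeightSum, " +
  s"weightSum=[${weightSum.mkString(",")}], nnz=[${nnz.mkString(",")}]")
```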





[GitHub] spark pull request #14220: [SPARK-16568][SQL][Documentation] update sql prog...

2016-07-15 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/14220

[SPARK-16568][SQL][Documentation] update sql programming guide refreshTable 
API in python code

## What changes were proposed in this pull request?

Update the `refreshTable` API usage in the guide's Python example.
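
For context, a hedged guess at the kind of call involved (the actual change is in the guide's Python example; shown here in Scala to keep one language across the snippets in this digest, and `"my_table"` is a placeholder name):

```scala
import org.apache.spark.sql.SparkSession

// Refresh cached metadata for a table after its underlying files changed.
val spark = SparkSession.builder().appName("refresh-example").getOrCreate()
spark.catalog.refreshTable("my_table")
```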

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark update_sql_doc_catalog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14220


commit e77203460dd274f1c1c0b6a3292511282cf64d3d
Author: WeichenXu 
Date:   2016-07-12T12:21:38Z

update_sql_doc_catalog







[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14220
  
**[Test build #62371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62371/consoleFull)**
 for PR 14220 at commit 
[`e772034`](https://github.com/apache/spark/commit/e77203460dd274f1c1c0b6a3292511282cf64d3d).





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)**
 for PR 14132 at commit 
[`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).





[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14220
  
**[Test build #62371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62371/consoleFull)**
 for PR 14220 at commit 
[`e772034`](https://github.com/apache/spark/commit/e77203460dd274f1c1c0b6a3292511282cf64d3d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14220
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14216
  
I think so. I think we will have to double-check that 'fixing' numNonzeros is 
OK too, i.e. that other methods aren't actually expecting the weights. @dbtsai, 
what do you think?





[GitHub] spark issue #14220: [SPARK-16568][SQL][Documentation] update sql programming...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62371/
Test PASSed.





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r70948608
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -338,7 +338,7 @@ querySpecification
(RECORDREADER recordReader=STRING)?
fromClause?
(WHERE where=booleanExpression)?)
-| ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
+| ((kind=SELECT hint? setQuantifier? namedExpressionSeq fromClause?
| fromClause (kind=SELECT setQuantifier? namedExpressionSeq)?)
--- End diff --

Do you want the following? It's already handled.
```scala
scala> sql("FROM t JOIN u ON t.id = u.id SELECT /*+ MAPJOIN(u) */ 
*").explain
== Physical Plan ==
*BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
:- *Range (0, 10, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 10, splits=8)
```





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62373/consoleFull)**
 for PR 14216 at commit 
[`fe8ff62`](https://github.com/apache/spark/commit/fe8ff624a56447adbb417466b46cc69933a5b1a6).





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14216
  
@srowen OK, the var names are updated.
And does the 'fixing' of numNonzeros you mentioned mean the number of input 
vectors whose weight > 0?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
In databases, the purpose of a view is **hiding the underlying tables**. We 
should not support specifying a table that lives inside the view. However, you 
can specify the hint on the view itself. Let me give you an example. I assume 
you asked about the following scenario.

```scala
scala> sql("create temporary view view_u as select * from u")
```

We know that `u` is inside the view, but any table name inside the view should 
be ignored because it is an unrecognized table name in this context.
```scala
scala> sql("SELECT /*+ MAPJOIN(u) */ * FROM t JOIN view_u ON t.id = 
view_u.id").explain(true)
== Physical Plan ==
*SortMergeJoin [id#0L], [id#4L], Inner
:- *Sort [id#0L ASC], false, 0
:  +- Exchange hashpartitioning(id#0L, 200)
: +- *Range (0, 10, splits=8)
+- *Sort [id#4L ASC], false, 0
   +- ReusedExchange [id#4L], Exchange hashpartitioning(id#0L, 200)
```

However, if you give the hint on the view **view_u** itself, it takes effect.
```scala
scala> sql("SELECT /*+ MAPJOIN(view_u) */ * FROM t JOIN view_u ON t.id = 
view_u.id").explain
== Physical Plan ==
*BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
:- *Range (0, 10, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *Range (0, 10, splits=8)
```
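
For readers reproducing the plans above, a minimal setup matching the `Range (0, 10, splits=8)` operators in the output (the table definitions are an assumption; adjust as needed):

```scala
// Assumes an active SparkSession named `spark` (as in spark-shell).
spark.range(10).createOrReplaceTempView("t")
spark.range(10).createOrReplaceTempView("u")
spark.sql("CREATE TEMPORARY VIEW view_u AS SELECT * FROM u")
```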





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14132
  
Now the PR is updated and more robust. Thank you, all.
- Remove `logDebug`. (@rxin)
- Add a unit test in `AnalysisTest`. (@rxin)
  - To do this, I extended the test harness `AnalysisTest.makeAnalyzer` by 
adding one more test temp table.
  ```scala
 catalog.createTempView("TaBlE", TestRelations.testRelation, 
overrideIfExists = true)
+  catalog.createTempView("TaBlE2", TestRelations.testRelation2, 
overrideIfExists = true)
```
- Override `resolved` explicitly. (@gatorsmile)
- Add a rule in `SparkStrategies`. (@gatorsmile)
- Rename `SubstituteHint` with `SubstituteHints`. (@gatorsmile)
- Use `resolver` to support case sensitive/insensitive rules. (@gatorsmile)
- Fix typos. (@gatorsmile)

For the other comments, I left my thoughts.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62373/consoleFull)**
 for PR 14216 at commit 
[`fe8ff62`](https://github.com/apache/spark/commit/fe8ff624a56447adbb417466b46cc69933a5b1a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62373/
Test PASSed.





[GitHub] spark pull request #14167: [SPARK-16194] Mesos Driver env vars

2016-07-15 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/14167#discussion_r70954615
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -353,38 +353,60 @@ private[spark] class MesosClusterScheduler(
 }
   }
 
-  private def buildDriverCommand(desc: MesosDriverDescription): 
CommandInfo = {
-val appJar = CommandInfo.URI.newBuilder()
-  
.setValue(desc.jarUrl.stripPrefix("file:").stripPrefix("local:")).build()
-val builder = CommandInfo.newBuilder().addUris(appJar)
-val entries = conf.getOption("spark.executor.extraLibraryPath")
-  .map(path => Seq(path) ++ desc.command.libraryPathEntries)
-  .getOrElse(desc.command.libraryPathEntries)
-
-val prefixEnv = if (!entries.isEmpty) {
-  Utils.libraryPathEnvPrefix(entries)
-} else {
-  ""
+  private def getDriverExecutorURI(desc: MesosDriverDescription) = {
+desc.schedulerProperties.get("spark.executor.uri")
+  .orElse(desc.command.environment.get("SPARK_EXECUTOR_URI"))
+  }
+
+  private def getDriverEnvironment(desc: MesosDriverDescription): 
Environment = {
+val env = {
+  val executorOpts = desc.schedulerProperties.map { case (k, v) => 
s"-D$k=$v" }.mkString(" ")
+  val executorEnv = Map("SPARK_EXECUTOR_OPTS" -> executorOpts)
+
+  val prefix = "spark.mesos.driverEnv."
+  val driverEnv = 
desc.schedulerProperties.filterKeys(_.startsWith(prefix))
+.map { case (k, v) => (k.substring(prefix.length), v) }
+
+  driverEnv ++ executorEnv ++ desc.command.environment
 }
+
 val envBuilder = Environment.newBuilder()
-desc.command.environment.foreach { case (k, v) =>
-  
envBuilder.addVariables(Variable.newBuilder().setName(k).setValue(v).build())
+env.foreach { case (k, v) =>
+  envBuilder.addVariables(Variable.newBuilder().setName(k).setValue(v))
 }
-// Pass all spark properties to executor.
-val executorOpts = desc.schedulerProperties.map { case (k, v) => 
s"-D$k=$v" }.mkString(" ")
-envBuilder.addVariables(
-  
Variable.newBuilder().setName("SPARK_EXECUTOR_OPTS").setValue(executorOpts))
+envBuilder.build()
+  }
+
+  private def getDriverUris(desc: MesosDriverDescription): 
List[CommandInfo.URI] = {
+val confUris = List(conf.getOption("spark.mesos.uris"),
+  desc.schedulerProperties.get("spark.mesos.uris"),
+  desc.schedulerProperties.get("spark.submit.pyFiles")).flatMap(
+  _.map(_.split(",").map(_.trim))
+).flatten
+
+val jarUrl = desc.jarUrl.stripPrefix("file:").stripPrefix("local:")
+
+((jarUrl :: confUris) ++ getDriverExecutorURI(desc).toList).map(uri =>
+  CommandInfo.URI.newBuilder().setValue(uri.trim()).build())
+  }
+
+  private def getDriverCommandValue(desc: MesosDriverDescription): String 
= {
 val dockerDefined = 
desc.schedulerProperties.contains("spark.mesos.executor.docker.image")
-val executorUri = desc.schedulerProperties.get("spark.executor.uri")
-  .orElse(desc.command.environment.get("SPARK_EXECUTOR_URI"))
+val executorUri = getDriverExecutorURI(desc)
 // Gets the path to run spark-submit, and the path to the Mesos 
sandbox.
 val (executable, sandboxPath) = if (dockerDefined) {
   // Application jar is automatically downloaded in the mounted 
sandbox by Mesos,
   // and the path to the mounted volume is stored in $MESOS_SANDBOX 
env variable.
   ("./bin/spark-submit", "$MESOS_SANDBOX")
 } else if (executorUri.isDefined) {
-  
builder.addUris(CommandInfo.URI.newBuilder().setValue(executorUri.get).build())
   val folderBasename = executorUri.get.split('/').last.split('.').head
+
+  val entries = conf.getOption("spark.executor.extraLibraryPath")
+.map(path => Seq(path) ++ desc.command.libraryPathEntries)
+.getOrElse(desc.command.libraryPathEntries)
+
+  val prefixEnv = if (!entries.isEmpty) 
Utils.libraryPathEnvPrefix(entries) else ""
--- End diff --

Replace with `entries.nonEmpty`.





[GitHub] spark pull request #14167: [SPARK-16194] Mesos Driver env vars

2016-07-15 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/14167#discussion_r70954807
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -399,20 +421,18 @@ private[spark] class MesosClusterScheduler(
   // Sandbox points to the current directory by default with Mesos.
   (cmdExecutable, ".")
 }
-val primaryResource = new File(sandboxPath, 
desc.jarUrl.split("/").last).toString()
 val cmdOptions = generateCmdOption(desc, sandboxPath).mkString(" ")
+val primaryResource = new File(sandboxPath, 
desc.jarUrl.split("/").last).toString()
--- End diff --

No need for the parentheses; use `toString`.





[GitHub] spark pull request #14167: [SPARK-16194] Mesos Driver env vars

2016-07-15 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/14167#discussion_r70955202
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala ---
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster.mesos
+
+import java.util.Collections
+
+import org.apache.mesos.Protos._
+import org.apache.mesos.Protos.Value.Scalar
+import org.apache.mesos.SchedulerDriver
+import org.mockito.{ArgumentCaptor, Matchers}
+import org.mockito.Mockito._
+import scala.collection.JavaConverters._
+
+object Utils {
+  def createOffer(offerId: String, slaveId: String, mem: Int, cpu: Int): 
Offer = {
+val builder = Offer.newBuilder()
+builder.addResourcesBuilder()
+  .setName("mem")
+  .setType(Value.Type.SCALAR)
+  .setScalar(Scalar.newBuilder().setValue(mem))
+builder.addResourcesBuilder()
+  .setName("cpus")
+  .setType(Value.Type.SCALAR)
+  .setScalar(Scalar.newBuilder().setValue(cpu))
+builder.setId(createOfferId(offerId))
+  .setFrameworkId(FrameworkID.newBuilder()
+.setValue("f1"))
+  .setSlaveId(SlaveID.newBuilder().setValue(slaveId))
+  .setHostname(s"host${slaveId}")
--- End diff --

The `{}` are redundant here.





[GitHub] spark pull request #14167: [SPARK-16194] Mesos Driver env vars

2016-07-15 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/14167#discussion_r70955348
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala
 ---
@@ -172,4 +187,28 @@ class MesosClusterSchedulerSuite extends SparkFunSuite 
with LocalSparkContext wi
   assert(escape(s"onlywrap${char}this") === 
wrapped(s"onlywrap${char}this"))
 })
   }
+
+  test("supports spark.mesos.driverEnv.*") {
+setScheduler()
+
+val mem = 1000
+val cpu = 1
+
+val response = scheduler.submitDriver(
+  new MesosDriverDescription("d1", "jar", mem, cpu, true,
+command,
+Map("spark.mesos.executor.home" -> "test",
+  "spark.app.name" -> "test",
+  "spark.mesos.driverEnv.TEST_ENV" -> "TEST_VAL"),
+"s1",
+new Date()))
+assert(response.success)
+
+val offer = Utils.createOffer("o1", "s1", mem, cpu)
+scheduler.resourceOffers(driver, List(offer).asJava)
+val tasks = Utils.verifyTaskLaunched(driver, "o1")
+val env = 
tasks(0).getCommand.getEnvironment.getVariablesList.asScala.map(v =>
--- End diff --

Use `head` instead of `tasks(0)`.





[GitHub] spark issue #14167: [SPARK-16194] Mesos Driver env vars

2016-07-15 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/14167
  
LGTM other than minor style issues. I ran our tests against it, so the 
refactoring is successful, I guess.





[GitHub] spark pull request #14221: [SPARK-3359] [DOCS] More changes to resolve javad...

2016-07-15 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/14221

[SPARK-3359] [DOCS] More changes to resolve javadoc 8 errors that will help 
unidoc/genjavadoc compatibility

## What changes were proposed in this pull request?

These are yet more changes that resolve problems with unidoc/genjavadoc and 
Java 8. They do not fully resolve the problem, but they get rid of as many 
errors as we can from this end.


## How was this patch tested?

Jenkins build of docs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-3359.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14221


commit ea8884562f696da979a353bdb49a008fdd8049bd
Author: Sean Owen 
Date:   2016-07-15T11:23:01Z

More changes to resolve javadoc 8 errors that will help unidoc to work with 
Java 8; does not fully resolve the issue







[GitHub] spark pull request #14221: [SPARK-3359] [DOCS] More changes to resolve javad...

2016-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14221#discussion_r70957858
  
--- Diff: 
graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala ---
@@ -119,7 +119,7 @@ object GraphGenerators extends Logging {
* A random graph generator using the R-MAT model, proposed in
* "R-MAT: A Recursive Model for Graph Mining" by Chakrabarti et al.
*
-   * See [[http://www.cs.cmu.edu/~christos/PUBLICATIONS/siam04.pdf]].
+   * See http://www.cs.cmu.edu/~christos/PUBLICATIONS/siam04.pdf.
--- End diff --

Was valid scaladoc but doesn't convert to valid javadoc; it's a genjavadoc 
limitation





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62372/
Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62372 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62372/consoleFull)**
 for PR 14132 at commit 
[`210b636`](https://github.com/apache/spark/commit/210b6365789320f12d0757ad66e7b122614f606c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #14221: [SPARK-3359] [DOCS] More changes to resolve javad...

2016-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14221#discussion_r70957938
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala ---
@@ -212,7 +212,7 @@ object Pipeline extends MLReadable[Pipeline] {
 }
   }
 
-  /** Methods for [[MLReader]] and [[MLWriter]] shared between 
[[Pipeline]] and [[PipelineModel]] */
+  /** Methods for `MLReader` and `MLWriter` shared between [[Pipeline]] 
and [[PipelineModel]] */
--- End diff --

Several of these instances exist because, for whatever reason, the visibility 
is wrong in the generated javadoc and so it fails. It seemed more worthwhile to 
zap the error than to retain these links.





[GitHub] spark issue #14221: [SPARK-3359] [DOCS] More changes to resolve javadoc 8 er...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14221
  
**[Test build #62374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62374/consoleFull)**
 for PR 14221 at commit 
[`ea88845`](https://github.com/apache/spark/commit/ea8884562f696da979a353bdb49a008fdd8049bd).





[GitHub] spark issue #14140: [SPARK-16426][MLlib] Fix bug that caused NaNs in Isotoni...

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14140
  
Merged to master





[GitHub] spark pull request #14140: [SPARK-16426][MLlib] Fix bug that caused NaNs in ...

2016-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14140





[GitHub] spark issue #14152: [SPARK-16395] [STREAMING] Fail if too many CheckpointWri...

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14152
  
I'm gonna go for this if there are no objections. I'd even put it into 2.0, 
but unless someone seconds that I won't.





[GitHub] spark issue #14137: SPARK-16478 graphX (added graph caching in strongly conn...

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14137
  
I don't think you can reason about whether something is evicted, since that is 
up to the runtime. Here the RDD has to be materialized before the method 
returns, because its predecessors will have been unpersisted, but that is 
already done in the loop.

What's the reason the final loop iteration doesn't need to compute this? It 
wasn't obvious to me; a comment would be useful for future readers.
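
As a general illustration of the pattern under discussion (a hedged sketch, not the SCC implementation itself): materialize the newly cached RDD before unpersisting the data it was derived from, so later uses do not recompute unpersisted parents.

```scala
import org.apache.spark.rdd.RDD

// `prev` and `next` are stand-ins for the successive graphs/RDDs in the loop.
def rollForward[T](prev: RDD[T], next: RDD[T]): RDD[T] = {
  next.cache()
  next.count()                       // force materialization while `prev` still exists
  prev.unpersist(blocking = false)   // now safe to drop the old data
  next
}
```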





[GitHub] spark issue #14137: SPARK-16478 graphX (added graph caching in strongly conn...

2016-07-15 Thread wesolowskim
Github user wesolowskim commented on the issue:

https://github.com/apache/spark/pull/14137
  
Added a comment explaining the `if`.





[GitHub] spark issue #14221: [SPARK-3359] [DOCS] More changes to resolve javadoc 8 er...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14221
  
**[Test build #62374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62374/consoleFull)**
 for PR 14221 at commit 
[`ea88845`](https://github.com/apache/spark/commit/ea8884562f696da979a353bdb49a008fdd8049bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14221: [SPARK-3359] [DOCS] More changes to resolve javadoc 8 er...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62374/
Test PASSed.





[GitHub] spark issue #14221: [SPARK-3359] [DOCS] More changes to resolve javadoc 8 er...

2016-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14221
  
Merged build finished. Test PASSed.





[GitHub] spark issue #12983: [SPARK-15213][PySpark] Unify 'range' usages

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/12983
  
Let's close this for lack of follow-up.





[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...

2016-07-15 Thread tilumi
Github user tilumi commented on the issue:

https://github.com/apache/spark/pull/14129
  
According to the results, I figured out that the performance of the codegen 
version is bounded by the number of bins, because of the large amount of 
generated code for array creation.





[GitHub] spark issue #14135: [Spark-16479] Add Example for asynchronous action

2016-07-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14135
  
Please close this @phalodi 





[GitHub] spark pull request #14135: [Spark-16479] Add Example for asynchronous action

2016-07-15 Thread phalodi
Github user phalodi closed the pull request at:

https://github.com/apache/spark/pull/14135





[GitHub] spark pull request #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceG...

2016-07-15 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14222

[SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups should support 
partial aggregation

## What changes were proposed in this pull request?

`KeyValueGroupedDataset.reduceGroups` is currently implemented via 
`flatMapGroups`, which does not support partial aggregation and so is very 
inefficient.

`KeyValueGroupedDataset.reduceGroups` should support partial aggregation. 
This PR implements it with `Aggregator`.
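
For reference, a minimal usage of the API whose implementation changes here (the observable behavior should be identical; assumes an active `SparkSession` named `spark`):

```scala
import spark.implicits._

// Sum values per key; with partial aggregation, groups can be combined on the
// map side before the shuffle instead of sending every row of a group to one task.
val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()
val reduced = ds.groupByKey(_._1).reduceGroups((x, y) => (x._1, x._2 + y._2))
reduced.show()
```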


## How was this patch tested?

Existing tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 improve-reducegroups

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14222


commit 11357737ad58a2a6c1ea2e17026669fc138f556c
Author: Liang-Chi Hsieh 
Date:   2016-07-15T14:06:41Z

Support partial aggregation for reduceGroups.







[GitHub] spark issue #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups s...

2016-07-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14222
  
**[Test build #62375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62375/consoleFull)**
 for PR 14222 at commit 
[`1135773`](https://github.com/apache/spark/commit/11357737ad58a2a6c1ea2e17026669fc138f556c).





[GitHub] spark pull request #14222: [SPARK-16391][SQL] KeyValueGroupedDataset.reduceG...

2016-07-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14222#discussion_r70979389
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -177,10 +178,33 @@ class KeyValueGroupedDataset[K, V] private[sql](
* @since 1.6.0
*/
   def reduceGroups(f: (V, V) => V): Dataset[(K, V)] = {
-val func = (key: K, it: Iterator[V]) => Iterator((key, it.reduce(f)))
+val encoder = encoderFor[V]
+val intEncoder: ExpressionEncoder[Int] = ExpressionEncoder()
+val aggregator: TypedColumn[V, V] = new Aggregator[V, (Int, V), V] {
+  def bufferEncoder: Encoder[(Int, V)] = 
ExpressionEncoder.tuple(intEncoder, encoder)
+  def outputEncoder: Encoder[V] = encoder
 
-implicit val resultEncoder = ExpressionEncoder.tuple(kExprEnc, 
vExprEnc)
-flatMapGroups(func)
+  def zero: (Int, V) = (0, null.asInstanceOf[V])
--- End diff --

One problem with `Aggregator` here is the zero value. This PR uses an Int (it 
could be a Boolean too) to indicate whether the buffer has been initialized.
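
A minimal sketch of the flag-in-buffer idea described above (illustrative only, not necessarily the PR's exact code), for a reduce function `f`:

```scala
// Buffer is (flag, value): flag == 0 means "no value seen yet", so `zero`
// can be (0, null) even when V has no natural identity element.
def reduceBuf[V](f: (V, V) => V)(b: (Int, V), a: V): (Int, V) =
  if (b._1 == 0) (1, a) else (1, f(b._2, a))

def mergeBuf[V](f: (V, V) => V)(b1: (Int, V), b2: (Int, V)): (Int, V) =
  if (b1._1 == 0) b2
  else if (b2._1 == 0) b1
  else (1, f(b1._2, b2._2))
```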




