date:20160704

[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69510671
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,63 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+  extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = children.head.eval().asInstanceOf[Int]
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
+    } else if (children.head.dataType != IntegerType || !children.head.foldable || numRows < 1) {
+      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
+    } else {
+      for (i <- 1 until children.length) {
+        val j = (i - 1) % numFields
+        if (children(i).dataType != elementSchema.fields(j).dataType) {
+          return TypeCheckResult.TypeCheckFailure(
+            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
+              s"Argument $i (${children(i).dataType})")
+        }
+      }
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  override def elementSchema: StructType = {
+    var schema = new StructType()
--- End diff --

how about
```
StructType(children.tail.take(numFields).zipWithIndex.map {
  case (e, index) => StructField(s"col$index", e.dataType)
})
```
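
To make the suggestion above concrete, here is a minimal, self-contained sketch. `StructField` and `StructType` below are simplified stand-ins (the data type is modelled as a plain string), not the real Spark classes, and `valueTypes` is a hypothetical name for the data types of `children.tail`.

```scala
object ElementSchemaSketch {
  // Simplified stand-ins for Spark's StructField/StructType (assumption:
  // the data type is a plain String here, for illustration only).
  final case class StructField(name: String, dataType: String)
  final case class StructType(fields: Seq[StructField])

  // Take the first `numFields` value types and name the columns col0, col1, ...
  def elementSchema(valueTypes: Seq[String], numFields: Int): StructType =
    StructType(valueTypes.take(numFields).zipWithIndex.map {
      case (dataType, index) => StructField(s"col$index", dataType)
    })

  def main(args: Array[String]): Unit = {
    // stack(2, 1, 2, 3): three int values, two fields per row.
    println(elementSchema(Seq("int", "int", "int"), numFields = 2))
  }
}
```

Building the schema in one expression avoids the mutable `var schema` that the diff starts with.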


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69510162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,63 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+  extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = children.head.eval().asInstanceOf[Int]
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
--- End diff --

Can you explain a bit more about this? It would be good if we could express
the logic more clearly using `math.ceil` or something.
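
For context, the `numFields` expression in the diff is the usual integer ceiling-division idiom. The sketch below compares it with a `math.ceil` spelling; `ceilDiv` and `ceilDivViaMath` are hypothetical helper names, with `k` standing for `children.length - 1` and `n` for `numRows`.

```scala
object CeilDivSketch {
  // Integer form used in the diff: ceil(k / n) for k >= 0 and n > 0.
  def ceilDiv(k: Int, n: Int): Int = (k + n - 1) / n

  // Equivalent spelling with math.ceil, as the comment suggests.
  def ceilDivViaMath(k: Int, n: Int): Int = math.ceil(k.toDouble / n).toInt

  def main(args: Array[String]): Unit = {
    // stack(2, 1, 2, 3): k = 3 values over n = 2 rows needs 2 columns.
    println(ceilDiv(3, 2))        // 2
    println(ceilDivViaMath(3, 2)) // 2
  }
}
```

The integer form avoids a round trip through `Double`, while the `math.ceil` form states the intent more directly.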



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14044
  
LGTM



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69509768
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType),
+          Literal.create(partToExtract, StringType))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String, urlStr: String,
+        partToExtract: String, key: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType),
+          Literal.create(key, StringType))), expected)
+    }
+
+    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
+    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
+    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
+    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
+    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
+    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
+    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
+    checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1", "USERINFO")
--- End diff --

what will happen if there is no userinfo in the url?
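
For reference, the JDK's own `java.net.URI` reports a missing userinfo component as `null`. The sketch below only shows that JDK behaviour; whether `ParseUrl` in this PR delegates to `java.net.URI` or returns `null` in the same way is an assumption, not something the diff shows. `userInfo` is a hypothetical helper and the URLs are illustrative inputs.

```scala
import java.net.URI

object UserInfoSketch {
  // Wrap the nullable JDK result in an Option for safer handling.
  def userInfo(url: String): Option[String] = Option(new URI(url).getUserInfo)

  def main(args: Array[String]): Unit = {
    println(userInfo("http://userinfo@spark.apache.org/path?query=1")) // Some(userinfo)
    println(userInfo("http://spark.apache.org/path?query=1"))          // None
  }
}
```

A test case for the missing-userinfo URL would pin down which behaviour the expression actually has.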



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69509714
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType),
+          Literal.create(partToExtract, StringType))), expected)
+    }
+    def checkParseUrlWithKey(
+        expected: String, urlStr: String,
--- End diff --

one parameter one line please



[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14008#discussion_r69509699
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
 ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
     checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
   }
+
+  test("ParseUrl") {
+    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
+      checkEvaluation(
+        ParseUrl(Seq(Literal.create(urlStr, StringType),
--- End diff --

`Literal.create(urlStr, StringType)` -> `Literal(urlStr)`



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14044
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14044
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61744/
Test PASSed.



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14044
  
**[Test build #61744 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)**
 for PR 14044 at commit 
[`45eb28a`](https://github.com/apache/spark/commit/45eb28af51203a97c22c8b9022cb38ac0451d401).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13218: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for...

2016-07-04 Thread yanboliang

Github user yanboliang closed the pull request at:

https://github.com/apache/spark/pull/13218



[GitHub] spark issue #13218: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for REST A...

2016-07-04 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/13218
  
Updated PR at #14052; closing this one.



[GitHub] spark issue #14052: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for REST A...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14052
  
**[Test build #61745 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61745/consoleFull)**
 for PR 14052 at commit 
[`fb610ef`](https://github.com/apache/spark/commit/fb610efc79352ca9b4501f40df629e6d127170d4).



[GitHub] spark pull request #14052: [SPARK-15440] [Core] [Deploy] Add CSRF Filter for...

2016-07-04 Thread yanboliang

GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/14052

[SPARK-15440] [Core] [Deploy] Add CSRF Filter for REST APIs to Spark

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark csrf-rest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14052


commit 93b1c6f7f75b8b32246d1949e775ac091b02a7e3
Author: Yanbo Liang 
Date:   2016-05-20T07:12:05Z

Add CSRF Filter for REST APIs to Spark

commit fb610efc79352ca9b4501f40df629e6d127170d4
Author: Yanbo Liang 
Date:   2016-07-04T11:16:26Z

add param: spark.rest.csrf.enable





[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61741/
Test PASSed.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61742/
Test PASSed.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14033
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14033
  
**[Test build #61742 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61742/consoleFull)**
 for PR 14033 at commit 
[`8abcab5`](https://github.com/apache/spark/commit/8abcab5906a758dde97bbbdc07595e37ec16fcc9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14033
  
**[Test build #61741 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61741/consoleFull)**
 for PR 14033 at commit 
[`9d93ddc`](https://github.com/apache/spark/commit/9d93ddc1c79819b7b65301575dc54d02c4fae577).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #12972: [SPARK-15198][SQL] Support for pushing down filters for ...

2016-07-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/12972
  
No problem! thank you!



[GitHub] spark pull request #14024: [SPARK-15923][YARN] Spark Application rest api re...

2016-07-04 Thread Sherry302

Github user Sherry302 closed the pull request at:

https://github.com/apache/spark/pull/14024



[GitHub] spark issue #12972: [SPARK-15198][SQL] Support for pushing down filters for ...

2016-07-04 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/12972
  
LGTM, merging to master. Sorry for leaving this PR for so long...



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14044
  
LGTM pending Jenkins.



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61743/
Test FAILed.



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #61743 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61743/consoleFull)**
 for PR 14045 at commit 
[`114a69b`](https://github.com/apache/spark/commit/114a69b30bee0bb80f9028205fc020387c29ac24).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14041: [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10

2016-07-04 Thread shivaram

Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14041
  
I just noticed that our nightly docs build has been failing with an error
related to Kafka (example [1]). Will this PR fix it, or should we open a new
JIRA?

[1] 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.0-docs/209/consoleFull





[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14044
  
I have now updated the title and description of the PR/JIRA.
The only change left in this PR is the following one-word change.
```
-new Dataset[Row](sparkSession, logicalPlan, RowEncoder(qe.analyzed.schema))
+new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema))
```
Thank you all for the fast review and advice. In the first commit, I thought
it was important to remove all the repeated logic, but now only the minimum
meaningful code change remains.



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14044
  
**[Test build #61744 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61744/consoleFull)**
 for PR 14044 at commit 
[`45eb28a`](https://github.com/apache/spark/commit/45eb28af51203a97c22c8b9022cb38ac0451d401).



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #61743 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61743/consoleFull)**
 for PR 14045 at commit 
[`114a69b`](https://github.com/apache/spark/commit/114a69b30bee0bb80f9028205fc020387c29ac24).



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14044
  
Hi, @cloud-fan, @hvanhovell, @liancheng.

Following @cloud-fan's advice, I made the change below, and it turns out
that the difference is not noticeable.
```
-new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema), skipAnalysis = true)
+new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema))
```

Exactly as you said, the second call of `qe.assertAnalyzed()` is not the
root cause. The only difference lies in
`sparkSession.sessionState.executePlan(logicalPlan)`.

I'll update the PR.



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14051
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14051
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61739/



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14051
  
**[Test build #61739 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61739/consoleFull)**
 for PR 14051 at commit 
[`0acd1e0`](https://github.com/apache/spark/commit/0acd1e0c2f3a517bda064c889d3f7ee9db2d5c39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61740/



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #61740 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61740/consoleFull)**
 for PR 14045 at commit 
[`5c4c1c8`](https://github.com/apache/spark/commit/5c4c1c84606763b78b6b8b774be87862df520f9c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14044: [SPARK-16360][SQL] Speed up SQL query performance...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14044#discussion_r69503621
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -62,7 +62,7 @@ private[sql] object Dataset {
   def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): 
DataFrame = {
 val qe = sparkSession.sessionState.executePlan(logicalPlan)
 qe.assertAnalyzed()
-new Dataset[Row](sparkSession, logicalPlan, 
RowEncoder(qe.analyzed.schema))
+new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema), 
skipAnalysis = true)
--- End diff --

Oh, I misunderstood your point.
You mean 1) changing `logicalPlan` to `qe`, but 2) keeping `skipAnalysis = false`.
Okay. I'll report back soon.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14033
  
**[Test build #61742 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61742/consoleFull)**
 for PR 14033 at commit 
[`8abcab5`](https://github.com/apache/spark/commit/8abcab5906a758dde97bbbdc07595e37ec16fcc9).



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14039
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14039
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61738/



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14039
  
**[Test build #61738 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61738/consoleFull)**
 for PR 14039 at commit 
[`55c8e03`](https://github.com/apache/spark/commit/55c8e034f9a4e231d49c79a77631da58e6130afd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69503280
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,70 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n 
must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = try {
+children.head.eval().asInstanceOf[Int]
+  } catch {
+case _: ClassCastException =>
+  throw new AnalysisException("The number of rows must be a positive 
constant integer.")
+  }
+
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / 
numRows
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 
arguments.")
+} else if (children.head.dataType != IntegerType || 
!children.head.foldable ||
+  children.head.eval().asInstanceOf[Int] < 1) {
+  TypeCheckResult.TypeCheckFailure("The number of rows must be a 
positive constant integer.")
+} else {
+  for (i <- 1 until children.length) {
+val j = (i - 1) % numFields
+if (children(i).dataType != elementSchema.fields(j).dataType) {
+  return TypeCheckResult.TypeCheckFailure(
+s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " 
+
--- End diff --

I see. Thank you for this.



[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14033
  
**[Test build #61741 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61741/consoleFull)**
 for PR 14033 at commit 
[`9d93ddc`](https://github.com/apache/spark/commit/9d93ddc1c79819b7b65301575dc54d02c4fae577).



[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69503069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,70 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n 
must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = try {
+children.head.eval().asInstanceOf[Int]
+  } catch {
+case _: ClassCastException =>
--- End diff --

Oh, indeed. Without that, all tests pass. `elementSchema` is not called 
beforehand.
While developing, I thought I had found a case that needed it, but I must 
have been confused by some mixed cases.
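
For readers following the `numRows` / `numFields` arithmetic in the diff above, here is a minimal standalone sketch of the row layout that `stack(n, v1, ..., vk)` is described as producing. It is a plain Scala model, not the Spark `Generator` API; `StackSketch` and its methods are hypothetical names, and `None` stands in for SQL NULL.

```scala
// Hypothetical standalone model of stack(n, v1, ..., vk); not Spark's Generator API.
object StackSketch {
  // Columns per row: ceil(k / n), using the same integer arithmetic as the diff.
  def numFields(numValues: Int, numRows: Int): Int =
    (numValues + numRows - 1) / numRows

  // Split the values into numRows rows, padding the tail with None (SQL NULL).
  def stack[A](numRows: Int, values: Seq[A]): Seq[Seq[Option[A]]] = {
    require(numRows >= 1, "The number of rows must be a positive constant integer.")
    val width = numFields(values.length, numRows)
    (0 until numRows).map { row =>
      (0 until width).map { col =>
        val index = row * width + col
        if (index < values.length) Some(values(index)) else None
      }
    }
  }
}
```

With `numRows = 2` and the values `1, 2, 3`, the width is `(3 + 2 - 1) / 2 = 2`, so the rows are `1, 2` and `3, NULL`, which matches the example in the doc comment.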



[GitHub] spark issue #12691: [Spark-14761][SQL][WIP] Reject invalid join methods when...

2016-07-04 Thread bkpathak

Github user bkpathak commented on the issue:

https://github.com/apache/spark/pull/12691
  
Hi @JoshRosen, could you please look at the pull request?



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14044
  
Thank you for the review, @liancheng.
I'm sure that the performance of the Analyzer needs to be improved. But in any 
case, the cost of analysis cannot be zero.
We should skip the redundant analysis. IMO, that idea sounds orthogonal to 
this PR, so I asked @hvanhovell to make a PR for that.



[GitHub] spark pull request #14044: [SPARK-16360][SQL] Speed up SQL query performance...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14044#discussion_r69502213
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -62,7 +62,7 @@ private[sql] object Dataset {
   def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): 
DataFrame = {
 val qe = sparkSession.sessionState.executePlan(logicalPlan)
 qe.assertAnalyzed()
-new Dataset[Row](sparkSession, logicalPlan, 
RowEncoder(qe.analyzed.schema))
+new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema), 
skipAnalysis = true)
--- End diff --

I think I wrote the result in the PR description. Is that not what you mean?



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14051
  
**[Test build #61739 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61739/consoleFull)**
 for PR 14051 at commit 
[`0acd1e0`](https://github.com/apache/spark/commit/0acd1e0c2f3a517bda064c889d3f7ee9db2d5c39).



[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #61740 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61740/consoleFull)**
 for PR 14045 at commit 
[`5c4c1c8`](https://github.com/apache/spark/commit/5c4c1c84606763b78b6b8b774be87862df520f9c).



[GitHub] spark issue #14044: [SPARK-16360][SQL] Speed up SQL query performance by rem...

2016-07-04 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14044
  
Agree with @hvanhovell. Analysis should never take so long for such a 
simple query. We should avoid duplicated analysis work, but fixing the 
performance issue(s) within the analyzer seems more effective.



[GitHub] spark pull request #14046: [SPARK-16366][SPARKR] Fix time comparison failure...

2016-07-04 Thread sun-rui

Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/14046#discussion_r69501135
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1258,10 +1258,12 @@ test_that("date functions on a DataFrame", {
   df2 <- createDataFrame(l2)
   expect_equal(collect(select(df2, minute(df2$b)))[, 1], c(34, 24))
   expect_equal(collect(select(df2, second(df2$b)))[, 1], c(0, 34))
-  expect_equal(collect(select(df2, from_utc_timestamp(df2$b, "JST")))[, 1],
-   c(as.POSIXlt("2012-12-13 21:34:00 UTC"), 
as.POSIXlt("2014-12-15 10:24:34 UTC")))
-  expect_equal(collect(select(df2, to_utc_timestamp(df2$b, "JST")))[, 1],
-   c(as.POSIXlt("2012-12-13 03:34:00 UTC"), 
as.POSIXlt("2014-12-14 16:24:34 UTC")))
+  t <- c(as.POSIXlt("2012-12-13 21:34:00 UTC"), as.POSIXlt("2014-12-15 
10:24:34 UTC"))
+  attr(t, "tzone") <- NULL
--- End diff --

I do not have a deep understanding of time zones in R. Let me spend some 
time to see if I can find a better fix. It may need a look at the date/time 
handling in serde.



[GitHub] spark issue #13818: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-04 Thread liancheng

Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13818
  
Shall we also have this in branch-2.0? This seems to be a pretty serious 
bug. cc @rxin.



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/14039
  
@markhamstra Thanks for the comment. I think the reuse of fragments depends 
heavily on the user's queries, the catalyst optimizer, cluster resources, and 
so on. Reusing `ShuffledRowRDD` shuffle data within a single job is a good 
idea, but it seems difficult to keep the data across multiple jobs because 
Spark cannot know when the data should be garbage-collected, and it could 
consume a lot of disk space. I think a caching mechanism is a better way to 
reuse fragments across multiple jobs. Or do you have any smart/concrete idea 
for reusing the shuffle data?



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Let me do it in this PR. Thank you for your review! : ) 



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499892
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

We do not support non-foldable limit clauses.

https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L67-L89

https://github.com/apache/spark/blob/d063898bebaaf4ec2aad24c3ac70aabdbf97a190/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L398-L401

However, we do not throw an exception if users do this. Thus, the error we 
get is confusing:
```
assertion failed: No plan for GlobalLimit (_nondeterministic#203 > 0.2)
+- Project [key#11, value#12, rand(-1441968339187861415) AS 
_nondeterministic#203]
   +- LocalLimit (_nondeterministic#202 > 0.2)
  +- Project [key#11, value#12, rand(-1308350387169017676) AS 
_nondeterministic#202]
 +- LogicalRDD [key#11, value#12]

java.lang.AssertionError: assertion failed: No plan for GlobalLimit 
(_nondeterministic#203 > 0.2)
+- Project [key#11, value#12, rand(-1441968339187861415) AS 
_nondeterministic#203]
   +- LocalLimit (_nondeterministic#202 > 0.2)
  +- Project [key#11, value#12, rand(-1308350387169017676) AS 
_nondeterministic#202]
 +- LogicalRDD [key#11, value#12]
```
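
To make the intended behavior concrete, here is a minimal standalone sketch of a limit check that rejects non-foldable expressions up front instead of failing later with "No plan for GlobalLimit". It is a simplified model, not the code in this PR: `Expr`, `IntLiteral`, `Rand`, `Add`, and the error messages are hypothetical, and `Rand` is treated as integer-typed purely to keep the example small.

```scala
object LimitCheckSketch {
  // Hypothetical miniature expression tree; Spark's real Expression is much richer.
  sealed trait Expr { def foldable: Boolean }
  final case class IntLiteral(value: Int) extends Expr { val foldable = true }
  final case class Rand(seed: Long) extends Expr { val foldable = false }
  final case class Add(left: Expr, right: Expr) extends Expr {
    val foldable: Boolean = left.foldable && right.foldable
  }

  // Evaluate an expression to a constant; None when it has no constant value.
  def eval(e: Expr): Option[Int] = e match {
    case IntLiteral(v) => Some(v)
    case Add(l, r)     => for (a <- eval(l); b <- eval(r)) yield a + b
    case Rand(_)       => None
  }

  // Right(n) for a valid constant limit, Left(message) for a rejected one.
  def checkLimitClause(limit: Expr): Either[String, Int] =
    if (!limit.foldable) Left("The limit expression must be foldable (a constant).")
    else eval(limit) match {
      case Some(n) if n >= 0 => Right(n)
      case Some(n)           => Left(s"The limit must be non-negative, but got $n.")
      case None              => Left("The limit expression could not be evaluated.")
    }
}
```

Checking `foldable` before calling `eval` is the key ordering here: a non-deterministic expression is reported as an analysis error rather than surfacing as an assertion failure during planning.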



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69499666
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

- Oracle: 
http://docs.oracle.com/javadb/10.5.3.0/ref/rrefsqljoffsetfetch.html
- DB2 z/OS: 
https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_fetchfirstclause.html
- MySQL: http://dev.mysql.com/doc/refman/5.7/en/select.html
- PostgreSQL: https://www.postgresql.org/docs/8.1/static/queries-limit.html

It sounds like nobody supports it.



[GitHub] spark pull request #13818: [SPARK-15968][SQL] Nonempty partitioned metastore...

2016-07-04 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13818



[GitHub] spark issue #13818: [SPARK-15968][SQL] Nonempty partitioned metastore tables...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13818
  
thanks, merging to master!



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/14039
  
@srowen My understanding is that shuffle data from stages can be shared 
within a job. However, once the job is finished, the current implementation 
cannot reuse the shuffle data anymore, so we can safely remove it. Is this 
incorrect? Can Spark reuse it across different jobs?



[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14039
  
**[Test build #61738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61738/consoleFull)**
 for PR 14039 at commit 
[`55c8e03`](https://github.com/apache/spark/commit/55c8e034f9a4e231d49c79a77631da58e6130afd).



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498807
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+    val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Ah, but it's still foldable. Is it possible for it to be non-foldable?



[GitHub] spark pull request #14044: [SPARK-16360][SQL] Speed up SQL query performance...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14044#discussion_r69498744
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -62,7 +62,7 @@ private[sql] object Dataset {
   def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): DataFrame = {
     val qe = sparkSession.sessionState.executePlan(logicalPlan)
     qe.assertAnalyzed()
-    new Dataset[Row](sparkSession, logicalPlan, RowEncoder(qe.analyzed.schema))
+    new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema), skipAnalysis = true)
--- End diff --

Can we measure how much we gain by avoiding the duplicated check analysis? 
I agree that avoiding duplicated analysis is necessary, but check analysis 
itself does not seem to be a big deal?



[GitHub] spark pull request #14044: [SPARK-16360][SQL] Speed up SQL query performance...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14044#discussion_r69498260
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -62,7 +62,7 @@ private[sql] object Dataset {
   def ofRows(sparkSession: SparkSession, logicalPlan: LogicalPlan): DataFrame = {
     val qe = sparkSession.sessionState.executePlan(logicalPlan)
     qe.assertAnalyzed()
-    new Dataset[Row](sparkSession, logicalPlan, RowEncoder(qe.analyzed.schema))
+    new Dataset[Row](sparkSession, qe, RowEncoder(qe.analyzed.schema), skipAnalysis = true)
--- End diff --

we can make the `encoder` a by-name parameter in `Dataset`, then the 
`qe.assertAnalyzed()` will be called first.
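
A minimal sketch of the by-name idea in plain Scala, without Spark. `FakeDataset`, `buildEncoder` and `events` are hypothetical names used only for illustration; the real `Dataset` constructor may well need a different shape.

```scala
import scala.collection.mutable.ArrayBuffer

object ByNameEncoderSketch {
  // Records the order of events so the evaluation order is observable.
  val events = ArrayBuffer.empty[String]

  // Hypothetical stand-in for Dataset. `encoder` is a by-name parameter:
  // the argument expression is not evaluated at the call site, only when
  // `resolvedEncoder` is first accessed.
  class FakeDataset(encoder: => String) {
    events += "constructor body ran"
    lazy val resolvedEncoder: String = encoder
  }

  // Hypothetical stand-in for building a RowEncoder from an analyzed schema.
  def buildEncoder(): String = {
    events += "encoder built"
    "row-encoder"
  }

  def main(args: Array[String]): Unit = {
    val ds = new FakeDataset(buildEncoder())
    println(events.mkString(", ")) // the encoder has not been built yet
    ds.resolvedEncoder
    println(events.mkString(", ")) // now it has
  }
}
```

With an ordinary (by-value) parameter, `buildEncoder()` would run before the constructor body, which is the ordering problem the comment is trying to avoid.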



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498258
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+    val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Nope. Users can input an expression here. For example, 

https://github.com/apache/spark/blob/e5d703bca85c65ce329b1e202283cfa35d109146/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L234



[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69498156
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,70 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = try {
+    children.head.eval().asInstanceOf[Int]
+  } catch {
+    case _: ClassCastException =>
+      throw new AnalysisException("The number of rows must be a positive constant integer.")
+  }
+
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
+    } else if (children.head.dataType != IntegerType || !children.head.foldable ||
+      children.head.eval().asInstanceOf[Int] < 1) {
+      TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
+    } else {
+      for (i <- 1 until children.length) {
+        val j = (i - 1) % numFields
+        if (children(i).dataType != elementSchema.fields(j).dataType) {
+          return TypeCheckResult.TypeCheckFailure(
+            s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
--- End diff --

not a big deal, `Argument i` is also fine.
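
For reference, the index arithmetic in the diff above can be sketched in plain Scala, without Catalyst. This is only an illustration of the row/column layout under the stated assumptions; `StackSketch`, `stack` and `columnOf` are hypothetical names and do not appear in the PR.

```scala
object StackSketch {
  // Ceiling division, mirroring the numFields expression in the diff.
  def numFields(numValues: Int, numRows: Int): Int =
    (numValues + numRows - 1) / numRows

  // Zero-based column that the i-th value argument (1-based) lands in.
  def columnOf(i: Int, width: Int): Int = (i - 1) % width

  // Lay the values out row by row, padding the last row with null.
  def stack(numRows: Int, values: Seq[Any]): Seq[Seq[Any]] = {
    require(numRows >= 1, "The number of rows must be a positive constant integer.")
    val width = numFields(values.length, numRows)
    (0 until numRows).map { r =>
      (0 until width).map { c =>
        val idx = r * width + c
        if (idx < values.length) values(idx) else null
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Same input as the doc comment: stack(2, 1, 2, 3)
    stack(2, Seq(1, 2, 3)).foreach(row => println(row.mkString("[", ",", "]")))
  }
}
```

Since argument `i` is compared against the type of column `(i - 1) % numFields`, the error message can name either the column's first argument (`j + 1`) or the offending argument (`i`).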



[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69498130
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -94,6 +94,70 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = try {
+    children.head.eval().asInstanceOf[Int]
+  } catch {
+    case _: ClassCastException =>
--- End diff --

`elementSchema` is a method; where do we call it before the type checking?



[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69498054
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -46,6 +46,15 @@ trait CheckAnalysis extends PredicateHelper {
 }).length > 1
   }
 
+  private def checkLimitClause(limitExpr: Expression): Unit = {
+    val numRows = limitExpr.eval().asInstanceOf[Int]
--- End diff --

Is the limit expression guaranteed to be a literal?



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14051
  
**[Test build #61737 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61737/consoleFull)**
 for PR 14051 at commit 
[`82b4edd`](https://github.com/apache/spark/commit/82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14051
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61737/
Test FAILed.



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14051
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14034
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61736/
Test PASSed.



[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14034
  
**[Test build #61736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61736/consoleFull)**
 for PR 14034 at commit 
[`3c402d3`](https://github.com/apache/spark/commit/3c402d304883fa83712f07cd09a3bbe765b1f071).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14051: [SPARK-16372][MLlib] RowMatrix constructor should use re...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14051
  
**[Test build #61737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61737/consoleFull)**
 for PR 14051 at commit 
[`82b4edd`](https://github.com/apache/spark/commit/82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1).



[GitHub] spark pull request #14051: [SPARK-16372][MLlib] RowMatrix constructor should...

2016-07-04 Thread yinxusen

GitHub user yinxusen opened a pull request:

https://github.com/apache/spark/pull/14051

[SPARK-16372][MLlib] RowMatrix constructor should use retag for Java 
compatibility

## What changes were proposed in this pull request?

The following Java code fails because of type erasure:

```Java
JavaRDD<Vector> rows = jsc.parallelize(...);
RowMatrix mat = new RowMatrix(rows.rdd());
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
```

We should use retag to restore the type to prevent the following exception:

```Java
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to 
[Lorg.apache.spark.mllib.linalg.Vector;
```
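
The failure mode can be reproduced in plain Scala, without Spark. The sketch below is an assumption-laden illustration: `TaggedSeq` is a hypothetical stand-in for an RDD that carries a `ClassTag`, and its `retag` only imitates the idea of restoring the element type; it is not Spark's implementation.

```scala
import scala.reflect.ClassTag
import scala.util.Try

object RetagSketch {
  // Hypothetical stand-in for an RDD: the ClassTag decides which runtime
  // array type toArr builds.
  class TaggedSeq[T](val items: Seq[T])(implicit val tag: ClassTag[T]) {
    def toArr: Array[T] = items.toArray(tag)
    // Rebuild the wrapper with a ClassTag for the real element class.
    def retag(cls: Class[T]): TaggedSeq[T] =
      new TaggedSeq[T](items)(ClassTag[T](cls))
  }

  // What a Java caller effectively provides: a tag for Object, not String.
  def fromJava(items: Seq[String]): TaggedSeq[String] =
    new TaggedSeq[String](items)(ClassTag.AnyRef.asInstanceOf[ClassTag[String]])

  def main(args: Array[String]): Unit = {
    val erased = fromJava(Seq("a", "b"))
    // Builds an Object[] at runtime, so using it as Array[String] fails.
    val broken = Try { val arr: Array[String] = erased.toArr; arr.length }
    println(broken.isFailure)
    // After retagging, a real String[] is built.
    val fixed: Array[String] = erased.retag(classOf[String]).toArr
    println(fixed.mkString(","))
  }
}
```

The same mismatch between the static element type and the runtime array class is what surfaces as the `ClassCastException` quoted above.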


## How was this patch tested?

Java unit test




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yinxusen/spark SPARK-16372

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14051


commit 82b4edd374cd5cd0f4f7c87e8d2e5ec7d3fbf3f1
Author: Xusen Yin 
Date:   2016-07-05T00:03:44Z

add retag





[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries with side effects shoul...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14048
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61734/
Test PASSed.



[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries with side effects shoul...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14048
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries with side effects shoul...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14048
  
**[Test build #61734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61734/consoleFull)**
 for PR 14048 at commit 
[`e59c3d1`](https://github.com/apache/spark/commit/e59c3d12a945575776f99f0766adb54a90bd2cf1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14049
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14049
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61735/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14049
  
**[Test build #61735 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61735/consoleFull)**
 for PR 14049 at commit 
[`72991db`](https://github.com/apache/spark/commit/72991dbec4c5c5e38fa0ab74a6b83d87007a7f12).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #8013: [SPARK-3181][MLLIB]: Add Robust Regression Algorithm with...

2016-07-04 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/8013
  
@rxin @mengxr I'm back in the US from leave and am going to revisit my open PRs. 

I worked with @MechCoder to implement the Huber estimator in scikit-learn 
(https://github.com/scikit-learn/scikit-learn/pull/5291), which has been merged. 
@fjiang6, @MechCoder, @sethah, are you interested in porting this feature to 
Spark? It should be fairly straightforward. 

Thanks.



[GitHub] spark issue #13729: [SPARK-16008][ML] Remove unnecessary serialization in lo...

2016-07-04 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/13729
  
@sethah Late comment. Great improvement for high-dimensional problems. I 
haven't tested it myself, but I wonder whether the `@transient` annotation works 
in the constructor of `LogisticAggregator`. If it does, the code would be cleaner 
using `c.add(instance)`. Thanks.



[GitHub] spark issue #14050: [MINOR][EXAMPLES] Window function examples

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14050
  
Can one of the admins verify this patch?



[GitHub] spark pull request #14050: [MINOR][EXAMPLES] Window function examples

2016-07-04 Thread aokolnychyi

GitHub user aokolnychyi opened a pull request:

https://github.com/apache/spark/pull/14050

[MINOR][EXAMPLES] Window function examples

## What changes were proposed in this pull request?

This PR adds an example that explains the usage of window functions. 
It shows the difference between no, unbounded, and bounded window frames and 
how they are resolved.
The example also covers the two ways to define window frames: based on physical 
(rowsBetween) and logical (rangeBetween) offsets.

The example should be useful for people who do not have much experience 
with window functions since it explains how Spark internally deals with window 
frames.
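
To make the physical versus logical distinction concrete, here is a small plain-Scala sketch that needs no Spark. It assumes a single partition already sorted by the ordering column and a running sum as the aggregate; `rowsFrameSum` and `rangeFrameSum` are hypothetical helper names, and this is not how Spark evaluates frames internally.

```scala
object WindowFrameSketch {
  // Physical frame: offsets count rows relative to the current row.
  def rowsFrameSum(sorted: Vector[Int], start: Int, end: Int): Vector[Int] =
    sorted.indices.map { i =>
      val lo = math.max(0, i + start)
      val hi = math.min(sorted.length - 1, i + end)
      (lo to hi).map(sorted).sum
    }.toVector

  // Logical frame: offsets are applied to the ordering value, so every row
  // whose value falls in [v + start, v + end] belongs to the frame.
  def rangeFrameSum(sorted: Vector[Int], start: Int, end: Int): Vector[Int] =
    sorted.map { v =>
      sorted.filter(x => x >= v + start && x <= v + end).sum
    }

  def main(args: Array[String]): Unit = {
    val values = Vector(1, 2, 2, 5)
    println(rowsFrameSum(values, -1, 0))  // Vector(1, 3, 4, 7)
    println(rangeFrameSum(values, -1, 0)) // Vector(1, 5, 5, 5)
  }
}
```

Note how the two rows with value 2 get the same result under the logical frame, because they are peers with respect to the ordering value, but different results under the physical frame.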

## How was this patch tested?

The existing tests were run, no failures. No additional test cases are 
needed.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aokolnychyi/spark window_function_examples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14050


commit fed17613e23c1634ae47542a00960dac77bc95fc
Author: aokolnychyi 
Date:   2016-07-04T22:25:39Z

[MINOR][EXAMPLES] Window function examples





[GitHub] spark issue #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aw...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14049
  
**[Test build #61735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61735/consoleFull)**
 for PR 14049 at commit 
[`72991db`](https://github.com/apache/spark/commit/72991dbec4c5c5e38fa0ab74a6b83d87007a7f12).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14034
  
**[Test build #61736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61736/consoleFull)**
 for PR 14034 at commit 
[`3c402d3`](https://github.com/apache/spark/commit/3c402d304883fa83712f07cd09a3bbe765b1f071).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14049: [SPARK-16369][MLlib] tallSkinnyQR of RowMatrix sh...

2016-07-04 Thread yinxusen

GitHub user yinxusen opened a pull request:

https://github.com/apache/spark/pull/14049

[SPARK-16369][MLlib] tallSkinnyQR of RowMatrix should aware of empty 
partition

## What changes were proposed in this pull request?

tallSkinnyQR of RowMatrix should be aware of empty partitions, which can 
cause an exception in the Breeze QR decomposition.

See the [archived dev 
mail](https://mail-archives.apache.org/mod_mbox/spark-dev/201510.mbox/%3ccaf7adnrycvpl3qx-vzjhq4oymiuuhoscut_tkom63cm18ik...@mail.gmail.com%3E)
 for more details.
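
The general pattern can be sketched in plain Scala, without Spark or Breeze. Everything here is a hypothetical illustration: `columnSums` merely stands in for a per-partition step that cannot handle an empty block, and the actual patch may guard against empty partitions differently.

```scala
object EmptyPartitionSketch {
  type Row = Array[Double]

  // Stand-in for the per-partition step. Like a local QR routine, it has no
  // meaningful result for an empty block and throws on one.
  def columnSums(block: Seq[Row]): Row =
    block.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

  // Skip empty partitions before applying the per-partition step.
  def summarize(partitions: Seq[Seq[Row]]): Seq[Row] =
    partitions.filter(_.nonEmpty).map(columnSums)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(Array(1.0, 2.0), Array(3.0, 4.0)),
      Seq.empty[Row], // an empty partition
      Seq(Array(5.0, 6.0)))
    summarize(partitions).foreach(r => println(r.mkString("[", ", ", "]")))
  }
}
```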


## How was this patch tested?

Scala unit test.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yinxusen/spark SPARK-16369

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14049.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14049


commit 72991dbec4c5c5e38fa0ab74a6b83d87007a7f12
Author: Xusen Yin 
Date:   2016-07-04T22:24:55Z

fix empty partition issue




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14034#discussion_r69494036
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -251,6 +251,22 @@ trait CheckAnalysis extends PredicateHelper {
 s"but one table has '${firstError.output.length}' columns and another table has " +
 s"'${s.children.head.output.length}' columns")
 
+  case l: GlobalLimit =>
+val numRows = l.limitExpr.eval().asInstanceOf[Int]
+if (numRows < 0) {
+  failAnalysis(
+s"number_rows in limit clause must be equal to or greater than 0. " +
+  s"number_rows:$numRows")
+}
+
+  case l: LocalLimit =>
--- End diff --

I do not think we can merge them, but, yeah, we can create a local function 
for it. Thank you!
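
A rough sketch of that shared helper, written in plain Scala so it runs without Catalyst. The `Expr` and `Plan` types below are simplified, hypothetical stand-ins; only the names `GlobalLimit`, `LocalLimit`, `limitExpr` and `checkLimitClause` come from the diffs in this thread, and returning `Either` instead of calling `failAnalysis` is an assumption made to keep the example self-contained.

```scala
object LimitCheckSketch {
  // Simplified stand-ins for Catalyst expressions.
  sealed trait Expr { def foldable: Boolean; def eval(): Any }
  case class Lit(value: Any) extends Expr {
    val foldable = true
    def eval(): Any = value
  }
  case class Times(left: Expr, right: Expr) extends Expr {
    def foldable: Boolean = left.foldable && right.foldable
    def eval(): Any = (left.eval(), right.eval()) match {
      case (a: Int, b: Int) => a * b
      case other => sys.error(s"cannot multiply $other")
    }
  }
  case class Column(name: String) extends Expr {
    val foldable = false
    def eval(): Any = sys.error(s"column $name needs an input row")
  }

  sealed trait Plan
  case class GlobalLimit(limitExpr: Expr) extends Plan
  case class LocalLimit(limitExpr: Expr) extends Plan

  // One check shared by both limit operators. Foldability is tested first so
  // that eval() is never called on an expression that needs input rows.
  def checkLimitClause(limitExpr: Expr): Either[String, Int] =
    if (!limitExpr.foldable) {
      Left("number_rows in limit clause must be a constant expression")
    } else limitExpr.eval() match {
      case n: Int if n >= 0 => Right(n)
      case n: Int =>
        Left(s"number_rows in limit clause must be equal to or greater than 0. number_rows:$n")
      case other => Left(s"number_rows in limit clause must be an integer, got: $other")
    }

  def check(plan: Plan): Either[String, Int] = plan match {
    case GlobalLimit(e) => checkLimitClause(e)
    case LocalLimit(e) => checkLimitClause(e)
  }

  def main(args: Array[String]): Unit = {
    println(check(GlobalLimit(Times(Lit(10), Lit(2)))))
    println(check(LocalLimit(Lit(-1))))
    println(check(GlobalLimit(Column("a"))))
  }
}
```

The two `case` branches stay separate, as suggested above, while the validation itself lives in one place.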


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14047: [SPARK-16368] [SQL] Fix Strange Errors When Creat...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14047#discussion_r69493923
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -391,6 +391,29 @@ class HiveDDLSuite
 }
   }
 
+  test("create view with mismatched schema") {
--- End diff --

Without Hive support enabled, we are unable to `CREATE VIEW` and then 
`SELECT` from the view.



[GitHub] spark pull request #14047: [SPARK-16368] [SQL] Fix Strange Errors When Creat...

2016-07-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14047#discussion_r69493896
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -391,6 +391,29 @@ class HiveDDLSuite
 }
   }
 
+  test("create view with mismatched schema") {
--- End diff --

This one is for `CREATE VIEW`; the other is for `CREATE TEMP VIEW`. They 
test different code paths.



[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...

2016-07-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14033#discussion_r69493271
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +94,70 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+extends Expression with Generator with CodegenFallback {
+
+  private lazy val numRows = try {
+children.head.eval().asInstanceOf[Int]
+  } catch {
+case _: ClassCastException =>
+  throw new AnalysisException("The number of rows must be a positive constant integer.")
+  }
+
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children.length <= 1) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
+} else if (children.head.dataType != IntegerType || !children.head.foldable ||
+  children.head.eval().asInstanceOf[Int] < 1) {
+  TypeCheckResult.TypeCheckFailure("The number of rows must be a positive constant integer.")
+} else {
+  for (i <- 1 until children.length) {
+val j = (i - 1) % numFields
+if (children(i).dataType != elementSchema.fields(j).dataType) {
+  return TypeCheckResult.TypeCheckFailure(
+s"Argument ${j + 1} (${elementSchema.fields(j).dataType}) != " +
--- End diff --

Should I replace it with `${i}th argument`? I have no problem changing it 
like that.
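
For readers following the index arithmetic in the diff, here is a small stand-alone sketch of how `numFields` (a ceiling division) and `j = (i - 1) % numFields` lay the values out. It does not use any Spark classes; `StackLayout`, `columnOf`, and `rows` are hypothetical names, and `None` stands in for SQL `NULL`:

```scala
object StackLayout {
  // Ceiling division: columns needed to spread numValues values over numRows rows.
  def numFields(numValues: Int, numRows: Int): Int = {
    require(numRows >= 1, "The number of rows must be a positive constant integer.")
    (numValues + numRows - 1) / numRows
  }

  // Zero-based column that the i-th value argument (1-based, counted after n) falls into.
  def columnOf(i: Int, fields: Int): Int = (i - 1) % fields

  // Lays the values out row by row, padding the last row with None.
  def rows[A](numRows: Int, values: Seq[A]): Seq[Seq[Option[A]]] = {
    val fields = numFields(values.length, numRows)
    (0 until numRows).map { r =>
      (0 until fields).map { c => values.lift(r * fields + c) }
    }
  }
}
```

With `stack(2, 1, 2, 3)` there are three values and two rows, so `numFields` is 2. The third value has `columnOf(3, 2) == 0`, which is why its type is compared against the first column, the one reported as `Argument ${j + 1}` in the error message.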



[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries with side effects shoul...

2016-07-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14048
  
**[Test build #61734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61734/consoleFull)** for PR 14048 at commit [`e59c3d1`](https://github.com/apache/spark/commit/e59c3d12a945575776f99f0766adb54a90bd2cf1).



[GitHub] spark pull request #14048: [SPARK-16370][SQL] Union queries with side effect...

2016-07-04 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/14048

[SPARK-16370][SQL] Union queries with side effects should be executed eagerly

## What changes were proposed in this pull request?

Currently, some queries with side effects, such as `Command`, 
`InsertIntoTable`, and `CreateTableUsingAsSelect`, are executed eagerly.
However, a union query is executed eagerly only when it is a `UNION ALL` 
query whose children all have side effects.
This PR executes both `UNION` and `UNION ALL` queries eagerly if at least 
one of their children has side effects.
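
The difference between the two rules can be sketched with a toy plan tree. This is only an illustration under stated assumptions: `PlanNode`, `SideEffect`, `PlainQuery`, `UnionNode`, and `EagerUnion` are hypothetical stand-ins, not Catalyst classes, and the sketch ignores the distinction between `UNION` and `UNION ALL`:

```scala
// Toy plan tree; none of these are Spark classes.
sealed trait PlanNode
final case class SideEffect(name: String) extends PlanNode   // e.g. an insert or a command
final case class PlainQuery(name: String) extends PlanNode   // e.g. a plain SELECT
final case class UnionNode(children: Seq[PlanNode]) extends PlanNode

object EagerUnion {
  def hasSideEffects(node: PlanNode): Boolean = node match {
    case _: SideEffect => true
    case _: PlainQuery => false
    case UnionNode(cs) => cs.exists(hasSideEffects)
  }

  // Rule described as the current behaviour: every child must have side effects.
  def eagerBefore(u: UnionNode): Boolean =
    u.children.nonEmpty && u.children.forall(hasSideEffects)

  // Rule described as the proposal: one side-effecting child is enough.
  def eagerAfter(u: UnionNode): Boolean =
    u.children.exists(hasSideEffects)
}
```

A union of one side-effecting child and one plain query is where the two rules disagree: `eagerBefore` is false and `eagerAfter` is true.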

## How was this patch tested?

Pass the Jenkins tests with a new testcase.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-16370

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14048


commit e59c3d12a945575776f99f0766adb54a90bd2cf1
Author: Dongjoon Hyun 
Date:   2016-07-04T21:42:32Z

[SPARK-16370][SQL] Union queries with side effects should be executed eagerly





[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...

2016-07-04 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/14031
  
Thank you.



[GitHub] spark issue #13796: [SPARK-7159][ML] Add multiclass logistic regression to S...

2016-07-04 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/13796
  
@sethah I apologize for the delay. I just came back to the US. Gonna make 
the first pass. Thanks.



[GitHub] spark pull request #13984: [SPARK-16310][SPARKR] R na.string-like default fo...

2016-07-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/13984#discussion_r69492220
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -744,6 +747,9 @@ read.df.default <- function(path = NULL, source = NULL, schema = NULL, ...) {
   if (is.null(source)) {
 source <- getDefaultSqlSource()
   }
+  if (source == "csv" && is.null(options[["nullValue"]])) {
--- End diff --

AFAIK, R's read.table is equivalent to read.csv, read.csv2, or read.delim, 
and applies only to delimited text files:
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

Unlike in a delimited/csv file, an R NA is typically null in JSON, represented as
  "myString": null

But there is no consistent approach from what I can see in R. There is no 
support for JSON in base R. There are jsonlite, RJSONIO, and rjson, and the NA 
representation could be `na` or `.na` (but again it typically defaults to "null").

I think it would be interesting to support custom null/NA mapping for 
other text data sources.

From what I can see, nullValue is supported in Spark only for the csv data 
source.

https://github.com/apache/spark/blob/0ad6ce7e54b1d8f5946dde652fa5341d15059158/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L373
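
The guard in the diff (`source == "csv"` and `nullValue` not set) can be sketched in Scala as follows. This is a hedged illustration, not the SparkR code: `CsvDefaults` and `withDefaultNullValue` are hypothetical names, and the default value `"NA"` is an assumption based on the PR title and the discussion, since the diff excerpt does not show the value being assigned:

```scala
object CsvDefaults {
  // Adds nullValue -> "NA" only for the csv source, and only when the caller
  // has not already supplied a nullValue option. "NA" is an assumed default.
  def withDefaultNullValue(
      source: String,
      options: Map[String, String]): Map[String, String] =
    if (source == "csv" && !options.contains("nullValue")) {
      options + ("nullValue" -> "NA")
    } else {
      options
    }
}
```

Restricting the default to csv leaves other sources, such as JSON, untouched, and an explicit `nullValue` from the caller always wins.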


From: Shivaram Venkataraman
Sent: Monday, July 4, 2016 12:33 PM
Subject: Re: [apache/spark] [SPARK-16310][SPARKR] R na.string-like default for csv source (#13984)
To: apache/spark
Cc: Felix Cheung, Author



In R/pkg/R/SQLContext.R:

> @@ -744,6 +747,9 @@ read.df.default <- function(path = NULL, source = NULL, schema = NULL, ...) {
>    if (is.null(source)) {
>      source <- getDefaultSqlSource()
>    }
> +  if (source == "csv" && is.null(options[["nullValue"]])) {

I think na.strings works for read.table and not just for read.csv in R? Is 
the concern that NA is not a good default for other formats like JSON, etc.?


[GitHub] spark issue #14047: [SPARK-16368] [SQL] Fix Strange Errors When Creating Vie...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14047
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61733/
Test PASSed.



[GitHub] spark issue #14047: [SPARK-16368] [SQL] Fix Strange Errors When Creating Vie...

2016-07-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14047
  
Merged build finished. Test PASSed.


