[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-07 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r110511571
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
@@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
 
 object SharedState {
 
+  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
--- End diff --

In a [prior PR](https://github.com/apache/spark/pull/16324), 
`FsUrlStreamHandlerFactory` was set on the JVM's `URL` class directly. @gatorsmile raised 
a concern that `URL.setURLStreamHandlerFactory` can be called only once per 
JVM, and that concern is the motivation for this PR. Either approach is OK with me, but 
we have to choose one.
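
To make the once-per-JVM constraint concrete, here is a minimal Scala sketch. `UrlFactoryOnce`, `NoOpFactory`, and `tryInstall` are hypothetical names used only for illustration and are not part of this PR. The sketch relies on the documented JDK behavior that `URL.setURLStreamHandlerFactory` throws `java.lang.Error` once a factory has already been set.

```scala
import java.net.{URL, URLStreamHandler, URLStreamHandlerFactory}

object UrlFactoryOnce {
  // Hypothetical factory for illustration: returning null makes the JVM
  // fall back to its built-in protocol handlers.
  final class NoOpFactory extends URLStreamHandlerFactory {
    override def createURLStreamHandler(protocol: String): URLStreamHandler = null
  }

  // Tries to install the factory. Returns false when a factory was already
  // installed, in which case the JDK throws java.lang.Error.
  def tryInstall(factory: URLStreamHandlerFactory): Boolean =
    try {
      URL.setURLStreamHandlerFactory(factory)
      true
    } catch {
      case _: Error => false
    }
}
```

Whichever component installs its factory first keeps it for the lifetime of the JVM, so a second `tryInstall` call in the same process returns `false`.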


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17574: [SPARK-20264][SQL] asm should be non-test dependency in ...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17574
  
**[Test build #75618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75618/testReport)**
 for PR 17574 at commit 
[`2a03188`](https://github.com/apache/spark/commit/2a0318882a3133cc3dbd88f824a92f83cdf2c5e7).





[GitHub] spark pull request #17574: [SPARK-20264][SQL] asm should be non-test depende...

2017-04-07 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/17574

[SPARK-20264][SQL] asm should be non-test dependency in sql/core

## What changes were proposed in this pull request?
The sql/core module currently declares asm as a test-scope dependency. 
Transitively, it should actually be a normal dependency, since the core 
module defines it. This mismatch occasionally confuses IntelliJ.

## How was this patch tested?
N/A - This is a build change.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-20264

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17574


commit 2a0318882a3133cc3dbd88f824a92f83cdf2c5e7
Author: Reynold Xin 
Date:   2017-04-08T05:46:28Z

[SPARK-20264][SQL] asm should be non-test dependency in sql/core







[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510848
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -225,20 +225,28 @@ case class Invoke(
   getFuncResult(ev.value, s"${obj.value}.$functionName($argString)")
 } else {
   val funcResult = ctx.freshName("funcResult")
-  s"""
-Object $funcResult = null;
-${getFuncResult(funcResult, s"${obj.value}.$functionName($argString)")}
-if ($funcResult == null) {
-  ${ev.isNull} = true;
-} else {
+  if (!returnNullable) {
--- End diff --

Since we have `postNullCheck`, can we always take this branch?





[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510800
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -608,7 +616,7 @@ case class MapObjects private(
$convertedArray = $arrayConstructor;
  """,
 genValue => s"$convertedArray[$loopIndex] = $genValue;",
-s"new ${classOf[GenericArrayData].getName}($convertedArray);"
+s"new ${classOf[GenericArrayData].getName}($convertedArray); /*###*/"
--- End diff --

?





[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510776
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -577,7 +584,7 @@ object ScalaReflection extends ScalaReflection {
   udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(),
   Nil,
   dataType = 
ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt()))
-Invoke(obj, "serialize", udt, inputObject :: Nil)
+Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false)
--- End diff --

same here





[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510779
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -586,7 +593,7 @@ object ScalaReflection extends ScalaReflection {
   udt.getClass,
   Nil,
   dataType = ObjectType(udt.getClass))
-Invoke(obj, "serialize", udt, inputObject :: Nil)
+Invoke(obj, "serialize", udt, inputObject :: Nil, returnNullable = false)
--- End diff --

same here





[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -356,7 +361,8 @@ object ScalaReflection extends ScalaReflection {
   udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt(),
   Nil,
   dataType = 
ObjectType(udt.userClass.getAnnotation(classOf[SQLUserDefinedType]).udt()))
-Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil,
--- End diff --

`deserialize` is implemented entirely by users. Can we guarantee that it 
never returns null?
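
The concern can be illustrated with a small, self-contained Scala model. This is not Spark's `Invoke` code generation: `UserCodec`, `Result`, and `invoke` are hypothetical stand-ins, and the only assumption carried over from the diff is that `returnNullable = false` means the null check on the result is skipped.

```scala
object ReturnNullableSketch {
  // Hypothetical stand-in for a user-defined type; not Spark's API.
  trait UserCodec[T <: AnyRef] {
    def deserialize(datum: Any): T
  }

  // The value plus the isNull flag that generated code would carry with it.
  final case class Result[T](value: T, isNull: Boolean)

  // When returnNullable is false the null check is skipped, so a null
  // returned by user code is reported as non-null.
  def invoke[T <: AnyRef](codec: UserCodec[T], datum: Any, returnNullable: Boolean): Result[T] = {
    val value = codec.deserialize(datum)
    val isNull = returnNullable && value == null
    Result(value, isNull)
  }

  // A user implementation that is free to return null.
  val lenient: UserCodec[String] = new UserCodec[String] {
    def deserialize(datum: Any): String = if (datum == null) null else datum.toString
  }
}
```

In this model, `invoke(lenient, null, returnNullable = false)` yields a null value whose `isNull` flag is `false`, which is the situation the question above is asking about.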





[GitHub] spark pull request #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks ...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17569#discussion_r110510773
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -365,7 +371,8 @@ object ScalaReflection extends ScalaReflection {
   udt.getClass,
   Nil,
   dataType = ObjectType(udt.getClass))
-Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil)
+Invoke(obj, "deserialize", ObjectType(udt.userClass), getPath :: Nil,
--- End diff --

same here





[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r110510575
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2230,6 +2230,8 @@ class Analyzer(
   val result = resolved transformDown {
 case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
   inputData.dataType match {
+case ArrayType(et, false) if cls.isEmpty =>
--- End diff --

To be safe, we should check that:

1. no custom collection class is specified
2. the `function` converts an expression `e` to `AssertNotNull(e)` (this 
guarantees that we are expecting a primitive array)
3. the `inputData` is of array type and its elements are not nullable.
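
The three checks above can be sketched as a single predicate. This is a simplified model rather than Catalyst code: the `DataType` and `Expr` hierarchies and `canSkipMapObjects` are hypothetical, and probing `func` with a placeholder expression is only one assumed way to test condition 2.

```scala
object PrimitiveArrayGuardSketch {
  // Hypothetical, heavily simplified stand-ins for Catalyst types.
  sealed trait DataType
  case object IntType extends DataType
  final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

  sealed trait Expr
  final case class Ref(name: String) extends Expr
  final case class AssertNotNull(child: Expr) extends Expr

  def canSkipMapObjects(
      func: Expr => Expr,
      inputType: DataType,
      customCollection: Option[Class[_]]): Boolean = {
    // Condition 2: the function only wraps its input in AssertNotNull.
    val probe = Ref("e")
    val onlyAssertsNotNull = func(probe) == AssertNotNull(probe)
    inputType match {
      // Condition 3: an array whose elements are not nullable.
      case ArrayType(_, false) =>
        // Condition 1: no custom collection class was requested.
        customCollection.isEmpty && onlyAssertsNotNull
      case _ => false
    }
  }
}
```

If any one of the three conditions fails, the predicate returns `false`, so the conversion is kept in that case.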





[GitHub] spark pull request #17568: [SPARK-20254][SQL] Remove unnecessary data conver...

2017-04-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17568#discussion_r110510454
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2230,6 +2230,8 @@ class Analyzer(
   val result = resolved transformDown {
 case UnresolvedMapObjects(func, inputData, cls) if inputData.resolved =>
   inputData.dataType match {
+case ArrayType(et, false) if cls.isEmpty =>
--- End diff --

Is it really safe to do so? `MapObjects` is used not only for null 
checking but also to resolve structs in arrays.





[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...

2017-04-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17573
  
Thanks! Merging to master/2.1





[GitHub] spark pull request #17573: [SPARK-20262][SQL] AssertNotNull should throw Nul...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17573





[GitHub] spark pull request #17562: [SPARK-20246][SQL] should not push predicate down...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17562





[GitHub] spark issue #17562: [SPARK-20246][SQL] should not push predicate down throug...

2017-04-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17562
  
Thanks! Merging to master/2.1/2.0





[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17573
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75617/
Test PASSed.





[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17573
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17573
  
**[Test build #75617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75617/testReport)**
 for PR 17573 at commit 
[`4c16795`](https://github.com/apache/spark/commit/4c16795fc1c06cdbb938195da2e4c80a469b47e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class AssertNotNull(child: Expression, walkedTypePath: 
Seq[String] = Nil)`





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread witgo
Github user witgo commented on the issue:

https://github.com/apache/spark/pull/17567
  
LGTM.  
Are there any performance test reports?





[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17546
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75616/
Test PASSed.





[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17546
  
**[Test build #75616 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75616/testReport)**
 for PR 17546 at commit 
[`830255c`](https://github.com/apache/spark/commit/830255ce0a3476f4d56e1d6ebf4fa3d77c7b619f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...

2017-04-07 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/17568
  
@cloud-fan could you please review this?





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/17569
  
@cloud-fan could you please review this?





[GitHub] spark pull request #17573: [SPARK-20262][SQL] AssertNotNull should throw Nul...

2017-04-07 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/17573

[SPARK-20262][SQL] AssertNotNull should throw NullPointerException

## What changes were proposed in this pull request?
AssertNotNull currently throws RuntimeException. It should throw 
NullPointerException, which is more specific.
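
A minimal Scala sketch of the behavior described above. `AssertNotNullSketch`, `assertNotNull`, and `describe` are hypothetical helpers, and the message text is illustrative rather than Spark's actual wording. The sketch also shows that, because `NullPointerException` is a subclass of `RuntimeException`, callers that already catch `RuntimeException` should keep working.

```scala
object AssertNotNullSketch {
  // Hypothetical helper: throws the more specific NullPointerException.
  def assertNotNull[T <: AnyRef](value: T, path: String): T = {
    if (value == null) {
      throw new NullPointerException(s"Null value appeared in non-nullable field: $path")
    }
    value
  }

  // A caller written against the old behavior: it catches RuntimeException,
  // which still matches because NullPointerException extends it.
  def describe(value: String): String =
    try assertNotNull(value, "root.name")
    catch { case e: RuntimeException => e.getClass.getSimpleName }
}
```

Here `describe(null)` returns `"NullPointerException"`, while `describe("spark")` returns the value unchanged.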

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-20262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17573


commit 4c16795fc1c06cdbb938195da2e4c80a469b47e5
Author: Reynold Xin 
Date:   2017-04-08T00:16:43Z

[SPARK-20262][SQL] AssertNotNull should throw NullPointerException







[GitHub] spark issue #17573: [SPARK-20262][SQL] AssertNotNull should throw NullPointe...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17573
  
**[Test build #75617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75617/testReport)**
 for PR 17573 at commit 
[`4c16795`](https://github.com/apache/spark/commit/4c16795fc1c06cdbb938195da2e4c80a469b47e5).





[GitHub] spark issue #17546: [SPARK-20233] [SQL] Apply star-join filter heuristics to...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17546
  
**[Test build #75616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75616/testReport)**
 for PR 17546 at commit 
[`830255c`](https://github.com/apache/spark/commit/830255ce0a3476f4d56e1d6ebf4fa3d77c7b619f).





[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...

2017-04-07 Thread ioana-delaney
Github user ioana-delaney commented on a diff in the pull request:

https://github.com/apache/spark/pull/17546#discussion_r110495345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -736,6 +736,12 @@ object SQLConf {
   .checkValue(weight => weight >= 0 && weight <= 1, "The weight value must be in [0, 1].")
   .createWithDefault(0.7)
 
+  val JOIN_REORDER_DP_STAR_FILTER =
+buildConf("spark.sql.cbo.joinReorder.dp.star.filter")
+  .doc("Applies star-join filter heuristics to cost based join enumeration.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

@ron8hu Thank you. We will keep the default false.





[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17469
  
It seems something went wrong with @holdnk and Jenkins. I don't think I have 
permission to trigger this.





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17567
  
**[Test build #3646 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3646/testReport)**
 for PR 17567 at commit 
[`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17527
  
**[Test build #3647 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3647/testReport)**
 for PR 17527 at commit 
[`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17572: String interpolation required for error message

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17572
  
Can one of the admins verify this patch?





[GitHub] spark issue #17572: String interpolation required for error message

2017-04-07 Thread vijaykramesh
Github user vijaykramesh commented on the issue:

https://github.com/apache/spark/pull/17572
  
I'm not sure whether I should open my own JIRA for this issue or whether that is 
handled by the project maintainers. Thanks!





[GitHub] spark pull request #17572: String interpolation required for error message

2017-04-07 Thread vijaykramesh
GitHub user vijaykramesh opened a pull request:

https://github.com/apache/spark/pull/17572

String interpolation required for error message

## What changes were proposed in this pull request?
This error message is not formatted properly because the string literal is 
missing the `s` interpolator prefix. Currently the error looks like:

```
Caused by: java.lang.IllegalArgumentException: requirement failed: indices 
should be one-based and in ascending order; found current=$current, 
previous=$previous; line="$line"
```
(note the literal `$current` instead of the interpolated value)
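
The behaviour described above can be reproduced with a minimal sketch in plain Scala. The object name `InterpolationDemo` and the values `3` and `5` are made up for illustration and are not taken from the patch:

```scala
// Hypothetical demo object, not part of the Spark code base.
object InterpolationDemo {
  val current = 3
  val previous = 5

  // No `s` prefix: the compiler treats this as an ordinary string literal,
  // so the `$current` and `$previous` placeholders are kept verbatim.
  val plain = "found current=$current, previous=$previous"

  // With the `s` prefix the placeholders are replaced by the values in scope.
  val interpolated = s"found current=$current, previous=$previous"

  def main(args: Array[String]): Unit = {
    println(plain)        // found current=$current, previous=$previous
    println(interpolated) // found current=3, previous=5
  }
}
```
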


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vijaykramesh/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17572.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17572


commit 7cd0a0defe6e3ecb4bfb249b2644298230a03ac7
Author: Vijay Ramesh 
Date:   2017-04-07T21:41:18Z

need to do string interpolation for error message to display last line







[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-07 Thread map222
Github user map222 commented on the issue:

https://github.com/apache/spark/pull/17469
  
@HyukjinKwon Do I need to do something to start the Jenkins test?





[GitHub] spark pull request #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFil...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17570





[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17570
  
Merging in master/branch-2.1.






[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17569
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75614/
Test PASSed.





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17569
  
**[Test build #75614 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75614/testReport)**
 for PR 17569 at commit 
[`ae5e232`](https://github.com/apache/spark/commit/ae5e232da543f6c7c5d6f6a3526bdb56c6f793b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...

2017-04-07 Thread ron8hu
Github user ron8hu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17546#discussion_r110480494
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -736,6 +736,12 @@ object SQLConf {
   .checkValue(weight => weight >= 0 && weight <= 1, "The weight value 
must be in [0, 1].")
   .createWithDefault(0.7)
 
+  val JOIN_REORDER_DP_STAR_FILTER =
+buildConf("spark.sql.cbo.joinReorder.dp.star.filter")
+  .doc("Applies star-join filter heuristics to cost based join 
enumeration.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

In Spark 2.2, we introduced a couple of new configuration parameters in the 
optimizer area.  To stay on the safe side, we set the default value to 
false.  I suggest that we change the default value to true only AFTER we are 
sure that the new optimizer feature does not cause any regression.  I think the 
system regression/integration test suites can help us make that decision. 
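
As a rough sketch of what opting in would look like while the default stays false: the snippet below assumes a Spark 2.2+ build on the classpath and that the key is merged exactly as named in the diff above. The object name and the local master URL are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver program; requires spark-sql on the classpath.
object StarFilterOptIn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("star-filter-opt-in")
      .getOrCreate()

    // Key copied from the diff; it defaults to false, so users opt in explicitly.
    val key = "spark.sql.cbo.joinReorder.dp.star.filter"
    spark.conf.set(key, "true")
    println(s"$key = ${spark.conf.get(key)}")

    spark.stop()
  }
}
```

Keeping the default off means existing query plans stay unchanged until a user sets the flag for a session.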





[GitHub] spark issue #13206: [SPARK-15420] [SQL] Add repartition and sort to prepare ...

2017-04-07 Thread Downchuck
Github user Downchuck commented on the issue:

https://github.com/apache/spark/pull/13206
  
may be fixed in https://github.com/apache/spark/pull/16898 





[GitHub] spark issue #17558: [SPARK-20247][CORE] Add jar but this jar is missing late...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17558
  
agreed, why would the jar be missing?





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17567
  
LGTM.





[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17570
  
**[Test build #3645 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3645/testReport)**
 for PR 17570 at commit 
[`1c69820`](https://github.com/apache/spark/commit/1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17546: [SPARK-20233] [SQL] Apply star-join filter heuris...

2017-04-07 Thread ioana-delaney
Github user ioana-delaney commented on a diff in the pull request:

https://github.com/apache/spark/pull/17546#discussion_r110466604
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -736,6 +736,12 @@ object SQLConf {
   .checkValue(weight => weight >= 0 && weight <= 1, "The weight value 
must be in [0, 1].")
   .createWithDefault(0.7)
 
+  val JOIN_REORDER_DP_STAR_FILTER =
+buildConf("spark.sql.cbo.joinReorder.dp.star.filter")
+  .doc("Applies star-join filter heuristics to cost based join 
enumeration.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

@gatorsmile  I am also fine with changing the default.
@wzhfy What do you think?





[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...

2017-04-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r110465373
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,24 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`.
+
+* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` 
with the frequent itemsets.
+* `spark.associationRules` returns a `SparkDataFrame` with the association 
rules.
+
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "s,t,u",
--- End diff --

something that is not coded in 3 lines ;)
reading from a file if we could - if there isn't any dataset that we can 
license to use, can we anonymize an existing one?





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17571





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17571
  
merged to master.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17571
  
right, tests don't run example anyway...





[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17568
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17568
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75610/
Test PASSed.





[GitHub] spark issue #17568: [SPARK-20254][SQL] Remove unnecessary data conversion fo...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17568
  
**[Test build #75610 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75610/testReport)**
 for PR 17568 at commit 
[`6a5fa5a`](https://github.com/apache/spark/commit/6a5fa5abb8ae73eaf2866630af070e0301660149).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17527
  
**[Test build #3647 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3647/testReport)**
 for PR 17527 at commit 
[`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff).





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17567
  
**[Test build #3646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3646/testReport)**
 for PR 17567 at commit 
[`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106).





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17567
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75609/
Test FAILed.





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17567
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17567: [SPARK-19991][CORE][YARN] FileSegmentManagedBuffer perfo...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17567
  
**[Test build #75609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75609/testReport)**
 for PR 17567 at commit 
[`3828d03`](https://github.com/apache/spark/commit/3828d03caea6326659c33b37b599081d69ba8106).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17571
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75615/
Test PASSed.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17571
  
**[Test build #75615 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75615/testReport)**
 for PR 17571 at commit 
[`f7e71ea`](https://github.com/apache/spark/commit/f7e71ea8c01d44852fde9c1a6a930e09cc95d2e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...

2017-04-07 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r110459968
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,24 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`.
+
+* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` 
with the frequent itemsets.
+* `spark.associationRules` returns a `SparkDataFrame` with the association 
rules.
+
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "s,t,u",
--- End diff --

What do you mean by "real"? Something human readable (e.g. milk, bread, 
butter) or some standard pattern mining dataset? If the former one then it is 
not a problem. If the latter one I am not aware of any dataset which would be 
safe enough on the license side.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17571
  
**[Test build #75613 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75613/testReport)**
 for PR 17571 at commit 
[`95b5383`](https://github.com/apache/spark/commit/95b5383fae4da22aa0552e969c05b9488accb1a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75613/
Test PASSed.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17571
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17571
  
**[Test build #75615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75615/testReport)**
 for PR 17571 at commit 
[`f7e71ea`](https://github.com/apache/spark/commit/f7e71ea8c01d44852fde9c1a6a930e09cc95d2e6).





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17571#discussion_r110457676
  
--- Diff: examples/src/main/r/ml/glm.R ---
@@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family 
= "gaussian")
 summary(gaussianGLM2)
 
 # Fit a generalized linear model of family "binomial" with spark.glm
-training2 <- read.df("data/mllib/sample_binary_classification_data.txt", 
source = "libsvm")
-df_list2 <- randomSplit(training2, c(7,3), 2)
+training2 <- 
read.df("/data/mllib/sample_multiclass_classification_data.txt", source = 
"libsvm")
--- End diff --

Thanks! copy paste error. Corrected now. 





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17571#discussion_r110457466
  
--- Diff: examples/src/main/r/ml/glm.R ---
@@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family 
= "gaussian")
 summary(gaussianGLM2)
 
 # Fit a generalized linear model of family "binomial" with spark.glm
-training2 <- read.df("data/mllib/sample_binary_classification_data.txt", 
source = "libsvm")
--- End diff --

just a bad example. 





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17571#discussion_r110457459
  
--- Diff: examples/src/main/r/ml/glm.R ---
@@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family 
= "gaussian")
 summary(gaussianGLM2)
 
 # Fit a generalized linear model of family "binomial" with spark.glm
-training2 <- read.df("data/mllib/sample_binary_classification_data.txt", 
source = "libsvm")
-df_list2 <- randomSplit(training2, c(7,3), 2)
+training2 <- 
read.df("/data/mllib/sample_multiclass_classification_data.txt", source = 
"libsvm")
--- End diff --

Actually, you might need to leave it as a relative path, i.e. not starting 
with `/`.
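
To illustrate why the leading `/` matters, here is a minimal sketch in Python (not R). The base directory `/opt/spark` is a made-up stand-in for wherever the example is launched from, and the snippet only shows generic POSIX path semantics; it does not claim to reproduce how `read.df` resolves paths internally.

```python
import posixpath

# Hypothetical directory the example script is launched from (an assumption).
base_dir = "/opt/spark"

relative = "data/mllib/sample_multiclass_classification_data.txt"
absolute = "/" + relative

# A relative path is resolved against the base directory...
resolved_relative = posixpath.join(base_dir, relative)
# ...while an absolute path discards the base and starts at the filesystem root.
resolved_absolute = posixpath.join(base_dir, absolute)

print(posixpath.isabs(relative), resolved_relative)
print(posixpath.isabs(absolute), resolved_absolute)
```

With the leading `/`, the lookup no longer depends on the launch directory, so the file is only found if a top-level `/data` directory really exists.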





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17571
  
@felixcheung 
Just noticed that the current example for logistic regression in the 
programming guide did not seem to be a good one. It did not converge using 
IRWLS, and Quasi-Newton yielded almost zero estimates for all coefficients.  





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17571#discussion_r110457201
  
--- Diff: examples/src/main/r/ml/glm.R ---
@@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family 
= "gaussian")
 summary(gaussianGLM2)
 
 # Fit a generalized linear model of family "binomial" with spark.glm
-training2 <- read.df("data/mllib/sample_binary_classification_data.txt", 
source = "libsvm")
--- End diff --

is this an issue with `binary_classification_data` data?





[GitHub] spark issue #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic regressio...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17571
  
**[Test build #75613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75613/testReport)**
 for PR 17571 at commit 
[`95b5383`](https://github.com/apache/spark/commit/95b5383fae4da22aa0552e969c05b9488accb1a1).





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17569
  
**[Test build #75614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75614/testReport)**
 for PR 17569 at commit 
[`ae5e232`](https://github.com/apache/spark/commit/ae5e232da543f6c7c5d6f6a3526bdb56c6f793b8).





[GitHub] spark pull request #17571: [SPARK-20258][Doc][SparkR] Fix SparkR logistic re...

2017-04-07 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/17571

[SPARK-20258][Doc][SparkR] Fix SparkR logistic regression example in 
programming guide (did not converge) 

## What changes were proposed in this pull request?

The SparkR logistic regression example in the programming guide did not 
converge (with IRWLS). All feature coefficient estimates are essentially zero:

```
training2 <- read.df("data/mllib/sample_binary_classification_data.txt", 
source = "libsvm")
df_list2 <- randomSplit(training2, c(7,3), 2)
binomialDF <- df_list2[[1]]
binomialTestDF <- df_list2[[2]]
binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")

17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to 
singular covariance matrix. Retrying with Quasi-Newton solver.

> summary(binomialGLM)

Coefficients:
 Estimate
(Intercept) 9.0255e+00
features_0 0.e+00
features_1 0.e+00
features_2 0.e+00
features_3 0.e+00
features_4 0.e+00
features_5 0.e+00
features_6 0.e+00
features_7 0.e+00
```
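
As background for the warning above, here is a small Python sketch of what a "singular covariance matrix" means for a Cholesky-based solver. It is not Spark's `WeightedLeastSquares` code and it is not a diagnosis of this particular dataset; the toy columns and the helper names `covariance_matrix` and `cholesky` are assumptions made for illustration only.

```python
import math

def covariance_matrix(columns):
    """Sample covariance matrix of equally long feature columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
            for cb, mb in zip(columns, means)
        ]
        for ca, ma in zip(columns, means)
    ]

def cholesky(matrix, tol=1e-12):
    """Return lower-triangular L with L * L^T == matrix.

    Raises ValueError when a pivot is not clearly positive, i.e. the
    matrix is singular or not positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= tol:
                    raise ValueError("matrix is singular or not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]    # exactly 2 * x1, so perfectly collinear
x3 = [1.0, -1.0, 1.0, -1.0]  # not a multiple of x1

def factorizes(columns):
    """True if the covariance matrix of `columns` has a Cholesky factor."""
    try:
        cholesky(covariance_matrix(columns))
        return True
    except ValueError:
        return False

print(factorizes([x1, x3]))
print(factorizes([x1, x2]))
```

When features are redundant like `x1` and `x2`, the factorization has no valid pivot, which is the kind of situation in which a solver has to fall back to an iterative method.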


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark programGuide2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17571


commit 95b5383fae4da22aa0552e969c05b9488accb1a1
Author: actuaryzhang 
Date:   2017-04-07T18:37:33Z

update logistic regression example







[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17527
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17527
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75608/
Test FAILed.





[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17527
  
**[Test build #75608 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75608/testReport)**
 for PR 17527 at commit 
[`662f6ae`](https://github.com/apache/spark/commit/662f6aea586ef52ae0fdabc8a28e4e9674ad04ff).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17566
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17566
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75607/
Test PASSed.





[GitHub] spark issue #17566: [SPARK-19518][SQL] IGNORE NULLS in first / last in SQL

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17566
  
**[Test build #75607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75607/testReport)**
 for PR 17566 at commit 
[`6d21ad8`](https://github.com/apache/spark/commit/6d21ad81073fcec7bb623635328a604fb99303a4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17516: [SPARK-20197][SPARKR] CRAN check fail with packag...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17516





[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17516
  
merged to master





[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17516
  
Thanks. I find it rather odd, but probably by design, that the current 
directory is different when running `R CMD check .tgz`. Will need to look at 
this more.





[GitHub] spark pull request #17557: [SPARK-20208][WIP][R][DOCS] Document R fpGrowth s...

2017-04-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17557#discussion_r110450322
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -906,6 +910,24 @@ predicted <- predict(model, df)
 head(predicted)
 ```
 
+ FP-growth
+
+`spark.fpGrowth` executes FP-growth algorithm to mine frequent itemsets on 
a `SparkDataFrame`.
+
+* `spark.freqItemsets` method can be used to retrieve a `SparkDataFrame` 
with the frequent itemsets.
+* `spark.associationRules` returns a `SparkDataFrame` with the association 
rules.
+
+
+```{r}
+items <- selectExpr(createDataFrame(data.frame(items = c(
+  "s,t,u",
--- End diff --

Thanks! I'd prefer an example with real data...





[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...

2017-04-07 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/17516
  
Got it. LGTM. Thanks for explanation. I'm fine with merging this to master !





[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17570
  
**[Test build #3645 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3645/testReport)**
 for PR 17570 at commit 
[`1c69820`](https://github.com/apache/spark/commit/1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf).





[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17570
  
LGTM pending Jenkins.






[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17570
  
Jenkins, add to whitelist.






[GitHub] spark pull request #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example fo...

2017-04-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17553





[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17553
  
merged to master, thanks!





[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17553
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17553
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75612/
Test PASSed.





[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17553
  
**[Test build #75612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75612/testReport)**
 for PR 17553 at commit 
[`ca87b38`](https://github.com/apache/spark/commit/ca87b38fae0dcae66ca09db15051b2f44a3f542f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17516: [SPARK-20197][SPARKR] CRAN check fail with package insta...

2017-04-07 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17516
  
There are two parts to the branch-2.1 fix.

First, the test failed because `SPARK_HOME` was not set before calling 
`install.spark()` when running as a package. This is not a problem in 
Jenkins, only when running with `R CMD check SparkR*.tgz`. The fix was to 
move the `install.spark` call earlier.

Second, even after that change, while testing it I found that `R CMD check` 
was putting `spark-warehouse` etc. in the `testthat` directory, NOT in 
`SPARK_HOME`, so that test would essentially be a no-op and always pass 
anyway. I made the call to disable it (with `skip_if_cran`), but that had the 
unintended effect of also turning off the test in Jenkins, as we are testing 
with `--as-cran` (as explained above).

Hence the attempt in this PR to fix this for real in master. Since we could 
be rolling out an RC any time now, I don't want to delay the first fix 
(`install.spark`) only to sort out the second part, which could come a bit 
later.

If you feel that's safer, we could also add `skip_if_cran` to this test in 
master; just know that it will also turn off this test in Jenkins. Since with 
`R CMD check` the `spark-warehouse` and `metastore_db` are not written to 
`SPARK_HOME` but to `testthat`, this test will pass during the package test 
with `R CMD check`, so long as we merge this PR to move `install.spark` first.
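
The "passes anyway" point can be sketched generically. The Python snippet below is not the SparkR test; it uses two temporary directories as stand-ins for `SPARK_HOME` and the `testthat` working directory (both are assumptions) and shows that artifacts created through relative paths land in the current working directory, so a check that only inspects the other directory finds nothing.

```python
import os
import tempfile

def artifacts_in(directory, names=("spark-warehouse", "metastore_db")):
    """Return the sorted subset of `names` that exist directly under `directory`."""
    return sorted(n for n in names if os.path.exists(os.path.join(directory, n)))

def run_demo():
    original_cwd = os.getcwd()
    # Stand-ins for SPARK_HOME and for the directory the checker runs tests from.
    with tempfile.TemporaryDirectory() as spark_home, \
            tempfile.TemporaryDirectory() as check_dir:
        try:
            os.chdir(check_dir)
            # Relative paths are created under the current working directory.
            os.mkdir("spark-warehouse")
            os.mkdir("metastore_db")
            return artifacts_in(spark_home), artifacts_in(check_dir)
        finally:
            # Leave the temporary directory before it is cleaned up.
            os.chdir(original_cwd)

in_home, in_cwd = run_demo()
print(in_home)
print(in_cwd)
```

A test asserting that the `SPARK_HOME` stand-in stays clean would succeed here regardless of what the code under test writes, which is why such a check gives no real signal.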






[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17553
  
**[Test build #75612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75612/testReport)**
 for PR 17553 at commit 
[`ca87b38`](https://github.com/apache/spark/commit/ca87b38fae0dcae66ca09db15051b2f44a3f542f).





[GitHub] spark issue #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17570
  
Can one of the admins verify this patch?





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17569
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17569
  
**[Test build #75611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75611/testReport)**
 for PR 17569 at commit 
[`4482e1c`](https://github.com/apache/spark/commit/4482e1c2b920e201afca1379a3686df9a4db5bc9).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17569: [SPARK-20253][SQL] Remove unnecessary nullchecks of a re...

2017-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17569
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75611/
Test FAILed.





[GitHub] spark pull request #17570: [SPARK-20255] Move listLeafFiles() to InMemoryFil...

2017-04-07 Thread adrian-ionescu
GitHub user adrian-ionescu opened a pull request:

https://github.com/apache/spark/pull/17570

[SPARK-20255] Move listLeafFiles() to InMemoryFileIndex

## What changes were proposed in this pull request?

Trying to get a grip on the `FileIndex` hierarchy, I was confused by the 
following inconsistency:

On the one hand, `PartitioningAwareFileIndex` defines `leafFiles` and 
`leafDirToChildrenFiles` as abstract, but on the other it fully implements 
`listLeafFiles` which does all the listing of files. However, the latter is 
only used by `InMemoryFileIndex`.

I'm hereby proposing to move this method (and all its dependencies) to the 
implementation class that actually uses it, and thus unclutter the 
`PartitioningAwareFileIndex` interface.
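
The shape of the proposed refactoring can be sketched as follows. This is a toy Python analogue, not the Scala code in the patch: the class names mirror the ones mentioned above, while the dictionary-based file tree, `all_files`, and the method bodies are assumptions for illustration.

```python
from abc import ABC, abstractmethod

class PartitioningAwareFileIndex(ABC):
    """Base class: only declares what every index must provide."""

    @abstractmethod
    def leaf_files(self):
        """Return all leaf file paths known to this index."""

    def all_files(self):
        # Shared logic relies on the abstract accessor, not on how listing is done.
        return sorted(self.leaf_files())


class InMemoryFileIndex(PartitioningAwareFileIndex):
    """Concrete index: the listing helper lives here because only this class uses it."""

    def __init__(self, tree, roots):
        self._tree = tree    # toy file system: directory -> list of children
        self._roots = roots

    def _list_leaf_files(self, path):
        children = self._tree.get(path)
        if children is None:
            return [path]    # not a directory, so it is a leaf file
        leaves = []
        for child in children:
            leaves.extend(self._list_leaf_files(child))
        return leaves

    def leaf_files(self):
        found = []
        for root in self._roots:
            found.extend(self._list_leaf_files(root))
        return found


tree = {
    "/t": ["/t/a=1", "/t/a=2"],
    "/t/a=1": ["/t/a=1/f1"],
    "/t/a=2": ["/t/a=2/f2", "/t/a=2/f3"],
}
index = InMemoryFileIndex(tree, ["/t"])
print(index.all_files())
```

Keeping the recursive listing helper out of the base class means another subclass only has to supply `leaf_files` and inherits nothing it does not use.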

## How was this patch tested?

`./build/sbt sql/test`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-ionescu/apache-spark list-leaf-files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17570


commit 1c69820f9d905f75b5d7e90b5d0e17b690e8d8bf
Author: Adrian Ionescu 
Date:   2017-04-07T17:06:49Z

Move listLeafFiles() to InMemoryFileIndex







[GitHub] spark issue #17553: [SPARK-20026][Doc][SparkR] Add Tweedie example for Spark...

2017-04-07 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17553
  
Issues fixed. Thanks. 




