[GitHub] spark issue #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateStore API...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18107
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18084: [SPARK-19900][core]Remove driver when relaunching.

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18084
  
Maybe more should be done in `relaunchDriver()`, such as having
`driver.worker` remove its dependency on the relaunched driver. However,
removing a driver only to create a new one later wastes resources, so we
should avoid that where possible.

Now, to help us move forward, would you like to spend some time creating a
valid regression test case? That would help a lot when we discuss the proper
bug-fix proposal further.





[GitHub] spark pull request #18114: [SPARK-20889][SparkR] Grouped documentation for d...

2017-05-25 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/18114

[SPARK-20889][SparkR] Grouped documentation for datetime column methods

## What changes were proposed in this pull request?
Grouped documentation for datetime column methods. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark sparkRDocDate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18114


commit 2c2fa800bb0f4c7f2503a08a9565a8b9ac135d69
Author: Wayne Zhang 
Date:   2017-05-25T21:07:20Z

start working on datetime functions

commit 0d2853d0cff6cbd92fcbb68cebaee0729d25eb8f
Author: Wayne Zhang 
Date:   2017-05-25T22:07:57Z

fix issue in generics and example







[GitHub] spark pull request #18078: [SPARK-10643] Make spark-submit download remote f...

2017-05-25 Thread loneknightpy
Github user loneknightpy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18078#discussion_r118583624
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -535,7 +538,7 @@ class SparkSubmitSuite
 
   test("resolves command line argument paths correctly") {
 val jars = "/jar1,/jar2" // --jars
-val files = "hdfs:/file1,file2"  // --files
+val files = "local:/file1,file2"  // --files
--- End diff --

To keep the test from trying to download the file from HDFS.





[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...

2017-05-25 Thread mgummelt
Github user mgummelt commented on a diff in the pull request:

https://github.com/apache/spark/pull/18098#discussion_r118586090
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -502,6 +521,25 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
 )
   }
 
+  private def satisfiesLocality(offerHostname: String): Boolean = {
+if (hostToLocalTaskCount.nonEmpty) {
--- End diff --

You've agreed that the semantics should be: "Launch an executor on a host
only when we have a task that wants to be on that host, or the configurable
delay has elapsed."

By launching tasks on an arbitrary host when no locality info is available,
you're violating those semantics: even before the delay has elapsed, the
scheduler will launch a task on an agent that no task wants to be on. Does
that make sense?
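
For illustration, a minimal, self-contained sketch of those semantics (all
names here are illustrative, not the PR's actual code):

```scala
// Launch an executor on a host only if some pending task prefers that host,
// or the configurable locality wait has already elapsed.
def satisfiesLocality(
    offerHostname: String,
    hostToLocalTaskCount: Map[String, Int],
    localityWaitElapsed: Boolean): Boolean = {
  hostToLocalTaskCount.contains(offerHostname) || localityWaitElapsed
}
```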





[GitHub] spark pull request #18078: [SPARK-10643] [Core] Make spark-submit download r...

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18078#discussion_r118593252
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -535,7 +538,7 @@ class SparkSubmitSuite
 
   test("resolves command line argument paths correctly") {
 val jars = "/jar1,/jar2" // --jars
-val files = "hdfs:/file1,file2"  // --files
+val files = "local:/file1,file2"  // --files
--- End diff --

It is currently difficult to test downloading a file from HDFS, but we should
cover this scenario in the future.





[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

2017-05-25 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17864
  
@MLnick Thanks much for your comments. Yes, I think always returning Double
is consistent with Python and R, and also with other transformers in ML. Plus,
as @hhbyyh mentioned, this makes the implementation easier. Would you mind
taking a look at the code and letting me know if there are any suggestions for
improvement? The doc is already updated to make it clear that it always returns
Double regardless of the input type.
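
For reference, a minimal usage sketch of the documented behavior (column
names are illustrative, not from the PR):

```scala
import org.apache.spark.ml.feature.Imputer

// The input column may be any numeric type (e.g. an IntegerType "age"),
// while the imputed output column is always DoubleType.
val imputer = new Imputer()
  .setInputCols(Array("age"))
  .setOutputCols(Array("age_imputed"))
  .setStrategy("mean")
```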





[GitHub] spark pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric...

2017-05-25 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17864#discussion_r118600408
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
@@ -94,12 +94,13 @@ private[feature] trait ImputerParams extends Params 
with HasInputCols {
  * :: Experimental ::
  * Imputation estimator for completing missing values, either using the 
mean or the median
  * of the columns in which the missing values are located. The input 
columns should be of
- * DoubleType or FloatType. Currently Imputer does not support categorical 
features
+ * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical 
feature.
  *
  * Note that the mean/median value is computed after filtering out missing 
values.
  * All Null values in the input columns are treated as missing, and so are 
also imputed. For
  * computing median, DataFrameStatFunctions.approxQuantile is used with a 
relative error of 0.001.
+ * The output column is always of Double type regardless of the input 
column type.
--- End diff --

@MLnick Here is the note on always returning Double type. 





[GitHub] spark issue #18114: [SPARK-20889][SparkR] Grouped documentation for datetime...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18114
  
**[Test build #77391 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77391/testReport)**
 for PR 18114 at commit 
[`0d2853d`](https://github.com/apache/spark/commit/0d2853d0cff6cbd92fcbb68cebaee0729d25eb8f).





[GitHub] spark issue #18114: [SPARK-20889][SparkR] Grouped documentation for datetime...

2017-05-25 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18114
  
@felixcheung  
Created this PR to update the docs for the datetime methods, similar to
#18114. About 27 datetime methods are now documented on one page.
I'm attaching snapshots of part of the new help page.


![image](https://cloud.githubusercontent.com/assets/11082368/26474169/4ad69ef2-4164-11e7-9770-5a6cd2d1e3d6.png)

![image](https://cloud.githubusercontent.com/assets/11082368/26474173/4e83ad56-4164-11e7-9483-2404785375b2.png)

![image](https://cloud.githubusercontent.com/assets/11082368/26474150/3d61fed8-4164-11e7-9e1b-766878374b54.png)







[GitHub] spark pull request #18114: [SPARK-20889][SparkR] Grouped documentation for d...

2017-05-25 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18114#discussion_r118605422
  
--- Diff: R/pkg/R/functions.R ---
@@ -2476,24 +2430,27 @@ setMethod("from_json", signature(x = "Column", 
schema = "structType"),
 column(jc)
   })
 
-#' from_utc_timestamp
+#' @section Details:
+#' \code{from_utc_timestamp}: Given a timestamp, which corresponds to a 
certain time of day in UTC,
+#' returns another timestamp that corresponds to the same time of day in 
the given timezone.
 #'
-#' Given a timestamp, which corresponds to a certain time of day in UTC, 
returns another timestamp
-#' that corresponds to the same time of day in the given timezone.
+#' @rdname column_datetime_functions
 #'
-#' @param y Column to compute on.
-#' @param x time zone to use.
-#'
-#' @family date time functions
-#' @rdname from_utc_timestamp
-#' @name from_utc_timestamp
-#' @aliases from_utc_timestamp,Column,character-method
+#' @aliases from_utc_timestamp from_utc_timestamp,Column,character-method
 #' @export
-#' @examples \dontrun{from_utc_timestamp(df$t, 'PST')}
+#' @examples
+#'
+#' \dontrun{
+#' tmp <- mutate(df, from_utc = from_utc_timestamp(df$time, 'PST'),
+#'  to_utc = to_utc_timestamp(df$time, 'PST'),
+#'  to_unix = unix_timestamp(df$time),
+#'  to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'),
+#'  from_unix = from_unixtime(unix_timestamp(df$time)))
+#' head(tmp)}
 #' @note from_utc_timestamp since 1.5.0
-setMethod("from_utc_timestamp", signature(y = "Column", x = "character"),
-  function(y, x) {
-jc <- callJStatic("org.apache.spark.sql.functions", 
"from_utc_timestamp", y@jc, x)
+setMethod("from_utc_timestamp", signature(x = "Column", tz = "character"),
+  function(x, tz) {
--- End diff --

Changed the second argument to `tz` to be consistent with Scala, which also
makes the doc less confusing, since other methods also take `y` as an
argument.
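
For comparison, a sketch of the Scala signature the R argument names now
mirror (`time` is an illustrative column name):

```scala
import org.apache.spark.sql.functions.{col, from_utc_timestamp}

// Scala: def from_utc_timestamp(ts: Column, tz: String): Column
val adjusted = from_utc_timestamp(col("time"), "PST")
```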





[GitHub] spark issue #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/11974
  
Mini-batching in Spark generally isn't that efficient, since to extract a 
mini-batch you still need to iterate over the entire dataset - and that means 
reading it from disk if it doesn't fit into memory.
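
For example, extracting a mini-batch from an RDD looks like the sketch below:
sample() still visits every partition, so each iteration reads the full
dataset even though only a fraction of it updates the centers (a sketch
assuming a local SparkSession):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("minibatch").getOrCreate()
val data = spark.sparkContext.parallelize(1 to 1000000)

// sample() filters every element of every partition; nothing lets Spark
// skip the ~90% of the data that this iteration will not use.
val miniBatch = data.sample(withReplacement = false, fraction = 0.1, seed = 42L)
```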

The performance tests posted on the JIRA are hard to interpret. It looks to
me like the computation time goes down as you sample less data, but the cost
function doesn't decrease as much. What's the conclusion? I'd be more
interested to see how long it takes to get to the same cost; all we've shown
so far, AFAICT, is that sampling is faster but produces a worse model. Why
didn't those tests run until convergence?





[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77382/
Test PASSed.





[GitHub] spark pull request #18113: [SPARK-20890][SQL] Added min and max typed aggreg...

2017-05-25 Thread setjet
GitHub user setjet opened a pull request:

https://github.com/apache/spark/pull/18113

[SPARK-20890][SQL] Added min and max typed aggregation functions

## What changes were proposed in this pull request?
Typed min and max functions are missing for aggregations on Datasets. They
are supported for DataFrames and therefore should also be part of the Dataset
API.

Please note that it is fine for the min and max functions to seed the
aggregation with MAX and MIN initial values respectively, because only
retrieved keys are returned.
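
For context, a minimal sketch of one such typed aggregator (not the PR's
actual code; names are illustrative):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// A typed max over a Double-valued field, seeded with Double.MinValue as
// described above; only values actually seen in a group can be returned.
class TypedMax[IN](f: IN => Double) extends Aggregator[IN, Double, Double] {
  def zero: Double = Double.MinValue
  def reduce(b: Double, a: IN): Double = math.max(b, f(a))
  def merge(b1: Double, b2: Double): Double = math.max(b1, b2)
  def finish(reduction: Double): Double = reduction
  def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Usage would look roughly like
`ds.groupByKey(_.key).agg(new TypedMax[Record](_.value).toColumn)`, with
`Record` a hypothetical case class.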

## How was this patch tested?
Added some corresponding unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/setjet/spark spark-20890

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18113


commit d7159930d10cff73fb838e51e9971e9857911a5c
Author: setjet 
Date:   2017-05-25T21:08:04Z

added min and max typed aggregation functions







[GitHub] spark issue #18113: [SPARK-20890][SQL] Added min and max typed aggregation f...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18113
  
Can one of the admins verify this patch?





[GitHub] spark issue #18078: [SPARK-10643] [Core] Make spark-submit download remote f...

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18078
  
LGTM





[GitHub] spark issue #18078: [SPARK-10643] [Core] Make spark-submit download remote f...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18078
  
**[Test build #77389 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77389/testReport)**
 for PR 18078 at commit 
[`62e57df`](https://github.com/apache/spark/commit/62e57df1039435c6b98dfc756ab54320dfbb627a).





[GitHub] spark pull request #18078: [SPARK-10643] Make spark-submit download remote f...

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18078#discussion_r118580623
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -535,7 +538,7 @@ class SparkSubmitSuite
 
   test("resolves command line argument paths correctly") {
 val jars = "/jar1,/jar2" // --jars
-val files = "hdfs:/file1,file2"  // --files
+val files = "local:/file1,file2"  // --files
--- End diff --

Could you expand on why we are changing this?





[GitHub] spark pull request #18078: [SPARK-10643] Make spark-submit download remote f...

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18078#discussion_r118580006
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -308,6 +311,15 @@ object SparkSubmit extends CommandLineUtils {
   RPackageUtils.checkAndBuildRPackage(args.jars, printStream, 
args.verbose)
 }
 
+// In client mode, download remotes files.
--- End diff --

nit: "remotes" -> "remote"





[GitHub] spark issue #18098: [SPARK-16944][Mesos] Improve data locality when launchin...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18098
  
**[Test build #77387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77387/testReport)**
 for PR 18098 at commit 
[`fa2daff`](https://github.com/apache/spark/commit/fa2daffa23476da88b97cfa8c08670d315b294f6).





[GitHub] spark issue #18078: [SPARK-10643] Make spark-submit download remote files to...

2017-05-25 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/18078
  
Could you also add the "[Core]" tag to the title? @loneknightpy
Also cc @cloud-fan @gatorsmile





[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17343
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17343
  
**[Test build #77382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77382/testReport)**
 for PR 17343 at commit 
[`d4f09c2`](https://github.com/apache/spark/commit/d4f09c289f0df617435579864a5ce01d1f2059fe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateSt...

2017-05-25 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/18107#discussion_r118595928
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
 ---
@@ -508,22 +508,6 @@ class FlatMapGroupsWithStateSuite extends 
StateStoreMetricsTest with BeforeAndAf
 expectedState = Some(5),  // state 
should change
 expectedTimeoutTimestamp = 5000)  // timestamp 
should change
 
-  test("StateStoreUpdater - rows are cloned before writing to StateStore") 
{
--- End diff --

This is no longer needed, as the operator is not responsible for cloning
the rows when writing to the store.





[GitHub] spark issue #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18094
  
**[Test build #77379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77379/testReport)**
 for PR 18094 at commit 
[`a2f99ec`](https://github.com/apache/spark/commit/a2f99ec42d54dd37d9908590d85000376108fa7a).





[GitHub] spark pull request #18101: [SPARK-20874][Examples]Add Structured Streaming K...

2017-05-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18101





[GitHub] spark issue #18051: [SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplicate lin...

2017-05-25 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18051
  
That makes sense! 





[GitHub] spark issue #18105: [SPARK-20881] [SQL] Use Hive's stats in metastore when c...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18105
  
If users have not analyzed the table in Spark yet, we should respect the
stats from the Hive metastore. But if users have already run the analyze table
command in Spark, I think it's fair to ask them to re-analyze when the data
changes. BTW, I don't think the analyze table command is bound to CBO; if you
think the behavior is reasonable when CBO is on, it's also reasonable when CBO
is off.
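
For reference, the command in question, issued from Spark (a sketch; the
table name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

// Once this has run, Spark-generated stats exist for the table and, per the
// comment above, take precedence until the user re-analyzes after a change.
spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS")
```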





[GitHub] spark pull request #18112: [SPARK-20888][SQL][DOCS] Document change of defau...

2017-05-25 Thread mallman
GitHub user mallman opened a pull request:

https://github.com/apache/spark/pull/18112

[SPARK-20888][SQL][DOCS] Document change of default setting of 
spark.sql.hive.caseSensitiveInferenceMode

(Link to Jira: https://issues.apache.org/jira/browse/SPARK-20888)

## What changes were proposed in this pull request?

Document the change of the default setting of the
spark.sql.hive.caseSensitiveInferenceMode configuration key from NEVER_INFER to
INFER_AND_SAVE in the Spark SQL 2.1 to 2.2 migration notes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/VideoAmp/spark-public 
spark-20888-document_infer_and_save

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18112


commit 037e7d373ae33b3debda0c0d70ca95233fb3ca2d
Author: Michael Allman 
Date:   2017-05-25T17:49:14Z

[SPARK-20888][SQL][DOCS] Document change of default setting of
spark.sql.hive.caseSensitiveInferenceMode configuration key from
NEVER_INFER to INFER_AND_SAVE







[GitHub] spark issue #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18094
  
**[Test build #77379 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77379/testReport)**
 for PR 18094 at commit 
[`a2f99ec`](https://github.com/apache/spark/commit/a2f99ec42d54dd37d9908590d85000376108fa7a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18094
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77379/
Test FAILed.





[GitHub] spark pull request #11746: [SPARK-13602][CORE] Add shutdown hook to DriverRu...

2017-05-25 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11746#discussion_r118544204
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala ---
@@ -53,9 +53,11 @@ private[deploy] class DriverRunner(
   @volatile private var killed = false
 
   // Populated once finished
-  private[worker] var finalState: Option[DriverState] = None
-  private[worker] var finalException: Option[Exception] = None
-  private var finalExitCode: Option[Int] = None
+  @volatile private[worker] var finalState: Option[DriverState] = None
+  @volatile private[worker] var finalException: Option[Exception] = None
+
+  // Timeout to wait for when trying to terminate a driver.
+  private val DRIVER_TERMINATE_TIMEOUT_MS = 10 * 1000
--- End diff --

@cloud-fan Makes sense. However, it requires designing an approach for setting
configurations when launching the driver JVM.





[GitHub] spark issue #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18094
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/18112
  
@budde Can you please review (urgently) for inclusion as a migration note 
for 2.2?





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18112
  
**[Test build #77380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77380/testReport)**
 for PR 18112 at commit 
[`037e7d3`](https://github.com/apache/spark/commit/037e7d373ae33b3debda0c0d70ca95233fb3ca2d).





[GitHub] spark issue #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18094
  
**[Test build #77381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77381/testReport)**
 for PR 18094 at commit 
[`27a8c26`](https://github.com/apache/spark/commit/27a8c26dac354a326a76381097a6669ff973501a).





[GitHub] spark issue #18092: [SPARK-20640][CORE]Make rpc timeout and retry for shuffl...

2017-05-25 Thread sitalkedia
Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/18092
  
>> I can not think of meaningful test cases, are there any suggestions?

How about just "unit tests"?





[GitHub] spark pull request #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batc...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11974#discussion_r118546602
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala ---
@@ -89,6 +92,9 @@ class KMeansSuite extends SparkFunSuite with 
MLlibTestSparkContext with DefaultR
 intercept[IllegalArgumentException] {
   new KMeans().setInitSteps(0)
 }
+intercept[IllegalArgumentException] {
+  new KMeans().setMiniBatchFraction(0)
--- End diff --

Minor nit, but we should probably check a few edge cases: < 0, > 1.
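
For instance, the extra edge cases might look like this sketch
(`setMiniBatchFraction` is the setter proposed in this PR, and the suite
name is hypothetical; it assumes the param is validated as (0, 1]):

```scala
import org.apache.spark.ml.clustering.KMeans
import org.scalatest.FunSuite

class MiniBatchFractionParamSuite extends FunSuite {
  test("miniBatchFraction rejects values outside (0, 1]") {
    for (bad <- Seq(-0.5, 0.0, 1.5)) {
      intercept[IllegalArgumentException] {
        new KMeans().setMiniBatchFraction(bad)
      }
    }
  }
}
```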





[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/18110
  
@cloud-fan, what about `SparkConf`'s `configsWithAlternatives`:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L596
 





[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...

2017-05-25 Thread sitalkedia
Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/17471
  
@sameeragarwal - Thanks for taking a look. I will update the PR adding test 
case soon. 





[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18110
  
It's only used in the `SparkConf.get(key: String)` code path, not the
`SparkConf.get(entry: ConfigEntry[T])` code path. That's why we only support
alternative keys when users get a conf value by a hard-coded conf name.
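
Conceptually, the hard-coded-name path resolves a key through its deprecated
alternatives, roughly like this sketch (names are assumptions, not Spark's
actual internals):

```scala
// Return the value for `key`, falling back to deprecated alternative keys
// in order; ConfigEntry-based lookups skip this fallback entirely.
def getWithAlternatives(
    settings: Map[String, String],
    key: String,
    alternatives: Seq[String]): Option[String] = {
  (key +: alternatives).collectFirst {
    case k if settings.contains(k) => settings(k)
  }
}
```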





[GitHub] spark pull request #18092: [SPARK-20640][CORE]Make rpc timeout and retry for...

2017-05-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18092#discussion_r118547246
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -170,11 +170,17 @@ private[spark] class BlockManager(
   // service, or just our own Executor's BlockManager.
   private[spark] var shuffleServerId: BlockManagerId = _
 
+  private val registrationTimeout =
+conf.getTimeAsMs("spark.shuffle.registration.timeout", "5s")
--- End diff --

For new configurations, should we be putting these into the `config` 
package object? See 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/package.scala
 and https://github.com/apache/spark/pull/10205
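
For illustration, the entry might be declared roughly like this (a sketch,
not the PR's actual code; it would live in the
org.apache.spark.internal.config package object):

```scala
import java.util.concurrent.TimeUnit

private[spark] val SHUFFLE_REGISTRATION_TIMEOUT =
  ConfigBuilder("spark.shuffle.registration.timeout")
    .doc("Timeout for registration to the external shuffle service.")
    .timeConf(TimeUnit.MILLISECONDS)
    .createWithDefaultString("5s")
```

The call site would then read `conf.get(SHUFFLE_REGISTRATION_TIMEOUT)` instead
of a string literal.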





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18112
  
**[Test build #77380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77380/testReport)**
 for PR 18112 at commit 
[`037e7d3`](https://github.com/apache/spark/commit/037e7d373ae33b3debda0c0d70ca95233fb3ca2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-25 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16578
  
Also, I'm confused about something: who has Jenkins retest privileges? And
can I get them?





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18112
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77380/
Test PASSed.





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18112
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batc...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11974#discussion_r118548583
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -85,6 +85,20 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
   def getInitSteps: Int = $(initSteps)
 
   /**
+   * The fraction of the data to update centers per iteration. Must be > 0 and <= 1.
--- End diff --

nit: "fraction of data used to ..."





[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/18110
  
Ahhh, makes sense. Thanks for the clarification.





[GitHub] spark pull request #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batc...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11974#discussion_r118548684
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -85,6 +85,20 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
   def getInitSteps: Int = $(initSteps)
 
   /**
+   * The fraction of the data to update centers per iteration. Must be > 0 and <= 1.
+   * Default: 1.0.
+   * @group param
+   */
+  @Since("2.3.0")
+  final val miniBatchFraction = new DoubleParam(this, "miniBatchFraction", 
"The fraction of the" +
+" data to update clustering centers per iteration. Must be in (0, 1].",
--- End diff --

ditto here





[GitHub] spark pull request #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batc...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11974#discussion_r118548792
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -85,6 +85,20 @@ private[clustering] trait KMeansParams extends Params 
with HasMaxIter with HasFe
   def getInitSteps: Int = $(initSteps)
 
   /**
+   * The fraction of the data to update centers per iteration. Must be > 0 and <= 1.
+   * Default: 1.0.
+   * @group param
+   */
+  @Since("2.3.0")
+  final val miniBatchFraction = new DoubleParam(this, "miniBatchFraction", 
"The fraction of the" +
+" data to update clustering centers per iteration. Must be in (0, 1].",
--- End diff --

and "cluster centers" rather than "clustering" centers





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77374/testReport)**
 for PR 18064 at commit 
[`def0878`](https://github.com/apache/spark/commit/def0878e70c5e1b247b761dee6fdc945ce49a2e3).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77374/
Test FAILed.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batch EM

2017-05-25 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/11974
  
cc @srowen @sethah also





[GitHub] spark issue #18051: [SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplicate lin...

2017-05-25 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/18051
  
Exactly my point. Run examples internally ([it is not hard to patch 
knitr](https://github.com/zero323/knitr/commit/7a0d8f9ddb9d77a9c235f25aca26131e83c1f6cc)
 or even `tools::Rd2ex`) to validate examples and improve online docs. #18025 
looks great - I'll try to review it when I have a spare moment.





[GitHub] spark pull request #18051: [SPARK-18825][SPARKR][DOCS][WIP] Eliminate duplic...

2017-05-25 Thread zero323
Github user zero323 closed the pull request at:

https://github.com/apache/spark/pull/18051





[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/18112
  
CC @cloud-fan @ericl 





[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...

2017-05-25 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/18083
  
> I am not sure that monitoring (with real metrics) the number of dropped 
events is really worth it. You just want to know if messages have been 
dropped (and having the number in the log is fine).

Even if the absolute number of dropped events doesn't matter that much, I 
would still like to have this metric: it's simple to implement, and being 
able to use my existing metrics-based monitoring infrastructure to correlate 
dropped events with other signals can be helpful.

> For the execution time of message processing, it is very interesting, but 
not having the per-listener or per-event-type breakdowns (just the global 
timing) will not allow a fine-grained analysis, and so no targeted 
improvements.

For now my timing is capturing the total time to process each message, 
counting the time to dequeue plus the aggregate time across all of the 
listeners. Given the current single-threaded processing strategy this is still 
a useful signal, even if not quite as useful as per-listener metrics. I agree 
that per-listener metrics would be more useful, though, so let me see if 
there's a clean refactoring to get the metrics at the per-listener level.

> So putting the counters in ListenerBus is more appropriate for me. This 
would allow us to monitor not only the LiveListenerBus but the other ones 
too (like StreamingQueryListenerBus, StreamingListenerBus, ...).

I considered this and I'll look into it, but it's less of a priority for me 
given that I'm mostly concerned about perf. bottlenecks in LiveListenerBus 
event delivery. The other listener busses don't queue/drop events and the two 
that you mentioned are actually wrapping `LiveListenerBus` and are both 
listener bus implementations as well as listeners themselves. Thus my cop-out 
suggestion is going to be to deal with those in a followup PR.
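
A minimal sketch (hypothetical names, not the PR's actual code) of what 
per-listener timing and a dropped-event counter could look like, using the 
Dropwizard Metrics library that Spark's metrics system builds on:

```scala
import com.codahale.metrics.{Counter, MetricRegistry, Timer}

class ListenerBusMetricsSketch(registry: MetricRegistry) {
  // Counts events dropped because the queue was full.
  val droppedEvents: Counter = registry.counter("queue.droppedEvents")

  // One timer per listener class, so processing time can be broken down
  // per listener rather than only in aggregate.
  private def timerFor(listener: AnyRef): Timer =
    registry.timer(s"listeners.${listener.getClass.getSimpleName}.messageProcessingTime")

  // Times the delivery of one event to one listener.
  def timeProcessing[T](listener: AnyRef)(body: => T): T = {
    val ctx = timerFor(listener).time()
    try body finally ctx.stop()
  }
}
```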





[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...

2017-05-25 Thread mgummelt
Github user mgummelt commented on a diff in the pull request:

https://github.com/apache/spark/pull/18098#discussion_r118550928
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala
 ---
@@ -393,7 +409,30 @@ private[spark] class 
MesosCoarseGrainedSchedulerBackend(
 val offerId = offer.getId.getValue
 val resources = remainingResources(offerId)
 
-if (canLaunchTask(slaveId, resources)) {
+var createTask = canLaunchTask(slaveId, resources)
+if (hostToLocalTaskCount.nonEmpty) {
--- End diff --

It sounds like you're assuming `hostToLocalTaskCount.empty -> there are no 
executors to launch`.  That may be true, but regardless, if that's the case, 
then `createTask` is already false, so this check is redundant, right?

To make this more clear, please factor out this code into some function 
like `satisfiesLocality`, and instead of calling it here, for consistency, 
please add it as one of the constraints in `canLaunchTask`.
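
A self-contained toy (all names hypothetical) illustrating the shape of the 
suggested refactor: locality becomes one more constraint checked inside 
`canLaunchTask` rather than a separate branch at the call site:

```scala
object LocalityCheckSketch {
  // hostname -> number of pending tasks that would prefer to run there
  val hostToLocalTaskCount: Map[String, Int] = Map("host-a" -> 3)

  // Locality is trivially satisfied when there are no locality preferences.
  def satisfiesLocality(hostname: String): Boolean =
    hostToLocalTaskCount.isEmpty || hostToLocalTaskCount.contains(hostname)

  // The existing resource checks are collapsed into a boolean for the sketch.
  def canLaunchTask(meetsResourceConstraints: Boolean, hostname: String): Boolean =
    meetsResourceConstraints && satisfiesLocality(hostname)

  def main(args: Array[String]): Unit = {
    println(canLaunchTask(meetsResourceConstraints = true, "host-a")) // true
    println(canLaunchTask(meetsResourceConstraints = true, "host-b")) // false
  }
}
```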





[GitHub] spark pull request #17094: [SPARK-19762][ML] Hierarchy for consolidating ML ...

2017-05-25 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17094#discussion_r118475804
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/LeastSquaresAggregator.scala ---
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.optim.aggregator
+
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.linalg.{BLAS, Vector, Vectors}
+
+/**
+ * LeastSquaresAggregator computes the gradient and loss for a least-squares loss function,
+ * as used in linear regression for samples in sparse or dense vectors, in an online fashion.
+ *
+ * Two LeastSquaresAggregators can be merged together to have a summary of loss and gradient
+ * of the corresponding joint dataset.
+ *
+ * For improving the convergence rate during the optimization process, and also for preventing
+ * features with very large variances from exerting an overly large influence during model
+ * training, packages like R's GLMNET perform the scaling to unit variance and remove the mean
+ * to reduce the condition number, and then train the model in the scaled space but return the
+ * coefficients on the original scale. See page 9 in
+ * http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
+ *
+ * However, we don't want to apply the `StandardScaler` to the training dataset and then cache
+ * the standardized dataset, since that would create a lot of overhead. As a result, we perform
+ * the scaling implicitly when we compute the objective function. The following is the
+ * mathematical derivation.
+ *
+ * Note that we don't deal with the intercept by adding a bias here, because the intercept
+ * can be computed in closed form after the coefficients have converged.
+ * See this discussion for detail:
+ * http://stats.stackexchange.com/questions/13617/how-is-the-intercept-computed-in-glmnet
+ *
+ * When training with intercept enabled,
+ * the objective function in the scaled space is given by
+ *
+ *    $$
+ *    L = 1/2n ||\sum_i w_i(x_i - \bar{x_i}) / \hat{x_i} - (y - \bar{y}) / \hat{y}||^2,
+ *    $$
+ *
+ * where $\bar{x_i}$ is the mean of $x_i$, $\hat{x_i}$ is the standard deviation of $x_i$,
+ * $\bar{y}$ is the mean of the label, and $\hat{y}$ is the standard deviation of the label.
+ *
+ * If we fit with the intercept disabled (that is, forced through 0.0),
+ * we can use the same equation except that we set $\bar{y}$ and $\bar{x_i}$ to 0 instead
+ * of the respective means.
+ *
+ * This can be rewritten as
+ *
+ *    $$
+ *    \begin{align}
+ *    L &= 1/2n ||\sum_i (w_i/\hat{x_i})x_i - \sum_i (w_i/\hat{x_i})\bar{x_i}
+ *         - y / \hat{y} + \bar{y} / \hat{y}||^2 \\
+ *      &= 1/2n ||\sum_i w_i^\prime x_i - y / \hat{y} + offset||^2 = 1/2n diff^2
+ *    \end{align}
+ *    $$
+ *
+ * where $w_i^\prime$ is the effective coefficient defined by $w_i/\hat{x_i}$, offset is
+ *
+ *    $$
+ *    - \sum_i (w_i/\hat{x_i})\bar{x_i} + \bar{y} / \hat{y},
+ *    $$
+ *
+ * and diff is
+ *
+ *    $$
+ *    \sum_i w_i^\prime x_i - y / \hat{y} + offset.
+ *    $$
+ *
+ * Note that the effective coefficients and offset don't depend on the training dataset,
+ * so they can be precomputed.
+ *
+ * Now, the first derivative of the objective function in the scaled space is
+ *
+ *    $$
+ *    \frac{\partial L}{\partial w_i} = diff/N (x_i - \bar{x_i}) / \hat{x_i}
+ *    $$
+ *
+ * However, $(x_i - \bar{x_i})$ will densify the computation, so it's not
+ * an ideal formula when the training dataset is in sparse format.
+ *
+ * This can be addressed by adding the dense $\bar{x_i} / \hat{x_i}$ terms
+ * in the end by keeping the sum of diff. The first derivative of the total
+ * objective 

[GitHub] spark issue #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18106
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18106
  
**[Test build #77361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77361/testReport)**
 for PR 18106 at commit 
[`a5ade70`](https://github.com/apache/spark/commit/a5ade70afe7601db16ec24956f270feb4499ee42).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Trunc(data: Expression, format: Expression = Literal(0))`





[GitHub] spark issue #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18106
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77361/
Test FAILed.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/18058
  
Resolved.

By the way, which one is preferable: rebase or merge?






[GitHub] spark pull request #17972: [SPARK-20723][ML]Add intermediate storage level t...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17972#discussion_r118479964
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -398,6 +398,14 @@ class DecisionTreeClassifierSuite
 
 testDefaultReadWrite(model)
   }
+
+  test("intermediate dataset storage level") {
--- End diff --

Ideally we need to test that the actual storage levels for the intermediate 
RDDs are correct. I did this for ALS - not sure if the same approach might be 
used here.
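
A minimal sketch of such an assertion, assuming a test with a running 
SparkContext `sc` and an estimator fit with `MEMORY_AND_DISK` (the approach, 
like the names, is only illustrative):

```scala
import org.apache.spark.storage.StorageLevel

val expected = StorageLevel.MEMORY_AND_DISK
// After fit(), every RDD the algorithm left persisted should carry the
// storage level requested via the param.
val levels = sc.getPersistentRDDs.values.map(_.getStorageLevel)
assert(levels.forall(_ == expected))
```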





[GitHub] spark pull request #17972: [SPARK-20723][ML]Add intermediate storage level t...

2017-05-25 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17972#discussion_r118479347
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -406,4 +409,21 @@ private[ml] trait HasAggregationDepth extends Params {
   /** @group expertGetParam */
   final def getAggregationDepth: Int = $(aggregationDepth)
 }
+
+/**
+ * Trait for shared param intermediateStorageLevel (default: 
"MEMORY_AND_DISK").
+ */
+private[ml] trait HasIntermediateStorageLevel extends Params {
+
+  /**
+   * Param for Param for StorageLevelfor intermediate datasets.
+   * @group expertParam
+   */
+  final val intermediateStorageLevel: Param[String] = new 
Param[String](this, "intermediateStorageLevel", "Param for StorageLevelfor 
intermediate datasets", (s: String) => 
Try(StorageLevel.fromString(s)).isSuccess && s != "NONE")
--- End diff --

should be a space: `StorageLevelfor` -> `StorageLevel for`





[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17770#discussion_r118479956
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -573,7 +576,7 @@ class Dataset[T] private[sql](
 Dataset.ofRows(
   sparkSession,
   LogicalRDD(
-logicalPlan.output,
+planWithBarrier.output,
--- End diff --

ditto





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77366/
Test FAILed.





[GitHub] spark pull request #18109: Merge pull request #1 from apache/master

2017-05-25 Thread WindCanDie
GitHub user WindCanDie opened a pull request:

https://github.com/apache/spark/pull/18109

Merge pull request #1 from apache/master

2017/5/23

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WindCanDie/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18109.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18109


commit 676a7c37d45233d4cd5d7aaa4a1ff031088b81d8
Author: WindCanDie <491237...@qq.com>
Date:   2017-05-25T14:04:49Z

Merge pull request #1 from apache/master

2017/5/23







[GitHub] spark issue #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A]R.

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18019
  
**[Test build #77371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77371/testReport)**
 for PR 18019 at commit 
[`ab9e66e`](https://github.com/apache/spark/commit/ab9e66e957614906ceaf6bb85a5d8709feba9f90).





[GitHub] spark issue #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-05-25 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18075
  
Thanks, sounds good to me for now.
cc @ueshin 





[GitHub] spark pull request #18110: [SPARK-20887][CORE] support alternative keys in C...

2017-05-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/18110

[SPARK-20887][CORE] support alternative keys in ConfigBuilder

## What changes were proposed in this pull request?

`ConfigBuilder` builds a `ConfigEntry` that can only read its value from one 
key, so if we want to change a config name while still keeping the old one 
working, it's hard to do.

This PR introduces `ConfigBuilder.withAlternative` to support reading a 
config value from alternative keys, and uses this feature to rename 
`spark.scheduler.listenerbus.eventqueue.size` to 
`spark.scheduler.listenerbus.eventqueue.capacity`, according to 
https://github.com/apache/spark/pull/14269#discussion_r118432313
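
A sketch of how the new builder could be used, based on the names in the 
description above (the default value here is illustrative):

```scala
val LISTENER_BUS_EVENT_QUEUE_CAPACITY =
  ConfigBuilder("spark.scheduler.listenerbus.eventqueue.capacity")
    // Values set under the old key are still honored.
    .withAlternative("spark.scheduler.listenerbus.eventqueue.size")
    .intConf
    .createWithDefault(10000)
```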

## How was this patch tested?

a new test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark config

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18110


commit 200ee5416af97bf9f1f007a8ec4c619abe4a1a2f
Author: Wenchen Fan 
Date:   2017-05-25T15:47:37Z

support alternative keys in ConfigBuilder

commit cc51dd09611adb838c1ef15aaffe11d90b0b119c
Author: Wenchen Fan 
Date:   2017-05-25T15:48:46Z

rename a config







[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18110
  
**[Test build #77372 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77372/testReport)**
 for PR 18110 at commit 
[`cc51dd0`](https://github.com/apache/spark/commit/cc51dd09611adb838c1ef15aaffe11d90b0b119c).





[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17113
  
**[Test build #77367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77367/testReport)**
 for PR 17113 at commit 
[`44c7108`](https://github.com/apache/spark/commit/44c7108bdf478f823f567d44ed703d445febf6fe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17150
  
**[Test build #3756 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3756/testReport)**
 for PR 17150 at commit 
[`7a30de4`](https://github.com/apache/spark/commit/7a30de4c6d49bdaa0ef229ac0555f0581ee54d68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17113
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17113
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77367/
Test PASSed.





[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18110
  
cc @JoshRosen @dhruve





[GitHub] spark pull request #18111: [SPARK-20886][CORE] HadoopMapReduceCommitProtocol...

2017-05-25 Thread steveloughran
GitHub user steveloughran opened a pull request:

https://github.com/apache/spark/pull/18111

[SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fail meaningfully if 
FileOutputCommitter.getWorkPath==null

## What changes were proposed in this pull request?

Handles the situation where `FileOutputCommitter.getWorkPath()` returns 
`null` with a `require()` call whose message explains the problem and 
includes the `toString` value of the committer for better diagnostics.

The situation occurs if the committer being passed in is a job committer, 
not a task committer, that is, it was initialised with a `JobAttemptContext`, 
not a `TaskAttemptContext`.

The existing code does an `Option(workPath.toString).getOrElse(path)`, which 
*may* be an attempt to handle the null path case. If so, it doesn't work, 
because it's the `.toString()` call that is failing. If people do think that 
code should be resilient to null work paths, that line could be changed. 
However, that may hide the underlying problem: the committer is misconfigured.

It may be a rare occurrence today, but it becomes more likely with modified 
subclasses of `FileOutputCommitter`, as well as with some ongoing work of 
mine in Hadoop to better support committing to cloud storage infrastructures.
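
A minimal sketch of the fail-fast check described above (assumed shape, not 
the exact patch):

```scala
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

def workPathOf(committer: FileOutputCommitter): String = {
  val workPath = committer.getWorkPath
  // Fail with a diagnosable message instead of an NPE from .toString().
  require(workPath != null,
    s"FileOutputCommitter has no work path; it was probably constructed " +
    s"with a job context rather than a task attempt context: $committer")
  workPath.toString
}
```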

## How was this patch tested?

Manually. The before & after stack traces are on the JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/spark 
cloud/SPARK-20886-committer-NPE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18111.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18111


commit 02eb7bf0ee6b81841f22e3c46d822eaebb28e85c
Author: Steve Loughran 
Date:   2017-05-25T15:46:50Z

SPARK-20886 HadoopMapReduceCommitProtocol to fail with message if 
FileOutputCommitter.getWorkPath==null
Add a requirement.
The existing code does an Option.getWorkpath.toString() which *may* be an 
attempt to handle the null path case. If so, it isn't, because its the 
.toString() which is failing.

Change-Id: Idddf9813761e7008425542f96903bce12bedd978







[GitHub] spark issue #18110: [SPARK-20887][CORE] support alternative keys in ConfigBu...

2017-05-25 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/18110
  
FWIW, I didn't actually say that we should rename that key, since the cost 
of the confusing name isn't that high right now. So while I don't oppose 
this mechanism, I'm neutral on it, given that the only use case so far seems 
kind of minor. I was mostly commenting just so that future readers and 
reviewers can more easily spot the issue and hopefully pick better names 
going forward.



[GitHub] spark issue #18105: [SPARK-20881] [SQL] Use Hive's stats in metastore when c...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18105
  
I think we should always trust Spark's table stats over Hive's, no matter 
whether CBO is on or not. If users update the stats on the Hive side, it's 
their own responsibility to update them on the Spark side.

IIUC `AnalyzeTableCommand` appeared before CBO, right? What was the behavior 
before?
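
E.g., after updating a table on the Hive side, users would refresh Spark's 
stats themselves ("sales" is a hypothetical table; assumes a SparkSession 
`spark`):

```scala
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")
```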





[GitHub] spark issue #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateStore API...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18107
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateStore API...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18107
  
**[Test build #77363 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77363/testReport)**
 for PR 18107 at commit 
[`d645b41`](https://github.com/apache/spark/commit/d645b416ddd79b56c00bb443569de4c7af5de4fb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StateStoreId(`
  * `case class StateStoreStats()`
  * `case class UnsafeRowTuple(var key: UnsafeRow = null, var value: 
UnsafeRow = null) `
  * `trait StateStoreWriter extends StatefulOperator `





[GitHub] spark issue #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateStore API...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18107
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77363/
Test FAILed.





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17770
  
LGTM except some minor comments





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r118482160
  
--- Diff: 
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
 ---
@@ -401,4 +413,64 @@ class ShuffleBlockFetcherIteratorSuite extends 
SparkFunSuite with PrivateMethodT
 assert(id3 === ShuffleBlockId(0, 2, 0))
   }
 
+  test("Blocks should be shuffled to disk when size of the request is 
above the" +
+" threshold(maxReqSizeShuffleToMem).") {
+val blockManager = mock(classOf[BlockManager])
+val localBmId = BlockManagerId("test-client", "test-client", 1)
+doReturn(localBmId).when(blockManager).blockManagerId
+
+val diskBlockManager = mock(classOf[DiskBlockManager])
+doReturn{
+  var blockId = new TempLocalBlockId(UUID.randomUUID())
--- End diff --

.. sorry for nit ... 





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77366 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77366/testReport)**
 for PR 18064 at commit 
[`513bb9b`](https://github.com/apache/spark/commit/513bb9b23e823271d10aa5b125eb89f1671cb88b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18058
  
Merged into master and branch-2.2. Thanks, all.





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18091
  
**[Test build #77364 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77364/testReport)**
 for PR 18091 at commit 
[`c79de07`](https://github.com/apache/spark/commit/c79de072fd4c0e32f5a62d15f8d921095d4e3bf0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18075: [SPARK-18016][SQL][CATALYST] Code Generation: Con...

2017-05-25 Thread bdrillard
Github user bdrillard commented on a diff in the pull request:

https://github.com/apache/spark/pull/18075#discussion_r118497931
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -233,10 +223,124 @@ class CodegenContext {
   // The collection of sub-expression result resetting methods that need to be called on each row.
   val subexprFunctions = mutable.ArrayBuffer.empty[String]
 
-  def declareAddedFunctions(): String = {
-    addedFunctions.map { case (funcName, funcCode) => funcCode }.mkString("\n")
+  /**
+   * Holds the class and instance names to be generated. `OuterClass` is a placeholder standing
+   * for whichever class is generated as the outermost class and which will contain any nested
+   * sub-classes. All other classes and instance names in this list will represent private,
+   * nested sub-classes.
+   */
+  private val classes: mutable.ListBuffer[(String, String)] =
+    mutable.ListBuffer[(String, String)](("OuterClass", null))
+
+  // A map holding the current size in bytes of each class to be generated.
+  private val classSize: mutable.Map[String, Int] =
+    mutable.Map[String, Int](("OuterClass", 0))
+
+  // A map holding lists of functions belonging to their class.
+  private val classFunctions: mutable.Map[String, mutable.ListBuffer[String]] =
+    mutable.Map(("OuterClass", mutable.ListBuffer.empty[String]))
+
+  // Returns the size of the most recently added class.
+  private def currClassSize(): Int = classSize(classes.head._1)
+
+  // Returns the class name and instance name for the most recently added class.
+  private def currClass(): (String, String) = classes.head
+
+  // Adds a new class. Requires the class' name, and its instance name.
+  private def addClass(className: String, classInstance: String): Unit = {
+    classes.prepend(Tuple2(className, classInstance))
+    classSize += className -> 0
+    classFunctions += className -> mutable.ListBuffer.empty[String]
+  }
+
+  /**
+   * Adds a function to the generated class. If the code for the `OuterClass` grows too large,
+   * the function will be inlined into a new private, nested class, and a class-qualified name
+   * for the function will be returned. Otherwise, the function will be inlined into the
+   * `OuterClass` and the simple `funcName` will be returned.
+   *
+   * @param funcName the class-unqualified name of the function
+   * @param funcCode the body of the function
+   * @return the name of the function, qualified by class if it will be inlined to a private,
+   *         nested sub-class
+   */
+  def addNewFunction(
+      funcName: String,
+      funcCode: String,
+      inlineToOuterClass: Boolean = false): String = {
+    // The number of named constants that can exist in the class is limited by the Constant Pool
+    // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
+    // threshold of 1600k bytes to determine when a function should be inlined to a private,
+    // nested sub-class.
+    val classInfo = if (inlineToOuterClass) {
+      ("OuterClass", "")
+    } else if (currClassSize > 1600000) {
+      val className = freshName("NestedClass")
+      val classInstance = freshName("nestedClassInstance")
+
+      addClass(className, classInstance)
+
+      className -> classInstance
+    } else {
+      currClass()
+    }
+    val name = classInfo._1
+
+    classSize.update(name, classSize(name) + funcCode.length)
+    classFunctions.update(name, classFunctions(name) += funcCode)
--- End diff --

Here's a commit with that change if you think it checks out: 
https://github.com/apache/spark/pull/18075/commits/c225f3ad3b5183be6c637633b0ebffc765be9532#diff-8bcc5aea39c73d4bf38aef6f6951d42cL290





[GitHub] spark issue #17310: [SPARK-18579][SQL] Use ignoreLeadingWhiteSpace and ignor...

2017-05-25 Thread prabcs
Github user prabcs commented on the issue:

https://github.com/apache/spark/pull/17310
  
OK, great then! We'll use 2.2. Thanks!





[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17150
  
**[Test build #3756 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3756/testReport)**
 for PR 17150 at commit 
[`7a30de4`](https://github.com/apache/spark/commit/7a30de4c6d49bdaa0ef229ac0555f0581ee54d68).





[GitHub] spark issue #18107: [SPARK-20883][SPARK-20376][SS] Refactored StateStore API...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18107
  
**[Test build #3755 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3755/testReport)**
 for PR 18107 at commit 
[`d645b41`](https://github.com/apache/spark/commit/d645b416ddd79b56c00bb443569de4c7af5de4fb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StateStoreId(`
  * `case class StateStoreStats()`
  * `case class UnsafeRowTuple(var key: UnsafeRow = null, var value: 
UnsafeRow = null) `
  * `trait StateStoreWriter extends StatefulOperator `





[GitHub] spark issue #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide jump link...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18015
  
**[Test build #3758 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3758/testReport)**
 for PR 18015 at commit 
[`db108bc`](https://github.com/apache/spark/commit/db108bcd53a021d8a654371aa567d444b542f24a).





[GitHub] spark issue #18060: [SPARK-20835][Core] It should exit directly when the --to...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18060
  
**[Test build #3757 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3757/testReport)**
 for PR 18060 at commit 
[`eae0f3d`](https://github.com/apache/spark/commit/eae0f3d4a22911156b1bf47dd6df0cd0ed31dc28).





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18058
  
I personally prefer merging when the PR is still in progress - it preserves 
the commit history for reviewers. 





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18058
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18058
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77369/
Test PASSed.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18058
  
**[Test build #77369 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77369/testReport)**
 for PR 18058 at commit 
[`fcaee9e`](https://github.com/apache/spark/commit/fcaee9e199cf4b1e716e1ebd8d42d8ccaa429545).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class DecisionTreeClassifierWrapperWriter(instance: 
DecisionTreeClassifierWrapper)`
  * `  class DecisionTreeClassifierWrapperReader extends 
MLReader[DecisionTreeClassifierWrapper] `
  * `  class DecisionTreeRegressorWrapperWriter(instance: 
DecisionTreeRegressorWrapper)`
  * `  class DecisionTreeRegressorWrapperReader extends 
MLReader[DecisionTreeRegressorWrapper] `
  * `class HasMinSupport(Params):`
  * `class HasMinConfidence(Params):`
  * `case class UnresolvedHint(name: String, parameters: Seq[String], 
child: LogicalPlan)`
  * `case class ResolvedHint(child: LogicalPlan, hints: HintInfo = 
HintInfo())`
  * `case class HintInfo(`




