[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15338
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed

2016-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15432
  
@rxin yes, I just wanted to avoid changing a lot. Will try to fix it that 
way (at least) to show how it actually looks.






[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null as inpu...

2016-10-11 Thread HyukjinKwon
GitHub user HyukjinKwon reopened a pull request:

https://github.com/apache/spark/pull/15432

[SPARK-17854][SQL] rand/randn allows null as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as an input seed. In this 
case, the seed is treated as `0`.

It seems MySQL also accepts this.


```sql
mysql> select rand(0);
+---------------------+
| rand(0)             |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)

mysql> select rand(NULL);
+---------------------+
| rand(NULL)          |
+---------------------+
| 0.15522042769493574 |
+---------------------+
1 row in set (0.00 sec)
```

and Hive does as well, according to 
[HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).
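The proposed behavior can be sketched in plain Python (an analogy, not the Spark implementation; `resolve_seed` is a hypothetical helper):

```python
import random

def resolve_seed(seed):
    # Treat a NULL (None) seed as 0, mirroring the behavior this PR proposes.
    return 0 if seed is None else int(seed)

# With equal effective seeds, the generated values are identical -- which is
# why rand(NULL) and rand(0) return the same number in the MySQL output above.
r_null = random.Random(resolve_seed(None)).random()
r_zero = random.Random(resolve_seed(0)).random()
assert r_null == r_zero
```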


## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-17854

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15432


commit 7fa7db22dd4f2ba88ab1f09e4b776003b3f62fdb
Author: hyukjinkwon 
Date:   2016-10-11T09:21:18Z

rand/randn allows null as input seed







[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15338
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66759/
Test FAILed.





[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15421
  
That's interesting. I patched your change to a clean checkout and simply 
tested against the example on the JIRA. It throws the above exception.

    val obj = (sqlSerDe._1)(dis, dataType)
    if (obj == null) {
      throw new IllegalArgumentException(s"Invalid type $dataType")  // <= this line
    } else {
      obj
    }

I have no clue why it fails on my laptop. I can test on my own server 
(ubuntu) tonight.
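The null check in the quoted snippet can be sketched in plain Python (hypothetical names; the real SparkR SerDe dispatches on a type tag read from the wire):

```python
# Hypothetical reader table keyed by a one-character type tag.
READERS = {"i": int, "d": float, "s": str}

def read_typed(value, data_type):
    reader = READERS.get(data_type)
    if reader is None:
        # Mirrors the IllegalArgumentException thrown in the Scala snippet:
        # fail fast on an unknown type instead of returning null silently.
        raise ValueError(f"Invalid type {data_type}")
    return reader(value)

assert read_typed("42", "i") == 42
```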







[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15421
  
I suspect that it could be related to my R installation:

localhost:~ mwang$ R

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

But I am not sure yet.





[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15389#discussion_r82907726
  
--- Diff: python/pyspark/rdd.py ---
@@ -2029,7 +2028,15 @@ def coalesce(self, numPartitions, shuffle=False):
 >>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
 [[1, 2, 3, 4, 5]]
 """
-        jrdd = self._jrdd.coalesce(numPartitions, shuffle)
+        if shuffle:
+            # In Scala's repartition code, we will distribute elements evenly across output
+            # partitions. However, the RDD from Python is serialized as a single binary data,
+            # so the distribution fails and produces highly skewed partitions. We need to
+            # convert it to a RDD of java object before repartitioning.
+            data_java_rdd = self._to_java_object_rdd().coalesce(numPartitions, shuffle)
--- End diff --

Hi @davies, it seems a simple benchmark was actually done in 
https://github.com/apache/spark/pull/15389#discussion_r82444378

If you are still concerned, I'd like to run another benchmark with larger 
data and will share the results when I have some time.
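The skew described in the diff comment can be illustrated with a plain-Python sketch (an analogy, not the PySpark shuffle itself):

```python
import pickle

# Ten rows to distribute across two partitions.
rows = list(range(10))

# Case 1: all rows serialized into one opaque blob -- a shuffle can only
# place that single item somewhere, so one partition gets everything.
as_blob = [pickle.dumps(rows)]
partitions_blob = [[], []]
for i, item in enumerate(as_blob):
    partitions_blob[i % 2].append(item)

# Case 2: rows shipped as individual objects -- the shuffle can balance them.
partitions_objs = [[], []]
for i, item in enumerate(rows):
    partitions_objs[i % 2].append(item)

assert len(partitions_blob[1]) == 0               # skewed: all in partition 0
assert [len(p) for p in partitions_objs] == [5, 5]  # even
```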





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82907729
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1035,10 +1035,16 @@ setMethod("dim",
 c(count(x), ncol(x))
   })
 
-#' Collects all the elements of a SparkDataFrame and coerces them into an 
R data.frame.
+#' Download Spark datasets into R
--- End diff --

I'm not sure this should say "datasets" - we don't have this term elsewhere





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82907897
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1182,10 +1195,18 @@ setMethod("take",
 #' @export
 #' @examples
 #'\dontrun{
-#' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
-#' head(df)
+#' # Initialize Spark context and SQL context
+#' sc <- sparkR.init()
+#' sqlContext <- sparkRSQL.init(sc)
--- End diff --

ditto here





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82907977
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1168,12 +1179,14 @@ setMethod("take",
 
 #' Head
 #'
-#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. If \code{num} is not
-#' specified, then head() returns the first 6 rows as with R data.frame.
+#' Return the first elements of a dataset. If \code{x} is a SparkDataFrame, its first
+#' rows will be returned as a data.frame. If the dataset is a \code{Column}, its first
+#' elements will be returned as a vector. The number of elements to be returned
+#' is given by parameter \code{num}. Default value for \code{num} is 6.
 #'
-#' @param x a SparkDataFrame.
-#' @param num the number of rows to return. Default is 6.
-#' @return A data.frame.
+#' @param x A SparkDataFrame or Column
--- End diff --

For something like this, the convention we have is to add the @param in 
generics.R; you can see other examples there.
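The default-of-6 behavior documented in the diff above can be shown with a minimal analogue (hypothetical Python, not the SparkR implementation):

```python
def head(x, num=6):
    # Return the first `num` elements; num defaults to 6, as in R's head().
    return x[:num]

assert head(list(range(10))) == [0, 1, 2, 3, 4, 5]
assert head(list(range(10)), num=3) == [0, 1, 2]
```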





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/14690
  
Btw I also made https://github.com/VideoAmp/spark-public/pull/2/files, to 
fix inputFiles.





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908439
  
--- Diff: R/pkg/R/column.R ---
@@ -32,35 +34,65 @@ setOldClass("jobj")
 #' @export
 #' @note Column since 1.4.0
 setClass("Column",
- slots = list(jc = "jobj"))
+ slots = list(jc = "jobj", df = "SparkDataFrameOrNull"))
 
 #' A set of operations working with SparkDataFrame columns
 #' @rdname columnfunctions
 #' @name columnfunctions
 NULL
-
-setMethod("initialize", "Column", function(.Object, jc) {
+setMethod("initialize", "Column", function(.Object, jc, df) {
   .Object@jc <- jc
+
+  # Some Column objects don't have any referencing DataFrame. In such case, df will be NULL.
+  if (missing(df)) {
+df <- NULL
+  }
+  .Object@df <- df
   .Object
 })
 
+setMethod("show", signature = "Column", definition = function(object) {
--- End diff --

+1, default to 6 for consistency?





[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11336
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11336
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66767/
Test FAILed.





[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11336
  
**[Test build #66767 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66767/consoleFull)** for PR 11336 at commit [`ed0abf2`](https://github.com/apache/spark/commit/ed0abf24d7f65ad2381f6d664ba23e440013c97a).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908731
  
--- Diff: R/pkg/R/functions.R ---
@@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
 setMethod("rand", signature(seed = "missing"),
           function(seed) {
             jc <- callJStatic("org.apache.spark.sql.functions", "rand")
-            column(jc)
+
+            # By assigning a one-row data.frame, the result of this function can be collected
+            # returning a one-element Column
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

I think this is why the test fails - do not use sparkRSQL.init()





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908825
  
--- Diff: R/pkg/R/functions.R ---
@@ -2876,7 +2897,8 @@ setMethod("randn", signature(seed = "missing"),
 setMethod("randn", signature(seed = "numeric"),
           function(seed) {
             jc <- callJStatic("org.apache.spark.sql.functions", "randn", as.integer(seed))
-            column(jc)
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

ditto





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908799
  
--- Diff: R/pkg/R/functions.R ---
@@ -2847,7 +2860,11 @@ setMethod("rand", signature(seed = "missing"),
 setMethod("rand", signature(seed = "numeric"),
           function(seed) {
             jc <- callJStatic("org.apache.spark.sql.functions", "rand", as.integer(seed))
-            column(jc)
+
+            # By assigning a one-row data.frame, the result of this function can be collected
+            # returning a one-element Column
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

ditto





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908811
  
--- Diff: R/pkg/R/functions.R ---
@@ -2865,7 +2882,11 @@ setMethod("rand", signature(seed = "numeric"),
 setMethod("randn", signature(seed = "missing"),
           function(seed) {
             jc <- callJStatic("org.apache.spark.sql.functions", "randn")
-            column(jc)
+
+            # By assigning a one-row data.frame, the result of this function can be collected
+            # returning a one-element Column
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

ditto





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82908836
  
--- Diff: R/pkg/R/functions.R ---
@@ -3026,7 +3048,11 @@ setMethod("translate",
 setMethod("unix_timestamp", signature(x = "missing", format = "missing"),
           function(x, format) {
             jc <- callJStatic("org.apache.spark.sql.functions", "unix_timestamp")
-            column(jc)
+
+            # By assigning a one-row data.frame, the result of this function can be collected
+            # returning a one-element Column
+            df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

ditto





[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15438
  
**[Test build #66762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66762/consoleFull)** for PR 15438 at commit [`1913d29`](https://github.com/apache/spark/commit/1913d29b36a408e8b583fc97045847369e31ff66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/11336
  
I know why tests fail - please see my comment.





[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66762/
Test PASSed.





[GitHub] spark issue #15438: [SPARK-17845][SQL] More self-evident window function fra...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15438
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r82909681
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming
+
+import java.{util => ju}
+
+import scala.collection.mutable
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.metrics.source.{Source => CodahaleSource}
+import org.apache.spark.util.Clock
+
+/**
+ * Class that manages all the metrics related to a StreamingQuery. It does the following.
+ * - Calculates metrics (rates, latencies, etc.) based on information reported by StreamExecution.
+ * - Allows the current metric values to be queried
+ * - Serves some of the metrics through Codahale/DropWizard metrics
+ *
+ * @param sources Unique set of sources in a query
+ * @param triggerClock Clock used for triggering in StreamExecution
+ * @param codahaleSourceName Root name for all the Codahale metrics
+ */
+class StreamMetrics(sources: Set[Source], triggerClock: Clock, codahaleSourceName: String)
+  extends CodahaleSource with Logging {
+
+  import StreamMetrics._
+
+  // Trigger infos
+  private val triggerStatus = new mutable.HashMap[String, String]
+  private val sourceTriggerStatus = new mutable.HashMap[Source, mutable.HashMap[String, String]]
+
+  // Rate estimators for sources and sinks
+  private val inputRates = new mutable.HashMap[Source, RateCalculator]
+  private val processingRates = new mutable.HashMap[Source, RateCalculator]
+
+  // Number of input rows in the current trigger
+  private val numInputRows = new mutable.HashMap[Source, Long]
+  private var numOutputRows: Option[Long] = None
+  private var currentTriggerStartTimestamp: Long = -1
+  private var previousTriggerStartTimestamp: Long = -1
+  private var latency: Option[Double] = None
+
+  override val sourceName: String = codahaleSourceName
+  override val metricRegistry: MetricRegistry = new MetricRegistry
+
+  // === Initialization ===
+
+  // Metric names should not have . in them, so that all the metrics of a query are identified
+  // together in Ganglia as a single metric group
+  registerGauge("inputRate-total", currentInputRate)
+  registerGauge("processingRate-total", () => currentProcessingRate)
+  registerGauge("latency", () => currentLatency().getOrElse(-1.0))
+
+  sources.foreach { s =>
+    inputRates.put(s, new RateCalculator)
+    processingRates.put(s, new RateCalculator)
+    sourceTriggerStatus.put(s, new mutable.HashMap[String, String])
+
+    registerGauge(s"inputRate-${s.toString}", () => currentSourceInputRate(s))
+    registerGauge(s"processingRate-${s.toString}", () => currentSourceProcessingRate(s))
+  }
+
+  // === Setter methods ===
+
+  def reportTriggerStarted(triggerId: Long): Unit = synchronized {
+    numInputRows.clear()
+    numOutputRows = None
+    triggerStatus.clear()
+    sourceTriggerStatus.values.foreach(_.clear())
+
+    reportTriggerStatus(TRIGGER_ID, triggerId)
+    sources.foreach(s => reportSourceTriggerStatus(s, TRIGGER_ID, triggerId))
+    reportTriggerStatus(ACTIVE, true)
+    currentTriggerStartTimestamp = triggerClock.getTimeMillis()
+    reportTriggerStatus(START_TIMESTAMP, currentTriggerStartTimestamp)
+  }
+
+  def reportTriggerStatus[T](key: String, value: T): Unit = synchronized {
+    triggerStatus.put(key, value.toString)
+  }
+
+  def reportSourceTriggerStatus[T](source: Source, key: String, value: T): Unit = synchronized {
+    sourceTriggerStatus(source).put(key, value.toString)
+  }
+
+  def reportNumInputRows(inputRows: Map[Source, Long]): Unit = synchroni
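The per-trigger rate computation this class performs can be sketched roughly as follows (a minimal sketch with assumed semantics, not the actual `RateCalculator` in the diff):

```python
class SimpleRateCalculator:
    """Hypothetical analogue of the RateCalculator used above:
    rows per second over a single trigger interval."""

    def __init__(self):
        self.rate = None  # None until the first complete trigger

    def update(self, num_rows, elapsed_ms):
        # Guard against a zero-length trigger window.
        if elapsed_ms > 0:
            self.rate = num_rows * 1000.0 / elapsed_ms

calc = SimpleRateCalculator()
calc.update(num_rows=500, elapsed_ms=250)
assert calc.rate == 2000.0  # 500 rows in 0.25 s -> 2000 rows/s
```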

[GitHub] spark pull request #15439: [SPARK-17880][DOC] The url linking to `Accumulato...

2016-10-11 Thread sarutak
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/15439

[SPARK-17880][DOC] The url linking to `AccumulatorV2` in the document is incorrect.

## What changes were proposed in this pull request?

In `programming-guide.md`, the url that links to `AccumulatorV2` says 
`api/scala/index.html#org.apache.spark.AccumulatorV2`, but 
`api/scala/index.html#org.apache.spark.util.AccumulatorV2` is correct.


## How was this patch tested?
manual test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-17880

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15439


commit 623120685fcf3136007c450d5f282a5312bcce2f
Author: Kousuke Saruta 
Date:   2016-10-11T23:22:59Z

Fix the url to AccumulatorV2







[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15421
  
It could be the R version - Jenkins is running 3.1.1, I think, which is the 
minimum supported version.
AppVeyor is running 3.3.2, I believe, which is closer to the one 
@wangmiao1981 has.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66768 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66768/consoleFull)**
 for PR 15307 at commit 
[`3d7c71a`](https://github.com/apache/spark/commit/3d7c71a24b3fbfe86fee074b9034db4b89eca2bb).





[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r82909980
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
 ---
@@ -105,11 +105,21 @@ class StreamExecution(
   var lastExecution: QueryExecution = null
 
   @volatile
-  var streamDeathCause: StreamingQueryException = null
+  private var streamDeathCause: StreamingQueryException = null
 
   /* Get the call site in the caller thread; will pass this into the micro 
batch thread */
   private val callSite = Utils.getCallSite()
 
+  /** Metrics for this query */
+  private val streamMetrics =
+new StreamMetrics(uniqueSources.toSet, triggerClock, 
s"StructuredStreaming.$name")
--- End diff --

Yeah, old data cannot update the internal metrics; the final QueryTerminated 
event posted on the listener bus will carry the final values of the metrics.
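The behavior described above can be illustrated with a hedged toy sketch in plain Python (this is not Spark's actual listener API; `ListenerBus`, the event tuple, and the metric names are all hypothetical): once the terminate event is posted, later metric updates no longer affect what listeners received.

```python
# Toy listener bus: listeners receive a snapshot of the final metric values
# attached to the terminate event; later updates don't reach them.
class ListenerBus:
    def __init__(self):
        self.listeners = []

    def post(self, event):
        for listener in self.listeners:
            listener(event)

received = []
bus = ListenerBus()
bus.listeners.append(received.append)

metrics = {"inputRows": 0}
metrics["inputRows"] += 42                    # updated while the query runs
bus.post(("QueryTerminated", dict(metrics)))  # snapshot of the final values
metrics["inputRows"] += 7                     # too late: event already posted

assert received == [("QueryTerminated", {"inputRows": 42})]
```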





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15375
  
This LGTM. Spark unit tests are failing?





[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r82910187
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
 ---
@@ -56,7 +57,12 @@ case class StateStoreRestoreExec(
 child: SparkPlan)
   extends execution.UnaryExecNode with StatefulOperator {
 
+  override lazy val metrics = Map(
+"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of 
output rows"))
+
   override protected def doExecute(): RDD[InternalRow] = {
--- End diff --

`longMetrics("...")` forces `metrics` to be initialized.
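The initialization point above (a Scala `lazy val` being forced by a lookup) can be sketched in plain Python with `functools.cached_property`; the `Node` class and metric name here are hypothetical stand-ins, not Spark code.

```python
# A lazily created metrics map is only built on first access, so looking a
# metric up is what forces the initialization.
import functools

class Node:
    @functools.cached_property
    def metrics(self):
        # Built on first access, analogous to a Scala `lazy val`.
        return {"numOutputRows": 0}

    def long_metric(self, name):
        # Accessing self.metrics here forces it to be initialized.
        return self.metrics[name]

node = Node()
assert "metrics" not in node.__dict__          # not initialized yet
assert node.long_metric("numOutputRows") == 0  # lookup forces initialization
assert "metrics" in node.__dict__              # now cached on the instance
```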





[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r82910246
  
--- Diff: python/pyspark/sql/streaming.py ---
@@ -189,6 +189,282 @@ def resetTerminated(self):
 self._jsqm.resetTerminated()
 
 
+class StreamingQueryStatus(object):
+"""A class used to report information about the progress of a 
StreamingQuery.
+
+.. note:: Experimental
+
+.. versionadded:: 2.1
+"""
+
+def __init__(self, jsqs):
+self._jsqs = jsqs
+
+def __str__(self):
+"""
+Pretty string of this query status.
+
+>>> print(sqs)
+StreamingQueryStatus:
+Query name: query
+Query id: 1
+Status timestamp: 123
+Input rate: 1.0 rows/sec
+Processing rate 2.0 rows/sec
+Latency: 345.0 ms
+Trigger status:
+key: value
+Source statuses [1 source]:
+Source 1:MySource1
+Available offset: #0
+Input rate: 4.0 rows/sec
+Processing rate: 5.0 rows/sec
+Trigger status:
+key: value
+Sink status: MySink
+Committed offsets: [#1, -]
+"""
+return self._jsqs.toString()
+
+@property
+@ignore_unicode_prefix
+@since(2.1)
+def name(self):
+"""
+Name of the query. This name is unique across all active queries.
+
+>>> sqs.name
+u'query'
+"""
+return self._jsqs.name()
+
+@property
+@since(2.1)
+def id(self):
+"""
+Id of the query. This id is unique across all queries that have 
been started in
+the current process.
+
+>>> int(sqs.id)
+1
+"""
+return self._jsqs.id()
+
+@property
+@since(2.1)
+def timestamp(self):
+"""
+Timestamp (ms) of when this query was generated.
+
+>>> int(sqs.timestamp)
+123
+"""
+return self._jsqs.timestamp()
+
+@property
+@since(2.1)
+def inputRate(self):
+"""
+Current rate (rows/sec) at which data is being generated by all 
the sources.
+
+>>> sqs.inputRate
+1.0
+"""
+return self._jsqs.inputRate()
+
+@property
+@since(2.1)
+def processingRate(self):
+"""
+Current rate (rows/sec) at which the query is processing data from 
all the sources.
+
+>>> sqs.processingRate
+2.0
+"""
+return self._jsqs.processingRate()
+
+@property
+@since(2.1)
+def latency(self):
+"""
+Current average latency between the data being available in source 
and the sink
+writing the corresponding output.
+
+>>> sqs.latency
+345.0
+"""
+if (self._jsqs.latency().nonEmpty()):
+return self._jsqs.latency().get()
+else:
+return None
+
+@property
+@since(2.1)
+def sourceStatuses(self):
+"""
+Current statuses of the sources.
--- End diff --

Added 
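The `latency` accessor in the diff above unwraps a Scala `Option` received over Py4J. A hedged sketch with no JVM required: `FakeOption` is a hypothetical stand-in exposing only the two `Option` methods that accessor uses.

```python
# Stand-in for a Py4J-proxied Scala Option.
class FakeOption:
    def __init__(self, value=None):
        self._value = value

    def nonEmpty(self):
        return self._value is not None

    def get(self):
        return self._value

def unwrap_latency(option):
    # Same shape as the `latency` property: Some(x) -> x, None otherwise.
    if option.nonEmpty():
        return option.get()
    return None

assert unwrap_latency(FakeOption(345.0)) == 345.0
assert unwrap_latency(FakeOption()) is None
```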





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15375
  
It's interesting AppVeyor is not running for this PR even though there are 
R changes.





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread olarayej
Github user olarayej commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82910308
  
--- Diff: R/pkg/R/functions.R ---
@@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = 
"numeric", pad = "character"),
 setMethod("rand", signature(seed = "missing"),
   function(seed) {
 jc <- callJStatic("org.apache.spark.sql.functions", "rand")
-column(jc)
+
+# By assigning a one-row data.frame, the result of this 
function can be collected
+# returning a one-element Column
+df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

See my comment from March 30 to illustrate why this is needed. I'll change 
sparkRSQL.init() to sparkR.session(). Thanks for catching this!





[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/15307#discussion_r82910318
  
--- Diff: python/pyspark/sql/streaming.py ---
@@ -189,6 +189,282 @@ def resetTerminated(self):
 self._jsqm.resetTerminated()
 
 
+class StreamingQueryStatus(object):
+"""A class used to report information about the progress of a 
StreamingQuery.
+
+.. note:: Experimental
+
+.. versionadded:: 2.1
+"""
+
+def __init__(self, jsqs):
+self._jsqs = jsqs
+
+def __str__(self):
+"""
+Pretty string of this query status.
+
+>>> print(sqs)
+StreamingQueryStatus:
+Query name: query
+Query id: 1
+Status timestamp: 123
+Input rate: 1.0 rows/sec
+Processing rate 2.0 rows/sec
+Latency: 345.0 ms
+Trigger status:
+key: value
+Source statuses [1 source]:
+Source 1:MySource1
+Available offset: #0
+Input rate: 4.0 rows/sec
+Processing rate: 5.0 rows/sec
+Trigger status:
+key: value
+Sink status: MySink
+Committed offsets: [#1, -]
+"""
+return self._jsqs.toString()
+
+@property
+@ignore_unicode_prefix
+@since(2.1)
+def name(self):
+"""
+Name of the query. This name is unique across all active queries.
+
+>>> sqs.name
+u'query'
+"""
+return self._jsqs.name()
+
+@property
+@since(2.1)
+def id(self):
+"""
+Id of the query. This id is unique across all queries that have 
been started in
+the current process.
+
+>>> int(sqs.id)
+1
+"""
+return self._jsqs.id()
+
+@property
+@since(2.1)
+def timestamp(self):
+"""
+Timestamp (ms) of when this query was generated.
+
+>>> int(sqs.timestamp)
+123
+"""
+return self._jsqs.timestamp()
+
+@property
+@since(2.1)
+def inputRate(self):
+"""
+Current rate (rows/sec) at which data is being generated by all 
the sources.
+
+>>> sqs.inputRate
+1.0
+"""
+return self._jsqs.inputRate()
+
+@property
+@since(2.1)
+def processingRate(self):
+"""
+Current rate (rows/sec) at which the query is processing data from 
all the sources.
+
+>>> sqs.processingRate
+2.0
+"""
+return self._jsqs.processingRate()
+
+@property
+@since(2.1)
+def latency(self):
+"""
+Current average latency between the data being available in source 
and the sink
+writing the corresponding output.
+
+>>> sqs.latency
+345.0
+"""
+if (self._jsqs.latency().nonEmpty()):
+return self._jsqs.latency().get()
+else:
+return None
+
+@property
+@since(2.1)
+def sourceStatuses(self):
+"""
+Current statuses of the sources.
+
+>>> len(sqs.sourceStatuses)
+1
+>>> sqs.sourceStatuses[0].description
+u'MySource1'
+"""
+return [SourceStatus(ss) for ss in self._jsqs.sourceStatuses()]
+
+@property
+@since(2.1)
+def sinkStatus(self):
+"""
+Current status of the sink.
+
+>>> sqs.sinkStatus.description
+u'MySink'
+"""
+return SinkStatus(self._jsqs.sinkStatus())
+
+@property
+@since(2.1)
+def triggerStatus(self):
+"""
+Low-level detailed status of the last completed/currently active 
trigger.
+
+>>> sqs.triggerStatus
+{u'key': u'value'}
--- End diff --

I changed the test data to show a glimpse of the actual data that could be 
there.



[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15439
  
**[Test build #66769 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66769/consoleFull)**
 for PR 15439 at commit 
[`6231206`](https://github.com/apache/spark/commit/623120685fcf3136007c450d5f282a5312bcce2f).





[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-11 Thread mikejihbe
Github user mikejihbe commented on the issue:

https://github.com/apache/spark/pull/15338
  
Thanks for the review @srowen. Those changes are in.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66771 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66771/consoleFull)**
 for PR 15307 at commit 
[`8b4bce8`](https://github.com/apache/spark/commit/8b4bce8ff338aeb982beb6f93e79f09b718c46b6).





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14690
  
**[Test build #66772 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66772/consoleFull)**
 for PR 14690 at commit 
[`10e9e8a`](https://github.com/apache/spark/commit/10e9e8a08661aa53347bccfecbc88aad8e89adb8).





[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10307
  
**[Test build #66773 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66773/consoleFull)**
 for PR 10307 at commit 
[`b9e6481`](https://github.com/apache/spark/commit/b9e64815890db81d8168e4aa350b939b9b83c94e).





[GitHub] spark issue #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to run in th...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15338
  
**[Test build #66770 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66770/consoleFull)**
 for PR 15338 at commit 
[`42c9874`](https://github.com/apache/spark/commit/42c9874ac35c124d6cfd93c272dda6e28b4ce9d3).





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread olarayej
Github user olarayej commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82911244
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1035,10 +1035,16 @@ setMethod("dim",
 c(count(x), ncol(x))
   })
 
-#' Collects all the elements of a SparkDataFrame and coerces them into an 
R data.frame.
+#' Download Spark datasets into R
--- End diff --

Sure. Thanks!





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread olarayej
Github user olarayej commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82911261
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1049,11 +1055,16 @@ setMethod("dim",
 #' @export
 #' @examples
 #'\dontrun{
-#' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
-#' collected <- collect(df)
-#' firstName <- collected[[1]]$name
+#' # Initialize Spark context and SQL context
+#' sc <- sparkR.init()
+#' sqlContext <- sparkRSQL.init(sc)
--- End diff --

Sure. Thanks!





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82911304
  
--- Diff: R/pkg/R/functions.R ---
@@ -2836,7 +2845,11 @@ setMethod("lpad", signature(x = "Column", len = 
"numeric", pad = "character"),
 setMethod("rand", signature(seed = "missing"),
   function(seed) {
 jc <- callJStatic("org.apache.spark.sql.functions", "rand")
-column(jc)
+
+# By assigning a one-row data.frame, the result of this 
function can be collected
+# returning a one-element Column
+df <- as.DataFrame(sparkRSQL.init(), data.frame(0))
--- End diff --

actually, just change it to `as.DataFrame(data.frame(0))`






[GitHub] spark issue #12524: [SPARK-12524][Core]DagScheduler may submit a task set fo...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/12524
  
@seayi any progress on this? It would be good to add this in if it's 
consistently reproducible.





[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)**
 for PR 15148 at commit 
[`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).





[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15439
  
**[Test build #66769 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66769/consoleFull)**
 for PR 15439 at commit 
[`6231206`](https://github.com/apache/spark/commit/623120685fcf3136007c450d5f282a5312bcce2f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66769/
Test PASSed.





[GitHub] spark issue #15439: [SPARK-17880][DOC] The url linking to `AccumulatorV2` in...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15439
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #15440: Fix hadoop.version in building-spark.md

2016-10-11 Thread apivovarov
GitHub user apivovarov opened a pull request:

https://github.com/apache/spark/pull/15440

Fix hadoop.version in building-spark.md

A couple of the mvn build examples use `-Dhadoop.version=VERSION` instead of 
an actual version number.
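A hedged sketch of the doc fix being described: substitute a real Hadoop version for the literal `VERSION` placeholder. The surrounding mvn flags are illustrative assumptions modeled on building-spark.md, and the command is only echoed here, not run.

```shell
# Use a concrete version number, not the literal placeholder "VERSION".
HADOOP_VERSION="2.7.3"
BUILD_CMD="./build/mvn -Pyarn -Dhadoop.version=${HADOOP_VERSION} -DskipTests clean package"
echo "${BUILD_CMD}"
```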

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apivovarov/spark-1 patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15440


commit 0fa66f3410c3ae0c4f98d7f4ca4f2b0e53df0e44
Author: Alexander Pivovarov 
Date:   2016-10-11T23:48:18Z

Fix hadoop.version in building-spark.md

A couple of the mvn build examples use `-Dhadoop.version=VERSION` instead of 
an actual version number.







[GitHub] spark issue #15440: Fix hadoop.version in building-spark.md

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15440
  
Can one of the admins verify this patch?





[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread falaki
Github user falaki commented on the issue:

https://github.com/apache/spark/pull/15375
  
Seems like a flaky test in `DirectKafkaStreamSuite`:
```
DirectKafkaStreamSuite:
- pattern based subscription *** FAILED *** (1 minute, 41 seconds)
```

If Jenkins listens to your commands, maybe we can have it retest this?





[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-11 Thread ajbozarth
GitHub user ajbozarth opened a pull request:

https://github.com/apache/spark/pull/15441

[SPARK-4411] [Web UI] Add "kill" link for jobs in the UI

## What changes were proposed in this pull request?

Currently users can kill stages via the web UI but not jobs directly (a job is 
killed only when one of its stages is). I've added the ability to kill jobs via 
the web UI. This code change is based on #4823 by @lianhuiwang, updated to 
work with the latest code and to match how stages are currently killed. In 
general I've copied the kill-stage code, warnings and note comments and all. I 
also updated the applicable tests and documentation.

## How was this patch tested?

Manually tested and dev/run-tests

![screen shot 2016-10-11 at 4 49 43 
pm](https://cloud.githubusercontent.com/assets/13952758/19292857/12f1b7c0-8fd4-11e6-8982-210249f7b697.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajbozarth/spark spark4411

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15441.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15441


commit af461ccce44e2792ea9356ccc2db6c84609511a0
Author: Lianhui Wang 
Date:   2015-02-28T03:24:46Z

Add kill link for jobs in the UI

commit 7f52874badfea314d019b0dc9097c54b8af2f654
Author: Lianhui Wang 
Date:   2015-02-28T05:23:22Z

Update JobsTab.scala

commit 7a6143a8d44620aec47a77ddc4f3242231924d3f
Author: Lianhui Wang 
Date:   2015-03-01T06:26:14Z

Merge branch 'master' of https://github.com/apache/spark into SPARK-4411

commit 584240affe2422e167b4d3ea87b5766623ed72f6
Author: Lianhui Wang 
Date:   2015-03-01T06:30:34Z

address srowen’s comments

commit 25fc0fd1fc574522ab08f23f6f61673960a1072a
Author: Lianhui Wang 
Date:   2015-03-01T06:45:46Z

address srowen’s comments

commit ba168399f4ee4f59a2c0568b9e094b55747e97c0
Author: Lianhui Wang 
Date:   2015-03-24T07:26:43Z

add : Unit return type

commit a0eee0caa14824cefb99d178522f6ada2a305f4a
Author: Lianhui Wang 
Date:   2015-03-25T01:43:40Z

add an else case

commit d0e208385482daac4a7bcaa4a90637cf88f66c77
Author: Alex Bozarth 
Date:   2016-10-11T20:39:32Z

add kill jobs link. initial commit based on pr #4823 by @lianhuiwang

commit f2519fc3903bb6b4c2e08a38d67a5b3df52dea49
Author: Alex Bozarth 
Date:   2016-10-11T21:41:55Z

Fixed scalastyle

commit 999f83a8b89e5fb89d5753b79346f8730656c0cd
Author: Alex Bozarth 
Date:   2016-10-12T00:03:18Z

Merge branch 'master' into spark4411




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI

2016-10-11 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/15441
  
@srowen @kayousterhout @tgravescs You have had input on the JIRA or the 
previous PR; could you take a look?





[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15408
  
**[Test build #66765 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66765/consoleFull)**
 for PR 15408 at commit 
[`30173fa`](https://github.com/apache/spark/commit/30173facf79e03469291199807f84368a320e262).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in the UI

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15441
  
**[Test build #66775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66775/consoleFull)**
 for PR 15441 at commit 
[`999f83a`](https://github.com/apache/spark/commit/999f83a8b89e5fb89d5753b79346f8730656c0cd).





[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15408
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15408
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66765/
Test FAILed.





[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15422
  
**[Test build #66776 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66776/consoleFull)**
 for PR 15422 at commit 
[`ef88a64`](https://github.com/apache/spark/commit/ef88a64ac5e27e58f6f87bf0588ac1c3995be882).





[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15295
  
**[Test build #66777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66777/consoleFull)**
 for PR 15295 at commit 
[`595b220`](https://github.com/apache/spark/commit/595b22097dba8716545cd405fa36448065ce779d).





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14690
  
**[Test build #66764 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66764/consoleFull)**
 for PR 14690 at commit 
[`175c268`](https://github.com/apache/spark/commit/175c2684eb515a1d0def8cf6a72011aa9a48625d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14690
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66764/
Test PASSed.





[GitHub] spark issue #15434: [SPARK-17873][SQL] ALTER TABLE RENAME TO should allow us...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15434
  
**[Test build #66778 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66778/consoleFull)**
 for PR 15434 at commit 
[`65c1885`](https://github.com/apache/spark/commit/65c1885818e4b712c2132e7e97e0b96ceb3f6dd7).





[GitHub] spark pull request #15436: [SPARK-17875] [BUILD] Remove unneeded direct depe...

2016-10-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15436#discussion_r82917622
  
--- Diff: dev/deps/spark-deps-hadoop-2.3 ---
@@ -130,7 +130,6 @@ metrics-json-3.1.2.jar
 metrics-jvm-3.1.2.jar
 minlog-1.3.0.jar
 mx4j-3.0.2.jar
-netty-3.8.0.Final.jar
--- End diff --

I think netty 3 is used by hadoop-nfs: 
https://issues.apache.org/jira/browse/HADOOP-12415

However, I don't know why the patch for HADOOP-12415 also added netty 3 to 
`hadoop-hdfs`... 







[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...

2016-10-11 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15436
  
retest this please





[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15436
  
**[Test build #66779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66779/consoleFull)**
 for PR 15436 at commit 
[`a5c5c31`](https://github.com/apache/spark/commit/a5c5c3146e702a5c6ac8a86648f58f44d13a95f2).





[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13440
  
**[Test build #66766 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66766/consoleFull)**
 for PR 13440 at commit 
[`83f5e83`](https://github.com/apache/spark/commit/83f5e83fb87407bdd7dc8d740fba6fb30d1da3aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13440
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13440
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66766/
Test PASSed.





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread olarayej
Github user olarayej commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82918200
  
--- Diff: R/pkg/R/column.R ---
@@ -32,35 +34,57 @@ setOldClass("jobj")
 #' @export
 #' @note Column since 1.4.0
 setClass("Column",
- slots = list(jc = "jobj"))
+ slots = list(jc = "jobj", df = "SparkDataFrameOrNull"))
 
 #' A set of operations working with SparkDataFrame columns
 #' @rdname columnfunctions
 #' @name columnfunctions
 NULL
-
-setMethod("initialize", "Column", function(.Object, jc) {
+setMethod("initialize", "Column", function(.Object, jc, df) {
   .Object@jc <- jc
+
+  # Some Column objects don't have any referencing DataFrame. In such 
case, df will be NULL.
+  if (missing(df)) {
+df <- NULL
+  }
+  .Object@df <- df
   .Object
 })
 
+setMethod("show", signature = "Column", definition = function(object) {
--- End diff --

Sure





[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #66774 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66774/consoleFull)**
 for PR 15148 at commit 
[`19f6d89`](https://github.com/apache/spark/commit/19f6d8927f56f9e67a1d4f6d9a14722392469b5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66774/
Test PASSed.





[GitHub] spark pull request #11336: [SPARK-9325][SPARK-R] collect() head() and show()...

2016-10-11 Thread olarayej
Github user olarayej commented on a diff in the pull request:

https://github.com/apache/spark/pull/11336#discussion_r82919122
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1168,12 +1179,14 @@ setMethod("take",
 
 #' Head
 #'
-#' Return the first \code{num} rows of a SparkDataFrame as a R data.frame. 
If \code{num} is not
-#' specified, then head() returns the first 6 rows as with R data.frame.
+#' Return the first elements of a dataset. If \code{x} is a 
SparkDataFrame, its first 
+#' rows will be returned as a data.frame. If the dataset is a 
\code{Column}, its first 
+#' elements will be returned as a vector. The number of elements to be 
returned
+#' is given by parameter \code{num}. Default value for \code{num} is 6.
 #'
-#' @param x a SparkDataFrame.
-#' @param num the number of rows to return. Default is 6.
-#' @return A data.frame.
+#' @param x A SparkDataFrame or Column
--- End diff --

Not sure I follow here. Could you point to the specific example?





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14690
  
**[Test build #66772 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66772/consoleFull)**
 for PR 14690 at commit 
[`10e9e8a`](https://github.com/apache/spark/commit/10e9e8a08661aa53347bccfecbc88aad8e89adb8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14690
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66772/
Test FAILed.





[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14690
  
Merged build finished. Test FAILed.





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12933
  
**[Test build #66780 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66780/consoleFull)**
 for PR 12933 at commit 
[`838dc77`](https://github.com/apache/spark/commit/838dc77d5473e7a584efbd3ac223eba696a427f7).





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12933
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66780/
Test FAILed.





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12933
  
Merged build finished. Test FAILed.





[GitHub] spark issue #12933: [Spark-15155][Mesos] Optionally ignore default role reso...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12933
  
**[Test build #66780 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66780/consoleFull)**
 for PR 12933 at commit 
[`838dc77`](https://github.com/apache/spark/commit/838dc77d5473e7a584efbd3ac223eba696a427f7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clea...

2016-10-11 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/15442

[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is 
bad

## What changes were proposed in this pull request?

Documentation fix to make it clear that reusing a group.id for different 
streams is unsafe, just like it is with the underlying Kafka consumer.
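
As a hedged illustration of the caveat above (broker address and group names 
are assumed values for this sketch, not from the PR): each stream should 
build its own consumer params with a distinct group.id, since two consumers 
sharing one group.id would have the topic's partitions split between them.

```scala
object GroupIdPerStream {
  // Hypothetical helper: Kafka consumer params with a per-stream group.id.
  // The keys are standard Kafka consumer config names; the values are
  // placeholders for illustration only.
  def paramsFor(groupId: String): Map[String, String] = Map(
    "bootstrap.servers" -> "localhost:9092",
    "group.id"          -> groupId
  )

  def main(args: Array[String]): Unit = {
    // One group.id per stream; never reuse one across streams.
    val paramsA = paramsFor("app-stream-a")
    val paramsB = paramsFor("app-stream-b")
    assert(paramsA("group.id") != paramsB("group.id"))
    println("distinct group.ids: " +
      paramsA("group.id") + ", " + paramsB("group.id"))
  }
}
```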


## How was this patch tested?

I built jekyll doc and made sure it looked ok.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/koeninger/spark-1 SPARK-17853

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15442


commit c78c601b7c8af870085f31635ae8b374fb238332
Author: cody koeninger 
Date:   2016-10-12T01:18:35Z

[SPARK-17853][DOC] make it clear that reusing group.id is bad







[GitHub] spark issue #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15442
  
**[Test build #66781 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66781/consoleFull)**
 for PR 15442 at commit 
[`c78c601`](https://github.com/apache/spark/commit/c78c601b7c8af870085f31635ae8b374fb238332).





[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-11 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/15297#discussion_r82922339
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala 
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.Arrays
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+
+class SkewCoalescedPartitioner(
+val parent: Partitioner,
+val partitionStartIndices: Array[(Int, Int)])
+  extends Partitioner {
+
+  @transient private lazy val parentPartitionMapping: Array[Int] = {
+val n = parent.numPartitions
+val result = new Array[Int](n)
+for (i <- 0 until partitionStartIndices.length) {
+  val start = partitionStartIndices(i)._2
+  val end = if (i < partitionStartIndices.length - 1) 
partitionStartIndices(i + 1)._2 else n
+  for (j <- start until end) {
+result(j) = i
+  }
+}
+result
+  }
+
+  override def numPartitions: Int = partitionStartIndices.length
+
+  override def getPartition(key: Any): Int = {
+parentPartitionMapping(parent.getPartition(key))
+  }
+
+  override def equals(other: Any): Boolean = other match {
+case c: SkewCoalescedPartitioner =>
+  c.parent == parent &&
+c.partitionStartIndices.zip(partitionStartIndices).
+  forall( r => r match {
+case (x, y) => (x._1 == y._1 && x._2 == y._2)
+})
+case _ =>
+  false
+  }
+
+  override def hashCode(): Int = 31 * parent.hashCode() + 
partitionStartIndices.hashCode()
+}
+
+ /**
+  * If mapIndex is -1, this behaves the same as ShuffledRowRDDPartition;
+  * if mapIndex > -1, only one mapper's output block is read.
+  */
+private final class SkewShuffledRowRDDPartition(
+    val postShufflePartitionIndex: Int,
+    val mapIndex: Int,
+    val startPreShufflePartitionIndex: Int,
+    val endPreShufflePartitionIndex: Int) extends Partition {
+  override val index: Int = postShufflePartitionIndex
+
+  override def hashCode(): Int = postShufflePartitionIndex
+
+  override def equals(other: Any): Boolean = super.equals(other)
+}
+
+/**
+ * Used only for skew-data joins. A join must fetch the same partition of the
+ * left and right outputs together, but when some partitions hold far more data
+ * than the others, the join is skewed; this specialized RDD handles that case.
+ * On the skewed side, a partition is split into one task per map output, so a
+ * single task no longer has to process all of the skewed data. On the
+ * non-skewed side, the corresponding partition is produced once per map stage
+ * (effectively broadcasting that partition) so each of the skewed tasks can
+ * join against it.
+ *
+ * Non-skewed partitions are handled exactly as in ShuffledRowRDD.
+ */
+class SkewShuffleRowRDD(
+    var dependency1: ShuffleDependency[Int, InternalRow, InternalRow],
+    partitionStartIndices: Array[(Int, Int, Int)])
+  extends ShuffledRowRDD(dependency1, None) {
+
+  private[this] val numPreShufflePartitions = dependency.partitioner.numPartitions
+
+  override def getPartitions: Array[Partition] = {
+    val partitions = ArrayBuffer[Partition]()
+    var partitionIndex = -1
+    for(i <- 0 until partitionStartIndices.length ) {
--- End diff --

` for(i <- 0 until partitionStartIndices.length )` -> `for (i <- partitionStartIndices.indices)`
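For reference, the coalescing logic the diff builds can be sketched outside Spark as a standalone snippet. All names here are illustrative, not the PR's API; each entry of `startIndices` marks where a post-shuffle partition begins.

```scala
// Standalone sketch of how a coalesced partitioner maps pre-shuffle
// partitions to post-shuffle partitions, mirroring parentPartitionMapping.
object CoalesceMappingSketch {
  def buildMapping(numParentPartitions: Int, startIndices: Array[Int]): Array[Int] = {
    val result = new Array[Int](numParentPartitions)
    for (i <- startIndices.indices) {
      val start = startIndices(i)
      // The post-shuffle partition i covers parent partitions [start, end).
      val end = if (i < startIndices.length - 1) startIndices(i + 1) else numParentPartitions
      for (j <- start until end) {
        result(j) = i // parent partition j is routed to coalesced partition i
      }
    }
    result
  }

  def main(args: Array[String]): Unit = {
    // 5 parent partitions coalesced into 2: [0,1] -> 0 and [2,3,4] -> 1.
    val mapping = buildMapping(5, Array(0, 2))
    assert(mapping.sameElements(Array(0, 0, 1, 1, 1)))
  }
}
```

A `getPartition` built on this mapping is then a single array lookup per key, which is why the mapping is precomputed lazily once.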



[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66768/consoleFull)** for PR 15307 at commit [`3d7c71a`](https://github.com/apache/spark/commit/3d7c71a24b3fbfe86fee074b9034db4b89eca2bb).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-11 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/15297#discussion_r82922520
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SkewShuffleRowRDD.scala ---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.Arrays
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark._
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+
+class SkewCoalescedPartitioner(
+    val parent: Partitioner,
+    val partitionStartIndices: Array[(Int, Int)])
+  extends Partitioner {
+
+  @transient private lazy val parentPartitionMapping: Array[Int] = {
+    val n = parent.numPartitions
+    val result = new Array[Int](n)
+    for (i <- 0 until partitionStartIndices.length) {
--- End diff --

`for (i <- 0 until partitionStartIndices.length) ` -> `for (i <- partitionStartIndices.indices)`
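As a quick standalone illustration of the suggested idiom (not Spark code): `indices` yields exactly the range `0 until length`, without repeating the collection name.

```scala
// `pairs.indices` is the same Range as `0 until pairs.length`;
// the method avoids restating the array's name in the loop header.
object IndicesSketch {
  def secondElems(pairs: Array[(Int, Int)]): Seq[Int] =
    pairs.indices.map(i => pairs(i)._2)

  def main(args: Array[String]): Unit = {
    val pairs = Array((0, 0), (1, 3), (2, 7))
    assert(pairs.indices == (0 until pairs.length)) // equivalent ranges
    assert(secondElems(pairs) == Seq(0, 3, 7))
  }
}
```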





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66768/
Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #15443: [SPARK-17881] [SQL] Aggregation function for gene...

2016-10-11 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/15443

[SPARK-17881] [SQL] Aggregation function for generating string histograms

## What changes were proposed in this pull request?
This agg function generates equi-width histograms in the form of Map(value: String, frequency: Long) for string-type columns, with a maximum number of histogram bins. It returns an empty result if the ndv (number of distinct values) of the column exceeds the maximum number of bins allowed.
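A minimal standalone sketch of the described behavior (illustrative only; `histogram` and `maxBins` are hypothetical names, not the PR's API):

```scala
// Sketch of a value->frequency histogram with a bin cap: count string
// frequencies, and return an empty map once the number of distinct
// values (ndv) exceeds maxBins, as the PR description specifies.
object StringHistogramSketch {
  def histogram(values: Seq[String], maxBins: Int): Map[String, Long] = {
    val counts = scala.collection.mutable.Map.empty[String, Long]
    for (v <- values) {
      counts(v) = counts.getOrElse(v, 0L) + 1L
      if (counts.size > maxBins) return Map.empty // ndv exceeded the cap
    }
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    assert(histogram(Seq("a", "b", "a"), 2) == Map("a" -> 2L, "b" -> 1L))
    assert(histogram(Seq("a", "b", "c"), 2).isEmpty) // ndv 3 > 2 bins
  }
}
```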

## How was this patch tested?
Added test cases.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark stringHistogram

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15443


commit a843920983914de7efd21608b8f0e39c70b210d7
Author: wangzhenhua 
Date:   2016-10-12T01:02:37Z

add agg function to generate string histogram







[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15307
  
**[Test build #66771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66771/consoleFull)** for PR 15307 at commit [`8b4bce8`](https://github.com/apache/spark/commit/8b4bce8ff338aeb982beb6f93e79f09b718c46b6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...

2016-10-11 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/15443
  
cc @cloud-fan @hvanhovell 





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15307
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66771/
Test FAILed.





[GitHub] spark issue #15443: [SPARK-17881] [SQL] Aggregation function for generating ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15443
  
**[Test build #66782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66782/consoleFull)** for PR 15443 at commit [`a843920`](https://github.com/apache/spark/commit/a843920983914de7efd21608b8f0e39c70b210d7).





[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/15441#discussion_r82923313
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala ---
@@ -35,4 +37,18 @@ private[ui] class JobsTab(parent: SparkUI) extends SparkUITab(parent, "jobs") {
 
   attachPage(new AllJobsPage(this))
   attachPage(new JobPage(this))
+
+  def handleKillRequest(request: HttpServletRequest): Unit = {
+    if (killEnabled && parent.securityManager.checkModifyPermissions(request.getRemoteUser)) {
+      val killFlag = Option(request.getParameter("terminate")).getOrElse("false").toBoolean
+      val jobId = Option(request.getParameter("id")).getOrElse("-1").toInt
+      if (jobId >= 0 && killFlag && jobProgresslistener.activeJobs.contains(jobId)) {
+        sc.get.cancelJob(jobId)
+      }
--- End diff --

Creating an `Option` only to immediately `get` the value out of it is poor 
style, and unnecessary.
```scala
val jobId = Option(request.getParameter("id")).map(_.toInt)
jobId.foreach { id =>
  if (killFlag && jobProgresslistener.activeJobs.contains(id)) {
    sc.get.cancelJob(id)
  }
}
```
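A standalone sketch of the Option-based style (the `parseJobId` helper is hypothetical, not from the PR). Keeping the value inside the `Option` also makes it natural to absorb malformed ids, which the `getOrElse("-1").toInt` sentinel would crash on:

```scala
// Keep the parameter wrapped in Option instead of forcing a sentinel:
// absent and malformed ids both collapse to None, so the caller only
// acts when a valid id is actually present.
object KillRequestSketch {
  def parseJobId(param: String): Option[Int] =
    Option(param).flatMap(s => scala.util.Try(s.toInt).toOption)

  def main(args: Array[String]): Unit = {
    assert(parseJobId(null).isEmpty)      // parameter missing: nothing to kill
    assert(parseJobId("42").contains(42)) // present and valid
    assert(parseJobId("abc").isEmpty)     // malformed id handled safely
  }
}
```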





[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/10307
  
**[Test build #66773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66773/consoleFull)** for PR 10307 at commit [`b9e6481`](https://github.com/apache/spark/commit/b9e64815890db81d8168e4aa350b939b9b83c94e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66773/
Test PASSed.





[GitHub] spark issue #10307: [SPARK-12334][SQL][PYSPARK] Support read from multiple i...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/10307
  
Merged build finished. Test PASSed.




