[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156789202
  
**[Test build #45946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45946/consoleFull)** for PR 9718 at commit [`7228093`](https://github.com/apache/spark/commit/72280936d32a4e5c7fc0171670553caa52938a80).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156789220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45946/
Test PASSed.





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156789219
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-14 Thread mengxr
GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/9719

[SPARK-11672] [ML] set active SQLContext in JavaDefaultReadWriteSuite

The same as #9694, but for the Java test suite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-11672.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9719


commit 1889a374ed4cd3cdf4cd889d61fffc6f78a11d2a
Author: Xiangrui Meng 
Date:   2015-11-15T07:40:23Z

set active SQLContext in JavaDefaultReadWriteSuite







[GitHub] spark pull request: [SPARK-11480][CORE][WEBUI] Wrong callsite is d...

2015-11-14 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/9437#issuecomment-156787029
  
I don't think `collectAsync` and `countAsync` cause this issue. It happens because `takeAsync` calls `ComplexFutureAction#run`.





[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156784632
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45944/
Test PASSed.





[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156784631
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156784614
  
**[Test build #45944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45944/consoleFull)** for PR 9272 at commit [`caf51a7`](https://github.com/apache/spark/commit/caf51a723591b0820d9c48a0e07fec50f511296b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156783662
  
**[Test build #45946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45946/consoleFull)** for PR 9718 at commit [`7228093`](https://github.com/apache/spark/commit/72280936d32a4e5c7fc0171670553caa52938a80).





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156783007
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156783008
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45945/
Test FAILed.





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156782998
  
**[Test build #45945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45945/consoleFull)** for PR 9718 at commit [`f43a7f9`](https://github.com/apache/spark/commit/f43a7f9c92dd4b5dc5e37b1fa41f8f8c00ca3020).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9703





[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156781368
  
I've merged this into master and branch-1.6.






[GitHub] spark pull request: [SPARK-11743][SQL] Add UserDefinedType support...

2015-11-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9712#discussion_r44867544
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
@@ -68,7 +117,36 @@ class RowEncoderSuite extends SparkFunSuite {
   .add("structOfArray", new StructType().add("array", arrayOfString))
   .add("structOfMap", new StructType().add("map", mapOfString))
   .add("structOfArrayAndMap",
-new StructType().add("array", arrayOfString).add("map", mapOfString)))
+new StructType().add("array", arrayOfString).add("map", mapOfString))
+  .add("structOfUDT", structOfUDT))
+
+  test(s"encode/decode: arrayOfUDT") {
+val schema = new StructType()
+  .add("arrayOfUDT", arrayOfUDT)
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row(Seq(new ExamplePoint(0.1, 0.2), new ExamplePoint(0.3, 0.4)))
+val row = encoder.toRow(input)
+val convertedBack = encoder.fromRow(row)
+assert(input.getSeq[ExamplePoint](0) == convertedBack.getSeq[ExamplePoint](0))
+  }
+
+  test(s"encode/decode: Product") {
+val schema = new StructType()
+  .add("structAsProduct",
+new StructType()
+  .add("int", IntegerType)
+  .add("string", StringType)
+  .add("double", DoubleType))
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row((100, "test", 0.123))
--- End diff --

Actually, I found this problem while working on ScalaUDF. ScalaUDF uses `schemaFor` to obtain the Catalyst type for UDF inputs and outputs, and the Catalyst type returned by `schemaFor` for a `Product` is `StructType`. That is reasonable, since as far as I can see we have no other type to represent a `Product`.

So for a `StructType` field in an external `Row`, both `Row` and `Product` are possible values. When we call `extractorsFor` on the external `Row`, `externalDataTypeFor` returns `ObjectType(classOf[Row])` for that field, but the `get` accessor on the inputObject (i.e., the `Row`) may return a `Product` in the ScalaUDF case, and then an exception is thrown.
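
For illustration, here is a minimal sketch of that scenario using the `RowEncoder` API from the diff above (Spark 1.6-era); the schema and values are made up for this example, and only the shape of the failure is the point:

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val nested = new StructType().add("int", IntegerType).add("string", StringType)
val schema = new StructType().add("struct", nested)
val encoder = RowEncoder(schema)

// The external value for the "struct" field is expected to be a Row...
encoder.toRow(Row(Row(1, "a")))   // fine: the field value is a Row

// ...but a ScalaUDF may hand in a Product (e.g. a tuple) for the same
// StructType field; without handling Product values, this may throw.
encoder.toRow(Row((1, "a")))
```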






[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156777598
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45943/
Test PASSed.





[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156777595
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156777353
  
**[Test build #45943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45943/consoleFull)** for PR 9700 at commit [`852845e`](https://github.com/apache/spark/commit/852845ed89efbf260276c8f7e0d0386f008a47d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867068
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
+#' @family dataframe_funcs
+#' @param x a DataFrame
+#' @examples \dontrun{
+#'
+#' # Create a DataFrame from the Iris dataset
+#' irisDF <- createDataFrame(sqlContext, iris)
+#' 
+#' # Show the structure of the DataFrame
+#' str(irisDF)
+#' 
+#' }
+setMethod("str",
+  signature(object = "DataFrame"),
+  function(object) {
+
+# TODO: These could be made global parameters, though in R 
it's not the case
+DEFAULT_HEAD_ROWS <- 6
+MAX_CHAR_PER_ROW <- 120
+MAX_COLS <- 100
+
+# Get the column names and types of the DataFrame
+names <- names(object)
+types <- coltypes(object)
+
+# Get the number of rows.
+# TODO: Ideally, this should be cached
+cachedCount <- nrow(object)
+
+# Get the first elements of the dataset. Limit number of 
columns accordingly
+dataFrame <- if (ncol(object) > MAX_COLS) {
+   head(object[, c(1:MAX_COLS)], DEFAULT_HEAD_ROWS)
+ } else {
+   head(object, DEFAULT_HEAD_ROWS)
+ }
--- End diff --

If you call `head(object)`, it returns the first 6 rows by default; perhaps leave it to the default behavior instead of passing in `DEFAULT_HEAD_ROWS` here?





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867056
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
+#' @family dataframe_funcs
+#' @param x a DataFrame
--- End diff --

replace `x` with `object` to match the signature below





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867054
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
+#' @family dataframe_funcs
--- End diff --

This has been updated recently; you should see it when you rebase onto the latest master. It should be `#' @family DataFrame functions`.





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867049
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
--- End diff --

this should be `#' @rdname str`





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867045
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
+#' @family dataframe_funcs
+#' @param x a DataFrame
+#' @examples \dontrun{
+#'
--- End diff --

Please remove the unneeded empty lines after `\dontrun{` and before `}`.





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867038
  
--- Diff: R/pkg/R/generics.R ---
@@ -971,6 +986,9 @@ setGeneric("size", function(x) { 
standardGeneric("size") })
 #' @export
 setGeneric("soundex", function(x) { standardGeneric("soundex") })
 
+#' @export
--- End diff --

add `#' @rdname str`





[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156775646
  
**[Test build #45945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45945/consoleFull)** for PR 9718 at commit [`f43a7f9`](https://github.com/apache/spark/commit/f43a7f9c92dd4b5dc5e37b1fa41f8f8c00ca3020).





[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

2015-11-14 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r44867029
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2200,4 +2200,101 @@ setMethod("coltypes",
 rTypes[naIndices] <- types[naIndices]
 
 rTypes
+  })
+
+#' Display the structure of a DataFrame, including column names, column 
types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str_data_frame
+#' @family dataframe_funcs
+#' @param x a DataFrame
+#' @examples \dontrun{
+#'
+#' # Create a DataFrame from the Iris dataset
+#' irisDF <- createDataFrame(sqlContext, iris)
+#' 
+#' # Show the structure of the DataFrame
+#' str(irisDF)
+#' 
+#' }
+setMethod("str",
+  signature(object = "DataFrame"),
+  function(object) {
+
+# TODO: These could be made global parameters, though in R 
it's not the case
+DEFAULT_HEAD_ROWS <- 6
+MAX_CHAR_PER_ROW <- 120
+MAX_COLS <- 100
+
+# Get the column names and types of the DataFrame
+names <- names(object)
+types <- coltypes(object)
+
+# Get the number of rows.
+# TODO: Ideally, this should be cached
+cachedCount <- nrow(object)
+
+# Get the first elements of the dataset. Limit number of 
columns accordingly
+dataFrame <- if (ncol(object) > MAX_COLS) {
+   head(object[, c(1:MAX_COLS)], DEFAULT_HEAD_ROWS)
+ } else {
+   head(object, DEFAULT_HEAD_ROWS)
+ }
+
+# The number of observations will be displayed only if the 
number
+# of rows of the dataset has already been cached.
+if (!is.null(cachedCount)) {
+  cat(paste0("'", class(object), "': ", cachedCount, " obs. of 
",
+length(names), " variables:\n"))
+} else {
+  cat(paste0("'", class(object), "': ", length(names), " 
variables:\n"))
+}
+
+# Whether the ... should be printed at the end of each row
+ellipsis <- FALSE
+
+# Add ellipsis (i.e., "...") if there are more rows than shown
+if (!is.null(cachedCount) && (cachedCount > 
DEFAULT_HEAD_ROWS)) {
+  ellipsis <- TRUE
+}
+
+if (nrow(dataFrame) > 0) {
+  for (i in 1 : ncol(dataFrame)) {
+firstElements <- ""
+
+# Get the first elements for each column
+if (types[i] == "chr") {
--- End diff --

I understand that, but the check on SHORT_TYPES is further down, at line 2278, isn't it? Here, `types` is still "character", right?





[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156775586
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45941/
Test PASSed.





[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156775585
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156775574
  
**[Test build #45941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45941/consoleFull)** for PR 9703 at commit [`bc4eb96`](https://github.com/apache/spark/commit/bc4eb96703782b2f15c78d84c01b048e67c0575f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156775499
  
@xwu0226 Looks good! I left a few comments regarding the format.





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866996
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
+assert(answer3.contains("external_parquet"))
+assert(sql("SELECT input_file_name() as file FROM external_parquet")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_parquet")
+
+// Non-External parquet pointing to /tmp/...
+
+sql("CREATE table internal_parquet_tmp(c1 int, c2 int) " +
+  " stored as parquet " +
+  " as select 1, 2")
+
+val answer4 = sql("SELECT input_file_name() as file FROM 
internal_parquet_tmp")
+  .head().getString(0)
+assert(answer4.contains("internal_parquet_tmp"))
+assert(sql("SELECT input_file_name() as file FROM 
internal_parquet_tmp")
+  .distinct().collect().length == 1)
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866986





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866993
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866981





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866992
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866981





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866994
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
+assert(answer3.contains("external_parquet"))
+assert(sql("SELECT input_file_name() as file FROM external_parquet")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_parquet")
+
+// Non-External parquet pointing to /tmp/...
+
+sql("CREATE table internal_parquet_tmp(c1 int, c2 int) " +
+  " stored as parquet " +
+  " as select 1, 2")
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866981





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866990
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866975





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866988
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
+assert(answer3.contains("external_parquet"))
+assert(sql("SELECT input_file_name() as file FROM external_parquet")
+  .distinct().collect().length == 1)
--- End diff --

https://github.com/apache/spark/pull/9542/files#r44866986





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866986
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
--- End diff --

Can we first get the count through `sql("SELECT input_file_name() as file FROM external_t5").distinct().count()` and then do the assertion?





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866981
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
--- End diff --

Can we use the following format?
```
sql(
  s"""
  
  """)
```
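
Applied to, say, the `external_t5` statement above, that would look roughly like this (indentation is just illustrative):

```
sql(
  s"""
     CREATE EXTERNAL TABLE external_t5 (c1 int, c2 int)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     LOCATION '$location2'
   """)
```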





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866975
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
+assert(answer3.contains("external_parquet"))
+assert(sql("SELECT input_file_name() as file FROM external_parquet")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_parquet")
+
+// Non-External parquet pointing to /tmp/...
+
+sql("CREATE table internal_parquet_tmp(c1 int, c2 int) " +
+  " stored as parquet " +
+  " as select 1, 2")
+
+val answer4 = sql("SELECT input_file_name() as file FROM 
internal_parquet_tmp")
+  .head().getString(0)
--- End diff --

The format looks weird. Can we use the following?
```
val answer4 =
  sql("SELECT input_file_name() as file FROM internal_parquet_tmp").head().getString(0)
```







[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9542#discussion_r44866968
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala ---
@@ -356,6 +356,66 @@ class HiveUDFSuite extends QueryTest with 
TestHiveSingleton {
 
 sqlContext.dropTempTable("testUDF")
   }
+
+  test("SPARK-11522 select input_file_name from non-parquet table"){
+
+// EXTERNAL OpenCSVSerde table pointing to LOCATION
+
+val location1 = 
Utils.getSparkClassLoader.getResource("data/files/csv_table").getFile
+sql(s"""CREATE EXTERNAL TABLE csv_table(page_id INT, impressions INT)
+ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
+ WITH SERDEPROPERTIES (
+   \"separatorChar\" = \",\",
+   \"quoteChar\" = \"\\\"\",
+   \"escapeChar\"= \"\")
+ LOCATION '$location1'""")
+
+val answer1 = sql("select input_file_name() from 
csv_table").head().getString(0)
+assert(answer1.contains(location1))
+assert(sql("select input_file_name() from 
csv_table").distinct().collect().length == 2)
+sql("DROP TABLE csv_table")
+
+// EXTERNAL pointing to LOCATION
+
+val location2 = 
Utils.getSparkClassLoader.getResource("data/files/external_t5").getFile
+sql(s"""CREATE EXTERNAL table external_t5 (c1 int, c2 int)
+row format delimited fields terminated by ','
+location '$location2'""")
+
+val answer2 = sql("SELECT input_file_name() as file FROM 
external_t5").head().getString(0)
+assert(answer2.contains("external_t5"))
+assert(sql("SELECT input_file_name() as file FROM external_t5")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_t5")
+
+   // External parquet pointing to LOCATION
+
+val location3 = 
Utils.getSparkClassLoader.getResource("data/files/external_parquet").getFile
+sql(s"""CREATE EXTERNAL table external_parquet(c1 int, c2 int)
+stored as parquet
+LOCATION '$location3'""")
+
+val answer3 = sql("SELECT input_file_name() as file FROM 
external_parquet")
+  .head().getString(0)
+assert(answer3.contains("external_parquet"))
+assert(sql("SELECT input_file_name() as file FROM external_parquet")
+  .distinct().collect().length == 1)
+sql("DROP TABLE external_parquet")
+
+// Non-External parquet pointing to /tmp/...
--- End diff --

Seems we do not need to say where it points to since it is a managed table.
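For readers skimming the diff, a minimal sketch of what the managed-table case checks (the `sqlContext` handle below is an assumption for illustration; the test itself uses the suite's implicit `sql` helper):

```scala
// Sketch only: assumes a SQLContext named `sqlContext` and the managed table
// `internal_parquet_tmp` created earlier in the patch.
val file = sqlContext
  .sql("SELECT input_file_name() AS file FROM internal_parquet_tmp")
  .head()
  .getString(0)

// For a managed Parquet table the function should return the backing file path,
// not the empty string that SPARK-11522 reports for some table types.
assert(file.nonEmpty)
```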


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156775010
  
LGTM pending jenkins.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yolandagao
Github user yolandagao commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156774495
  
Thank you Yin for the review. Updated the comments accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156774463
  
**[Test build #45944 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45944/consoleFull)**
 for PR 9272 at commit 
[`caf51a7`](https://github.com/apache/spark/commit/caf51a723591b0820d9c48a0e07fec50f511296b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156771080
  
**[Test build #45943 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45943/consoleFull)**
 for PR 9700 at commit 
[`852845e`](https://github.com/apache/spark/commit/852845ed89efbf260276c8f7e0d0386f008a47d9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156769951
  
Unrelated to this PR (except that it may be partially responsible for the 
most recent test failure), there appears to be a race condition here:
```scala
if (!executionContext.isShutdown) {
  val f = Future { deleteFiles() }
```

(from `FileBasedWriteAheadLog.clean` in spark-streaming)

If the `ExecutionContext` shuts down after `isShutdown` is called but 
before the task underlying the `Future` is enqueued on it, an exception will be 
thrown, which appears to be what's happening in 
`CommonWriteAheadLogTests.logCleanUpTest`. I'm not certain why the 
`ExecutionContext` would be getting shut down, but it does seem like the test 
that's failing has an awfully short timeout, and according to the stack trace, 
the thread pool is very busy at the time of the failure.
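One way to make that call robust against the race would be to treat the submission itself as the check, since `isShutdown` can flip between the test and the `Future` construction. This is only a sketch, not the actual Spark fix; `deleteFiles()` and `executionContext` stand in for the existing `FileBasedWriteAheadLog` members:

```scala
import java.util.concurrent.RejectedExecutionException
import scala.concurrent.{ExecutionContext, Future}

// Sketch: attempt the submission and treat a rejected task as "executor already
// shut down", instead of checking isShutdown up front and racing with shutdown.
def cleanSafely(deleteFiles: () => Unit)
               (implicit executionContext: ExecutionContext): Option[Future[Unit]] = {
  try {
    Some(Future { deleteFiles() })
  } catch {
    // Thrown if the pool shuts down between the caller's check and the submit.
    case _: RejectedExecutionException => None
  }
}
```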


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156769727
  
**[Test build #45942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45942/consoleFull)**
 for PR 9700 at commit 
[`d2079b1`](https://github.com/apache/spark/commit/d2079b104c3f19a8dc756cd871a4b4202c15f9b3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156769729
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45942/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156769728
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156769661
  
**[Test build #45942 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45942/consoleFull)**
 for PR 9700 at commit 
[`d2079b1`](https://github.com/apache/spark/commit/d2079b104c3f19a8dc756cd871a4b4202c15f9b3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MINOR][DOCS] quick-start: Escape shell comman...

2015-11-14 Thread Tarrasch
Github user Tarrasch closed the pull request at:

https://github.com/apache/spark/pull/9647


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MINOR][DOCS] quick-start: Escape shell comman...

2015-11-14 Thread Tarrasch
Github user Tarrasch commented on the pull request:

https://github.com/apache/spark/pull/9647#issuecomment-156769218
  
@srowen sent me an email pointing out that I could have used better manners. He is 
of course correct, and I thank him for his time. :)

As for this PR: let's close it, since it seems to be jinxed with 
misunderstandings. Hopefully my next patch will go more smoothly. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156768947
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156768936
  
**[Test build #45940 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45940/consoleFull)**
 for PR 9718 at commit 
[`f5f074d`](https://github.com/apache/spark/commit/f5f074d0f3cd0b194e94502e3f89b597700d92d2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156768943
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45940/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9703#discussion_r44866181
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -563,6 +563,10 @@ class ColumnExpressionSuite extends QueryTest with 
SharedSQLContext {
   df.select(monotonicallyIncreasingId()),
   Row(0L) :: Row(1L) :: Row((1L << 33) + 0L) :: Row((1L << 33) + 1L) 
:: Nil
 )
+checkAnswer(
+  df.select(expr("monotonically_increasing_id()")),
--- End diff --

I was going to suggest just merging the two cases into one, but this works 
too.
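For illustration, a sketch of the merged form alluded to here (it assumes the same four-row `df` used by the existing test; this is not code from the PR):

```scala
// Sketch: exercise the Scala DSL name and the SQL function name in one checkAnswer.
// Both expression instances count rows per partition, so they produce equal values.
checkAnswer(
  df.select(monotonicallyIncreasingId(), expr("monotonically_increasing_id()")),
  Row(0L, 0L) :: Row(1L, 1L) ::
    Row((1L << 33) + 0L, (1L << 33) + 0L) :: Row((1L << 33) + 1L, (1L << 33) + 1L) :: Nil
)
```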


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156768942
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11736] [SQL] Add monotonically_increasi...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9703#issuecomment-156768860
  
**[Test build #45941 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45941/consoleFull)**
 for PR 9703 at commit 
[`bc4eb96`](https://github.com/apache/spark/commit/bc4eb96703782b2f15c78d84c01b048e67c0575f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156768512
  
It sure would be nice if Spark's unit tests were consistent. I add 
whitespace and suddenly the build fails?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156768492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45939/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156768491
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156768481
  
**[Test build #45939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45939/consoleFull)**
 for PR 9264 at commit 
[`c19b3c0`](https://github.com/apache/spark/commit/c19b3c084d0c870a422df6e32f8efbe7620d335c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156768306
  
**[Test build #45940 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45940/consoleFull)**
 for PR 9718 at commit 
[`f5f074d`](https://github.com/apache/spark/commit/f5f074d0f3cd0b194e94502e3f89b597700d92d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156767526
  
@davies take a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9718#discussion_r44866039
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
 ---
@@ -49,40 +47,6 @@ class CodeGenerationSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 futures.foreach(Await.result(_, 10.seconds))
   }
 
-  // Test GenerateOrdering for all common types. For each type, we 
construct random input rows that
-  // contain two columns of that type, then for pairs of 
randomly-generated rows we check that
-  // GenerateOrdering agrees with RowOrdering.
-  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
-test(s"GenerateOrdering with $dataType") {
-  val rowOrdering = InterpretedOrdering.forSchema(Seq(dataType, 
dataType))
-  val genOrdering = GenerateOrdering.generate(
-BoundReference(0, dataType, nullable = true).asc ::
-  BoundReference(1, dataType, nullable = true).asc :: Nil)
-  val rowType = StructType(
-StructField("a", dataType, nullable = true) ::
-  StructField("b", dataType, nullable = true) :: Nil)
-  val maybeDataGenerator = RandomDataGenerator.forType(rowType, 
nullable = false)
-  assume(maybeDataGenerator.isDefined)
-  val randGenerator = maybeDataGenerator.get
-  val toCatalyst = 
CatalystTypeConverters.createToCatalystConverter(rowType)
-  for (_ <- 1 to 50) {
-val a = toCatalyst(randGenerator()).asInstanceOf[InternalRow]
-val b = toCatalyst(randGenerator()).asInstanceOf[InternalRow]
-withClue(s"a = $a, b = $b") {
-  assert(genOrdering.compare(a, a) === 0)
-  assert(genOrdering.compare(b, b) === 0)
-  assert(rowOrdering.compare(a, a) === 0)
-  assert(rowOrdering.compare(b, b) === 0)
-  assert(signum(genOrdering.compare(a, b)) === -1 * 
signum(genOrdering.compare(b, a)))
-  assert(signum(rowOrdering.compare(a, b)) === -1 * 
signum(rowOrdering.compare(b, a)))
-  assert(
-signum(rowOrdering.compare(a, b)) === 
signum(genOrdering.compare(a, b)),
-"Generated and non-generated orderings should agree")
-}
-  }
-}
-  }
--- End diff --

These lines are not in `OrderingSuite`.
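If the intent is to move rather than drop that coverage, a sketch of how the deleted check could be re-hosted. It reuses the helpers from the removed block; whether `OrderingSuite` already has those imports, and whether codegen handles `ArrayType` at this point in the PR, are assumptions:

```scala
// Sketch only: re-host the GenerateOrdering-vs-InterpretedOrdering agreement check,
// optionally extended with an array type. Helper names are those from the deleted
// CodeGenerationSuite block.
(DataTypeTestUtils.atomicTypes ++ Set(NullType, ArrayType(IntegerType))).foreach { dataType =>
  test(s"GenerateOrdering with $dataType") {
    val rowOrdering = InterpretedOrdering.forSchema(Seq(dataType, dataType))
    val genOrdering = GenerateOrdering.generate(
      BoundReference(0, dataType, nullable = true).asc ::
        BoundReference(1, dataType, nullable = true).asc :: Nil)
    val rowType = StructType(
      StructField("a", dataType, nullable = true) ::
        StructField("b", dataType, nullable = true) :: Nil)
    RandomDataGenerator.forType(rowType, nullable = false).foreach { randGenerator =>
      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
      val a = toCatalyst(randGenerator()).asInstanceOf[InternalRow]
      val b = toCatalyst(randGenerator()).asInstanceOf[InternalRow]
      // Generated and interpreted orderings should agree on the sign of the comparison.
      assert(math.signum(genOrdering.compare(a, b)) === math.signum(rowOrdering.compare(a, b)))
    }
  }
}
```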


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9718#discussion_r44866037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala
 ---
@@ -29,35 +30,76 @@ class InterpretedOrdering(ordering: Seq[SortOrder]) 
extends Ordering[InternalRow
   def this(ordering: Seq[SortOrder], inputSchema: Seq[Attribute]) =
 this(ordering.map(BindReferences.bindReference(_, inputSchema)))
 
+  private def compareValue(
+  left: Any,
+  right: Any,
+  dataType: DataType,
+  direction: SortDirection): Int = {
+if (left == null && right == null) {
+  return 0
+} else if (left == null) {
+  return if (direction == Ascending) -1 else 1
+} else if (right == null) {
+  return if (direction == Ascending) 1 else -1
+} else {
+  dataType match {
+case dt: AtomicType if direction == Ascending =>
+  return dt.ordering.asInstanceOf[Ordering[Any]].compare(left, 
right)
+case dt: AtomicType if direction == Descending =>
+  return 
dt.ordering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
+case s: StructType if direction == Ascending =>
+  return 
s.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
+case s: StructType if direction == Descending =>
+  return 
s.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
+case a: ArrayType =>
+  val leftArray = left.asInstanceOf[ArrayData]
+  val rightArray = right.asInstanceOf[ArrayData]
+  val minLength = scala.math.min(leftArray.numElements(), 
rightArray.numElements())
+  var i = 0
+  while (i < minLength) {
+val isNullLeft = leftArray.isNullAt(i)
+val isNullRight = rightArray.isNullAt(i)
+if (isNullLeft && isNullRight) {
+  // Do nothing.
+} else if (isNullLeft) {
+  return if (direction == Ascending) -1 else 1
+} else if (isNullRight) {
+  return if (direction == Ascending) 1 else -1
+} else {
+  val comp =
+compareValue(
+  leftArray.get(i, a.elementType),
+  rightArray.get(i, a.elementType),
+  a.elementType,
+  direction)
+  if (comp != 0) {
+return comp
+  }
+}
+i += 1
+  }
+  if (leftArray.numElements() < rightArray.numElements()) {
+return if (direction == Ascending) -1 else 1
+  } else if (leftArray.numElements() > rightArray.numElements()) {
+return if (direction == Ascending) 1 else -1
+  } else {
+return 0
+  }
+case other =>
+  throw new IllegalArgumentException(s"Type $other does not 
support ordered operations")
+  }
+}
+  }
+
   def compare(a: InternalRow, b: InternalRow): Int = {
 var i = 0
 while (i < ordering.size) {
   val order = ordering(i)
   val left = order.child.eval(a)
   val right = order.child.eval(b)
-
-  if (left == null && right == null) {
-// Both null, continue looking.
-  } else if (left == null) {
-return if (order.direction == Ascending) -1 else 1
-  } else if (right == null) {
-return if (order.direction == Ascending) 1 else -1
-  } else {
-val comparison = order.dataType match {
-  case dt: AtomicType if order.direction == Ascending =>
-dt.ordering.asInstanceOf[Ordering[Any]].compare(left, right)
-  case dt: AtomicType if order.direction == Descending =>
-dt.ordering.asInstanceOf[Ordering[Any]].reverse.compare(left, 
right)
-  case s: StructType if order.direction == Ascending =>
-
s.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
-  case s: StructType if order.direction == Descending =>
-
s.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
-  case other =>
-throw new IllegalArgumentException(s"Type $other does not 
support ordered operations")
-}
-if (comparison != 0) {
-  return comparison
-}
--- End diff --

These lines have been moved to `compareValue`.



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-14 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/9718

[SPARK-11738] [SQL] Making ArrayType orderable

https://issues.apache.org/jira/browse/SPARK-11738

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark makingArrayOrderable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9718


commit d49b21839cf69e91e25412d780239fca47aafcb1
Author: Yin Huai 
Date:   2015-11-14T22:43:01Z

Make arrays orderable.

commit 519b5933699c644fa75e6db13aa649f6fc58fda1
Author: Yin Huai 
Date:   2015-11-14T22:56:49Z

Allow array type in grouping expression.

commit f5f074d0f3cd0b194e94502e3f89b597700d92d2
Author: Yin Huai 
Date:   2015-11-14T23:26:58Z

Array column is allowed in grouping expressions.
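For context, a sketch of what this change is intended to allow, based on the commit messages above; the `sqlContext` handle and the DataFrame calls are illustrative assumptions, not code from this PR:

```scala
// Sketch: with ArrayType orderable, sorting and grouping on array columns should
// pass analysis instead of being rejected.
val df = sqlContext.createDataFrame(Seq(
  (Seq(1, 2, 3), "a"),
  (Seq(1, 2), "b"),
  (Seq(2), "c")
)).toDF("arr", "label")

// Arrays compare element by element; a shorter array that is a prefix sorts first,
// matching the InterpretedOrdering logic in the diff.
df.orderBy("arr").show()

// Grouping by an array column is the other capability mentioned in the commits.
df.groupBy("arr").count().show()
```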




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156765212
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156765213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45938/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156765199
  
**[Test build #45938 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45938/consoleFull)**
 for PR 9264 at commit 
[`489aabc`](https://github.com/apache/spark/commit/489aabc649104cddc8c3c41fbafe4e82e249f427).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44865648
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

We're talking about a Scala operator in Scala code here though so I don't 
know that these other examples mean much. See the Scala doc. 
http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156758037
  
**[Test build #45937 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45937/consoleFull)**
 for PR 9272 at commit 
[`1fbc372`](https://github.com/apache/spark/commit/1fbc3724d3ba7b90a1c412ac3d85d7a7eec9304c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156756916
  
**[Test build #45939 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45939/consoleFull)**
 for PR 9264 at commit 
[`c19b3c0`](https://github.com/apache/spark/commit/c19b3c084d0c870a422df6e32f8efbe7620d335c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44865053
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

I searched for "ml cons operator" in Google, and of all the hits on the 
first page that included ML, SML, or F# code samples, all but one omitted the 
spaces. Anyway, I've inserted the spaces for the sake of consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864970
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

Really? I can't find any that don't use spaces, after skimming over the 
first 1000 hits. I can't recall seeing Scala code without spaces around this 
operator. *shrug* I'd match the surrounding project code here in matters of 
style/taste.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864943
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,50 @@ class AsyncRDDActionsSuite extends SparkFunSuite with 
BeforeAndAfterAll with Tim
   Await.result(f, Duration(20, "milliseconds"))
 }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R])
+(starter: => Semaphore) : Unit = {
--- End diff --

OK, no problem, I see what you mean.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864860
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,50 @@ class AsyncRDDActionsSuite extends SparkFunSuite with 
BeforeAndAfterAll with Tim
   Await.result(f, Duration(20, "milliseconds"))
 }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R])
+(starter: => Semaphore) : Unit = {
--- End diff --

To clarify: if I put the parameters in the same parameter list, the 
compiler forces me to be a lot more verbose at the call sites (with curly 
braces, `=>`, and so forth).
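To illustrate the trade-off being discussed, a toy sketch; the names are made up and are not the real `AsyncRDDActionsSuite` helpers:

```scala
import java.util.concurrent.Semaphore

// Second parameter list taking a by-name argument: call sites can supply a
// multi-line block with plain braces.
def withStarter[R](action: Int => R)(starter: => Semaphore): R = {
  starter.release()
  action(1)
}

val r1 = withStarter(_ * 2) {
  // arbitrary setup can live here; re-evaluated each time `starter` is used
  new Semaphore(0)
}

// Folding both into one parameter list still compiles, but the block has to be
// passed as an explicit argument, which reads noisier once it spans several lines.
def withStarterFlat[R](action: Int => R, starter: => Semaphore): R = {
  starter.release()
  action(1)
}

val r2 = withStarterFlat(_ * 2, {
  new Semaphore(0)
})
```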


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864838
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

I actually haven't seen Spark code that uses `::` at all (not that it 
doesn't exist - I just haven't looked at any files that use it).
My understanding is that using `::` as the "cons" operator is a notation 
inherited from ML. All of the (S)ML examples I have ever seen use it without 
any spaces, though it seems that Scala examples don't necessarily follow the 
same style convention. To me, using it without the spaces reads better, since 
it usually looks like a list of elements chained together, but it doesn't 
really matter to me either way.
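For reference, the two spacings under debate; behaviour is identical, only style differs, and the spaced form is the one srowen points at in the surrounding Spark code:

```scala
val job: Int = 42
val unspaced = job::Nil    // ML-style, no spaces
val spaced = job :: Nil    // spaced form, matching the surrounding project code
assert(unspaced == spaced)
```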



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156751826
  
**[Test build #45938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45938/consoleFull)**
 for PR 9264 at commit 
[`489aabc`](https://github.com/apache/spark/commit/489aabc649104cddc8c3c41fbafe4e82e249f427).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864690
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,50 @@ class AsyncRDDActionsSuite extends SparkFunSuite with 
BeforeAndAfterAll with Tim
   Await.result(f, Duration(20, "milliseconds"))
 }
   }
+
+  private def testAsyncAction[R](action: RDD[Int] => FutureAction[R])
+(starter: => Semaphore) : Unit = {
--- End diff --

Keeping it this way keeps the call sites a lot less cluttered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864675
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

Hm, really? All of the Spark code I see is written this way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864658
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,27 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
   }
 
   override def ready(atMost: Duration)(implicit permit: CanAwait): 
SimpleFutureAction.this.type = {
-if (!atMost.isFinite()) {
-  awaitResult()
-} else jobWaiter.synchronized {
-  val finishTime = System.currentTimeMillis() + atMost.toMillis
-  while (!isCompleted) {
-val time = System.currentTimeMillis()
-if (time >= finishTime) {
-  throw new TimeoutException
-} else {
-  jobWaiter.wait(finishTime - time)
-}
-  }
-}
+jobWaiter.completionFuture.ready(atMost)
 this
   }
 
   @throws(classOf[Exception])
   override def result(atMost: Duration)(implicit permit: CanAwait): T = {
-ready(atMost)(permit)
-awaitResult() match {
-  case scala.util.Success(res) => res
-  case scala.util.Failure(e) => throw e
-}
+jobWaiter.completionFuture.ready(atMost)
+assert(value.isDefined, "Future has not completed properly")
+value.get.get
   }
 
   override def onComplete[U](func: (Try[T]) => U)(implicit executor: 
ExecutionContext) {
-executor.execute(new Runnable {
-  override def run() {
-func(awaitResult())
-  }
-})
+jobWaiter.completionFuture onComplete {_ => func(value.get)}
   }
 
   override def isCompleted: Boolean = jobWaiter.jobFinished
 
   override def isCancelled: Boolean = _cancelled
 
-  override def value: Option[Try[T]] = {
-if (jobWaiter.jobFinished) {
-  Some(awaitResult())
-} else {
-  None
-}
-  }
-
-  private def awaitResult(): Try[T] = {
-jobWaiter.awaitResult() match {
-  case JobSucceeded => scala.util.Success(resultFunc)
-  case JobFailed(e: Exception) => scala.util.Failure(e)
-}
-  }
+  override def value: Option[Try[T]] =
+jobWaiter.completionFuture.value.map {res => res.map {_ => resultFunc}}
--- End diff --

I could be wrong about this one too -- is it the `map(resultFunc)` that 
doesn't work? If so, keep it, though this doesn't need braces. The other 
change should work.
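
A hedged sketch (simplified types, not the actual Spark classes) of why the
underscore lambda is needed when `resultFunc` is by-name: `map` expects a
function, so passing `resultFunc` directly just evaluates it to a value and
fails to type-check:
```scala
import scala.util.{Success, Try}

// Simplified stand-in for SimpleFutureAction.value with a by-name resultFunc.
def valueOf[T](underlying: Option[Try[Unit]], resultFunc: => T): Option[Try[T]] =
  underlying.map(res => res.map(_ => resultFunc))  // res.map(resultFunc) would not compile

valueOf(Some(Success(())), "done")  // Some(Success(done))
```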


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864655
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,27 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
   }
 
   override def ready(atMost: Duration)(implicit permit: CanAwait): 
SimpleFutureAction.this.type = {
-if (!atMost.isFinite()) {
-  awaitResult()
-} else jobWaiter.synchronized {
-  val finishTime = System.currentTimeMillis() + atMost.toMillis
-  while (!isCompleted) {
-val time = System.currentTimeMillis()
-if (time >= finishTime) {
-  throw new TimeoutException
-} else {
-  jobWaiter.wait(finishTime - time)
-}
-  }
-}
+jobWaiter.completionFuture.ready(atMost)
 this
   }
 
   @throws(classOf[Exception])
   override def result(atMost: Duration)(implicit permit: CanAwait): T = {
-ready(atMost)(permit)
-awaitResult() match {
-  case scala.util.Success(res) => res
-  case scala.util.Failure(e) => throw e
-}
+jobWaiter.completionFuture.ready(atMost)
+assert(value.isDefined, "Future has not completed properly")
+value.get.get
   }
 
   override def onComplete[U](func: (Try[T]) => U)(implicit executor: 
ExecutionContext) {
-executor.execute(new Runnable {
-  override def run() {
-func(awaitResult())
-  }
-})
+jobWaiter.completionFuture onComplete {_ => func(value.get)}
   }
 
   override def isCompleted: Boolean = jobWaiter.jobFinished
 
   override def isCancelled: Boolean = _cancelled
 
-  override def value: Option[Try[T]] = {
-if (jobWaiter.jobFinished) {
-  Some(awaitResult())
-} else {
-  None
-}
-  }
-
-  private def awaitResult(): Try[T] = {
-jobWaiter.awaitResult() match {
-  case JobSucceeded => scala.util.Success(resultFunc)
-  case JobFailed(e: Exception) => scala.util.Failure(e)
-}
-  }
+  override def value: Option[Try[T]] =
+jobWaiter.completionFuture.value.map {res => res.map {_ => resultFunc}}
--- End diff --

I've replaced the curly braces with parentheses, though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864643
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -27,7 +27,7 @@ import org.scalatest.BeforeAndAfterAll
 import org.scalatest.concurrent.Timeouts
 import org.scalatest.time.SpanSugar._
 
-import org.apache.spark.{LocalSparkContext, SparkContext, SparkException, 
SparkFunSuite}
+import org.apache.spark._
--- End diff --

I think IntelliJ did that automatically because the number of things being 
imported exceeded some threshold.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864618
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -95,19 +102,18 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) 
extends Serializable with Loggi
 val p = partsScanned until math.min(partsScanned + numPartsToTry, 
totalParts)
 
 val buf = new Array[Array[T]](p.size)
-f.runJob(self,
+val job = f.submitJob(self,
   (it: Iterator[T]) => it.take(left).toArray,
   p,
   (index: Int, data: Array[T]) => buf(index) = data,
   Unit)
-
-buf.foreach(results ++= _.take(num - results.size))
-partsScanned += numPartsToTry
+job.flatMap {case _ =>
--- End diff --

Oh, I think I misread this method. You're correct and I think you in fact 
have to keep the blank argument.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864625
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,27 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
   }
 
   override def ready(atMost: Duration)(implicit permit: CanAwait): 
SimpleFutureAction.this.type = {
-if (!atMost.isFinite()) {
-  awaitResult()
-} else jobWaiter.synchronized {
-  val finishTime = System.currentTimeMillis() + atMost.toMillis
-  while (!isCompleted) {
-val time = System.currentTimeMillis()
-if (time >= finishTime) {
-  throw new TimeoutException
-} else {
-  jobWaiter.wait(finishTime - time)
-}
-  }
-}
+jobWaiter.completionFuture.ready(atMost)
 this
   }
 
   @throws(classOf[Exception])
   override def result(atMost: Duration)(implicit permit: CanAwait): T = {
-ready(atMost)(permit)
-awaitResult() match {
-  case scala.util.Success(res) => res
-  case scala.util.Failure(e) => throw e
-}
+jobWaiter.completionFuture.ready(atMost)
+assert(value.isDefined, "Future has not completed properly")
+value.get.get
   }
 
   override def onComplete[U](func: (Try[T]) => U)(implicit executor: 
ExecutionContext) {
-executor.execute(new Runnable {
-  override def run() {
-func(awaitResult())
-  }
-})
+jobWaiter.completionFuture onComplete {_ => func(value.get)}
   }
 
   override def isCompleted: Boolean = jobWaiter.jobFinished
 
   override def isCancelled: Boolean = _cancelled
 
-  override def value: Option[Try[T]] = {
-if (jobWaiter.jobFinished) {
-  Some(awaitResult())
-} else {
-  None
-}
-  }
-
-  private def awaitResult(): Try[T] = {
-jobWaiter.awaitResult() match {
-  case JobSucceeded => scala.util.Success(resultFunc)
-  case JobFailed(e: Exception) => scala.util.Failure(e)
-}
-  }
+  override def value: Option[Try[T]] =
+jobWaiter.completionFuture.value.map {res => res.map {_ => resultFunc}}
--- End diff --

Does not compile.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864614
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
   }
 
   /**
* Executes some action enclosed in the closure. To properly enable 
cancellation, the closure
* should use runJob implementation in this promise. See takeAsync for 
example.
*/
-  def run(func: => T)(implicit executor: ExecutionContext): this.type = {
-scala.concurrent.future {
-  thread = Thread.currentThread
-  try {
-p.success(func)
-  } catch {
-case e: Exception => p.failure(e)
-  } finally {
-// This lock guarantees when calling `thread.interrupt()` in 
`cancel`,
-// thread won't be set to null.
-ComplexFutureAction.this.synchronized {
-  thread = null
-}
-  }
-}
+  def run(func: => Future[T])(implicit executor: ExecutionContext): 
this.type = {
+p.tryCompleteWith(func)
 this
   }
 
   /**
-   * Runs a Spark job. This is a wrapper around the same functionality 
provided by SparkContext
+   * Submit a job for execution and return a FutureAction holding the 
result.
+   * This is a wrapper around the same functionality provided by 
SparkContext
* to enable cancellation.
*/
-  def runJob[T, U, R](
+  def submitJob[T, U, R](
   rdd: RDD[T],
   processPartition: Iterator[T] => U,
   partitions: Seq[Int],
   resultHandler: (Int, U) => Unit,
-  resultFunc: => R) {
+  resultFunc: => R): FutureAction[R] = synchronized {
 // If the action hasn't been cancelled yet, submit the job. The check 
and the submitJob
 // command need to be in an atomic block.
-val job = this.synchronized {
-  if (!isCancelled) {
-rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
-  } else {
-throw new SparkException("Action has been cancelled")
-  }
-}
-
-this.jobs = jobs ++ job.jobIds
-
-// Wait for the job to complete. If the action is cancelled (with an 
interrupt),
-// cancel the job and stop the execution. This is not in a 
synchronized block because
-// Await.ready eventually waits on the monitor in FutureJob.jobWaiter.
-try {
-  Await.ready(job, Duration.Inf)
-} catch {
-  case e: InterruptedException =>
-job.cancel()
-throw new SparkException("Action has been cancelled")
+if (!isCancelled) {
+  val job = rdd.context.submitJob(rdd, processPartition, partitions, 
resultHandler, resultFunc)
+  subActions = job::subActions
--- End diff --

I don't think I've ever seen it done that way, and ScalaStyle doesn't seem 
to care.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864605
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter: 
JobWaiter[_], resultFunc:
  * takeSample. Cancellation works by setting the cancelled flag to true 
and interrupting the
  * action thread if it is being blocked by a job.
  */
+@DeveloperApi
 class ComplexFutureAction[T] extends FutureAction[T] {
 
-  // Pointer to the thread that is executing the action. It is set when 
the action is run.
-  @volatile private var thread: Thread = _
+  @volatile private var _cancelled = false
 
-  // A flag indicating whether the future has been cancelled. This is used 
in case the future
-  // is cancelled before the action was even run (and thus we have no 
thread to interrupt).
-  @volatile private var _cancelled: Boolean = false
-
-  @volatile private var jobs: Seq[Int] = Nil
+  @volatile private var subActions: List[FutureAction[_]] = Nil
 
   // A promise used to signal the future.
-  private val p = promise[T]()
+  private val p = Promise[T]()
 
-  override def cancel(): Unit = this.synchronized {
+  override def cancel(): Unit = synchronized {
 _cancelled = true
-if (thread != null) {
-  thread.interrupt()
-}
+p.tryFailure(new SparkException("Action has been cancelled"))
+subActions.foreach {_.cancel()}
--- End diff --

I personally like the curly braces because they make it clear that it's a 
code block, but I'll change them anyway, because IntelliJ tends to get a little 
trigger-happy with auto-formatting when it sees them.
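
For what it's worth, a tiny sketch (hypothetical collection, not PR code) of
the two spellings; they behave identically for a single short expression:
```scala
val xs = List(1, 2, 3)

xs.foreach { x => println(x) }  // curly braces: reads as a code block
xs.foreach(x => println(x))     // parentheses: the usual choice for one-liners
```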


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864566
  
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -276,10 +219,11 @@ class ComplexFutureAction[T] extends FutureAction[T] {
 
   override def value: Option[Try[T]] = p.future.value
 
-  def jobIds: Seq[Int] = jobs
+  def jobIds: Seq[Int] = subActions flatMap {_.jobIds}
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864560
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -66,14 +65,22 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) 
extends Serializable with Loggi
*/
   def takeAsync(num: Int): FutureAction[Seq[T]] = self.withScope {
 val f = new ComplexFutureAction[Seq[T]]
-
-f.run {
-  // This is a blocking action so we should use 
"AsyncRDDActions.futureExecutionContext" which
-  // is a cached thread pool.
-  val results = new ArrayBuffer[T](num)
-  val totalParts = self.partitions.length
-  var partsScanned = 0
-  while (results.size < num && partsScanned < totalParts) {
+// Cached thread pool to handle aggregation of subtasks.
+implicit val executionContext = AsyncRDDActions.futureExecutionContext
+val results = new ArrayBuffer[T](num)
+val totalParts = self.partitions.length
+
+/*
+  Recursively triggers jobs to scan partitions until either the 
requested
+  number of elements are retrieved, or the partitions to scan are 
exhausted.
+  This implementation is non-blocking, asynchronously handling the
+  results of each job and triggering the next job using callbacks on 
futures.
+ */
+def continue(partsScanned : Int) : Future[Seq[T]] =
+  if (results.size >= num || partsScanned >= totalParts) {
+Future.successful(results.toSeq)
+  }
--- End diff --

Fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-14 Thread reggert
Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44864546
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -95,19 +102,18 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) 
extends Serializable with Loggi
 val p = partsScanned until math.min(partsScanned + numPartsToTry, 
totalParts)
 
 val buf = new Array[Array[T]](p.size)
-f.runJob(self,
+val job = f.submitJob(self,
   (it: Iterator[T]) => it.take(left).toArray,
   p,
   (index: Int, data: Array[T]) => buf(index) = data,
   Unit)
-
-buf.foreach(results ++= _.take(num - results.size))
-partsScanned += numPartsToTry
+job.flatMap {case _ =>
--- End diff --

Removed the `case`, but retained the `_`, as I think it makes it clear that 
we're not using the argument.
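
A small hedged sketch (illustrative future, not the actual job) of the two
forms; both ignore the completed value, but the plain underscore avoids
building a partial-function literal while still signalling that the argument
is unused:
```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val job: Future[Unit] = Future.successful(())

// `flatMap { case _ => ... }` wraps the body in a pattern match;
// `flatMap { _ => ... }` is a plain lambda that discards its argument.
val next = job.flatMap { _ => Future.successful(Seq(1, 2, 3)) }

Await.result(next, Duration.Inf)  // Seq(1, 2, 3)
```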


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...

2015-11-14 Thread gliptak
Github user gliptak commented on the pull request:

https://github.com/apache/spark/pull/9263#issuecomment-156748411
  
Unit test timed out?
```
==
ERROR: test_kafka_direct_stream (pyspark.streaming.tests.KafkaStreamTests)
Test the Python direct Kafka stream API.
--
Traceback (most recent call last):
  File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/streaming/tests.py",
 line 753, in setUp
self._kafkaTestUtils.setup()
  File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
 line 813, in __call__
answer, self.gateway_client, self.target_id, self.name)
  File 
"/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.9-src.zip/py4j/protocol.py",
 line 308, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o10749.setup.
: org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to 
zookeeper server within timeout: 6000
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10666][SPARK-6880][CORE] Use properties...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6291#issuecomment-156746446
  
**[Test build #45934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45934/consoleFull)**
 for PR 6291 at commit 
[`2ff80ca`](https://github.com/apache/spark/commit/2ff80ca84f501aeed7b8d91c81b8d5bb4c7e707c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10666][SPARK-6880][CORE] Use properties...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6291#issuecomment-156746483
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10666][SPARK-6880][CORE] Use properties...

2015-11-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6291#issuecomment-156746484
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45934/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9272#discussion_r44863760
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -20,6 +20,8 @@ package org.apache.spark.sql.hive.client
 import java.io.{File, PrintStream}
 import java.util.{Map => JMap}
 
+import org.apache.hadoop.security.UserGroupInformation
--- End diff --

Let's move this import down to where the other Hadoop-related imports are. 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports
is the doc about import ordering.
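
Roughly what that grouping looks like, per the style guide (a sketch of the
convention, not the exact contents of ClientWrapper.scala):
```scala
// java/scala imports first, then third-party imports such as Hadoop,
// then org.apache.spark imports, with a blank line between groups.
import java.io.{File, PrintStream}
import java.util.{Map => JMap}

import org.apache.hadoop.security.UserGroupInformation

import org.apache.spark.SparkConf
```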


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156746093
  
@yolandagao Thank you for the update. Overall looks good. Left two comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9272#discussion_r44863646
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -521,8 +521,18 @@ object SparkSubmit {
 sysProps.put("spark.yarn.isPython", "true")
   }
   if (args.principal != null) {
-require(args.keytab != null, "Keytab must be specified when the 
keytab is specified")
-UserGroupInformation.loginUserFromKeytab(args.principal, 
args.keytab)
+require(args.keytab != null, "Keytab must be specified when 
principal is specified")
+if (!new File(args.keytab).exists()) {
+  throw new SparkException(s"Keytab file: ${args.keytab} does not 
exist")
+} else {
+  // Add keytab and principal configurations in sysProps to make 
them available
+  // for later use (e.g. by spark sql). These Configurations will 
be set as
--- End diff --

Let's be more specific about the Spark SQL use case, e.g. mention that the 
isolated class loader used to talk to the Hive metastore will use these settings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9272#discussion_r44863636
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala ---
@@ -149,6 +151,26 @@ private[hive] class ClientWrapper(
 val original = Thread.currentThread().getContextClassLoader
 // Switch to the initClassLoader.
 Thread.currentThread().setContextClassLoader(initClassLoader)
+
+// Set up kerberos credentials for UserGroupInformation.loginUser 
within
+// current class loader
+// Instead of using the spark conf of the current spark context, a new 
instance of
+// SparkConf is needed for the original value of spark.yarn.keytab 
specified by user,
+// as yarn.Client resets it for the link name in distributed cache
+val sparkConf = new SparkConf
+if (sparkConf.contains("spark.yarn.principal") && 
sparkConf.contains("spark.yarn.keytab")) {
--- End diff --

Let's make it clear that we set these two settings in SparkSubmit.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-14 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/9717#issuecomment-156745623
  
Another test case failed for the same reason. 
```
[error] Test 
org.apache.spark.ml.util.JavaDefaultReadWriteSuite.testDefaultReadWrite failed: 
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
```

Timing issues? Or introduced by a recent merge? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156745289
  
**[Test build #45937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45937/consoleFull)**
 for PR 9272 at commit 
[`1fbc372`](https://github.com/apache/spark/commit/1fbc3724d3ba7b90a1c412ac3d85d7a7eec9304c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


