[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030350 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned +df + +##+---------+-------+ +##|eruptions|waiting| +##+---------+-------+ +##| 3.6| 79.0| +##| 1.8| 54.0| +##|3.333| 74.0| +##|2.283| 62.0| +##|4.533| 85.0| +##|2.883| 55.0| +##| 4.7| 88.0| +##| 3.6| 85.0| +##| 1.95| 51.0| +##| 4.35| 85.0| +##+---------+-------+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable. --- End diff -- actually I think the suggestion should be to set that in the `sparkR` command line as spark conf? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
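A hedged sketch of that suggestion follows. The `--conf` flag is the standard way to pass Spark configs to the launcher scripts, and `sparkR.conf()` is SparkR's accessor for runtime configs; treat the exact invocation as an assumption, not the documented recipe.

```r
# Assumed shell invocation (not from the diff): set the conf when launching the shell
#   ./bin/sparkR --conf spark.sql.repl.eagerEval.enabled=true
# Once inside the shell, the setting can be confirmed from R:
sparkR.conf("spark.sql.repl.eagerEval.enabled")  # returns the current value
```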
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030512 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- use a different dataset that does not require `suppressWarnings` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
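A minimal sketch of that suggestion: `faithful` has dot-free column names, so `createDataFrame` should not emit the name-sanitization warning that `iris` (e.g. `Sepal.Length`) triggers.

```r
df <- createDataFrame(faithful)  # no suppressWarnings needed
expect_is(df, "SparkDataFrame")
# schema line printed by show() in the non-eager case
expect_output(show(df), "eruptions:double, waiting:double")
```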
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030211 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned --- End diff -- we could also start here by saying "similar to R `data.frame`"... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030277 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) + +# Instead of displaying the SparkDataFrame class, displays the data returned +df + +##+---------+-------+ +##|eruptions|waiting| +##+---------+-------+ +##| 3.6| 79.0| +##| 1.8| 54.0| +##|3.333| 74.0| +##|2.283| 62.0| +##|4.533| 85.0| +##|2.883| 55.0| +##| 4.7| 88.0| +##| 3.6| 85.0| +##| 1.95| 51.0| +##| 4.35| 85.0| +##+---------+-------+ +##only showing top 10 rows + +{% endhighlight %} + + +Note that the `SparkSession` created by `sparkR` shell does not have eager execution enabled. You can stop the current session and start up a new session like above to enable. --- End diff -- change to `Note that the `SparkSession` created by `sparkR` shell by default does not ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219029847 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution --- End diff -- should be `` I think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030474 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { --- End diff -- I'm neutral, should these tests be in test_sparkSQL.R? it takes longer to run with many test files --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030085 --- Diff: docs/sparkr.md --- @@ -450,6 +450,42 @@ print(model.summaries) {% endhighlight %} +### Eager execution + +If the eager execution is enabled, the data will be returned to R client immediately when the `SparkDataFrame` is created. Eager execution can be enabled by setting the configuration property `spark.sql.repl.eagerEval.enabled` to `true` when the `SparkSession` is started up. + + +{% highlight r %} + +# Start up spark session with eager execution enabled +sparkR.session(master = "local[*]", sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + +df <- createDataFrame(faithful) --- End diff -- perhaps a more complete example - like `summarize(groupBy(df, df$waiting), count = n(df$waiting))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219030537 --- Diff: R/pkg/tests/fulltests/test_eager_execution.R --- @@ -0,0 +1,58 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +library(testthat) + +context("Show SparkDataFrame when eager execution is enabled.") + +test_that("eager execution is not enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE) + } + + df <- suppressWarnings(createDataFrame(iris)) + expect_is(df, "SparkDataFrame") + expected <- "Sepal_Length:double, Sepal_Width:double, Petal_Length:double, Petal_Width:double, Species:string" + expect_output(show(df), expected) + + # Stop Spark session + sparkR.session.stop() +}) + +test_that("eager execution is enabled", { + # Start Spark session without eager execution enabled + sparkSession <- if (windows_with_hadoop()) { +sparkR.session(master = sparkRTestMaster, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } else { +sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, + sparkConfig = list(spark.sql.repl.eagerEval.enabled = "true")) + } + + df <- suppressWarnings(createDataFrame(iris)) --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22379 I think maybe someone should review the SQL stuff more? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r217953294 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1803,6 +1803,18 @@ test_that("string operators", { collect(select(df4, split_string(df4$a, "")))[1, 1], list(list("a.b@c.d 1", "b")) ) + expect_equal( +collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1], +list(list("a", "b@c.d 1\\b")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "b", -2)))[1, 1], +list(list("a.", "@c.d 1\\", "")) + ) + expect_equal( +collect(select(df4, split_string(df4$a, "b", 0)))[1, 1], --- End diff -- for context, we've had some cases in the past where the wrong value was passed for a parameter - so let's at least get one with and one without any optional parameter --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
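A sketch of that pairing: one call relying on the default and one passing the optional `limit` explicitly, so an argument passed in the wrong position would surface as a difference between the two.

```r
# default (no limit) and explicit limit = -1 should agree
expect_equal(
  collect(select(df4, split_string(df4$a, "b")))[1, 1],
  collect(select(df4, split_string(df4$a, "b", -1)))[1, 1]
)
```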
spark git commit: [MINOR][DOCS] Axe deprecated doc refs
Repository: spark Updated Branches: refs/heads/branch-2.4 60af706b4 -> 1cb1e4301 [MINOR][DOCS] Axe deprecated doc refs Continuation of #22370. Summary of discussion there: There is some inconsistency in the R manual w.r.t. supercedent functions linking back to deprecated functions. - `createOrReplaceTempView` and `createTable` both link back to functions which are deprecated (`registerTempTable` and `createExternalTable`, respectively) - `sparkR.session` and `dropTempView` do _not_ link back to deprecated functions This PR takes the view that it is preferable _not_ to link back to deprecated functions, and removes these references from `?createOrReplaceTempView` and `?createTable`. As `registerTempTable` was included in the `SparkDataFrame functions` `family` of functions, other documentation pages which included a link to `?registerTempTable` will similarly be altered. Author: Michael Chirico Author: Michael Chirico Closes #22393 from MichaelChirico/axe_deprecated_doc_refs. (cherry picked from commit a1dd78255a3ae023820b2f245cd39f0c57a32fb1) Signed-off-by: Felix Cheung Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1cb1e430 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1cb1e430 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1cb1e430 Branch: refs/heads/branch-2.4 Commit: 1cb1e43012e57e649d77524f8ff2de231f52c66a Parents: 60af706 Author: Michael Chirico Authored: Sun Sep 16 12:57:44 2018 -0700 Committer: Felix Cheung Committed: Sun Sep 16 12:58:04 2018 -0700 -- R/pkg/R/DataFrame.R | 1 - R/pkg/R/catalog.R | 1 - 2 files changed, 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1cb1e430/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 4f2d4c7..458deca 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -503,7 +503,6 @@ setMethod("createOrReplaceTempView", #' @param x A SparkDataFrame #' @param tableName A character vector containing the name of the table #' -#' @family SparkDataFrame functions #' @seealso \link{createOrReplaceTempView} #' @rdname registerTempTable-deprecated #' @name registerTempTable http://git-wip-us.apache.org/repos/asf/spark/blob/1cb1e430/R/pkg/R/catalog.R -- diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R index baf4d86..c2d0fc3 100644 --- a/R/pkg/R/catalog.R +++ b/R/pkg/R/catalog.R @@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) { #' @param ... additional named parameters as options for the data source. #' @return A SparkDataFrame. #' @rdname createTable -#' @seealso \link{createExternalTable} #' @examples #'\dontrun{ #' sparkR.session() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22393 thx. merged to master/2.4 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [MINOR][DOCS] Axe deprecated doc refs
Repository: spark Updated Branches: refs/heads/master bfcf74260 -> a1dd78255 [MINOR][DOCS] Axe deprecated doc refs Continuation of #22370. Summary of discussion there: There is some inconsistency in the R manual w.r.t. supercedent functions linking back to deprecated functions. - `createOrReplaceTempView` and `createTable` both link back to functions which are deprecated (`registerTempTable` and `createExternalTable`, respectively) - `sparkR.session` and `dropTempView` do _not_ link back to deprecated functions This PR takes the view that it is preferable _not_ to link back to deprecated functions, and removes these references from `?createOrReplaceTempView` and `?createTable`. As `registerTempTable` was included in the `SparkDataFrame functions` `family` of functions, other documentation pages which included a link to `?registerTempTable` will similarly be altered. Author: Michael Chirico Author: Michael Chirico Closes #22393 from MichaelChirico/axe_deprecated_doc_refs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1dd7825 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1dd7825 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1dd7825 Branch: refs/heads/master Commit: a1dd78255a3ae023820b2f245cd39f0c57a32fb1 Parents: bfcf742 Author: Michael Chirico Authored: Sun Sep 16 12:57:44 2018 -0700 Committer: Felix Cheung Committed: Sun Sep 16 12:57:44 2018 -0700 -- R/pkg/R/DataFrame.R | 1 - R/pkg/R/catalog.R | 1 - 2 files changed, 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a1dd7825/R/pkg/R/DataFrame.R -- diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 4f2d4c7..458deca 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -503,7 +503,6 @@ setMethod("createOrReplaceTempView", #' @param x A SparkDataFrame #' @param tableName A character vector containing the name of the table #' -#' @family SparkDataFrame functions #' @seealso \link{createOrReplaceTempView} #' @rdname registerTempTable-deprecated #' @name registerTempTable http://git-wip-us.apache.org/repos/asf/spark/blob/a1dd7825/R/pkg/R/catalog.R -- diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R index baf4d86..c2d0fc3 100644 --- a/R/pkg/R/catalog.R +++ b/R/pkg/R/catalog.R @@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) { #' @param ... additional named parameters as options for the data source. #' @return A SparkDataFrame. #' @rdname createTable -#' @seealso \link{createExternalTable} #' @examples #'\dontrun{ #' sparkR.session() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark issue #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22393 yes please - please double-check that the doc created looks correct - there is no automatic test for that --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21515: [SPARK-24372][build] Add scripts to help with preparing ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21515 UID already exists? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r217901635 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1803,6 +1803,10 @@ test_that("string operators", { collect(select(df4, split_string(df4$a, "")))[1, 1], list(list("a.b@c.d 1", "b")) ) + expect_equal( +collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1], +list(list("a", "b@c.d 1\\b")) --- End diff -- let's add a test for `limit = 0` or `limit = -1` too - while it's the default value, do any of the test cases change behavior for limit = -1? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r217901558 --- Diff: R/pkg/NAMESPACE --- @@ -275,6 +275,7 @@ exportMethods("%<=>%", "format_number", "format_string", "from_json", + "from_csv", --- End diff -- please sort this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r217901588 --- Diff: R/pkg/R/functions.R --- @@ -2202,6 +2208,24 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType") column(jc) }) +#' @details +#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType} +#' with the specified \code{schema}. +#' If the string is unparseable, the Column will contain the value NA. +#' +#' @rdname column_collection_functions +#' @aliases from_csv from_csv,Column,character-method +#' --- End diff -- newline with `#'` is significant in ROxygen, please remove this line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22393 could you check the doc output manually for registerTempTable and createTable? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22379 see comment above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216875875 --- Diff: R/pkg/R/functions.R --- @@ -3720,3 +3720,22 @@ setMethod("current_timestamp", jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp") column(jc) }) + +#' @details +#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType} +#' with the specified \code{schema}. +#' If the string is unparseable, the Column will contain the value NA. +#' +#' @rdname column_collection_functions +#' @param schema a DDL-formatted string +#' @aliases from_csv from_csv,Column,character-method +#' +#' @note from_csv since 3.0.0 +setMethod("from_csv", signature(x = "Column", schema = "character"), + function(x, schema, ...) { --- End diff -- here https://github.com/apache/spark/blob/d2bfd9430f05d006accdecb6a62ed659fbd6a2f8/R/pkg/R/functions.R#L199 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216875804 --- Diff: R/pkg/R/functions.R --- @@ -3720,3 +3720,22 @@ setMethod("current_timestamp", jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp") column(jc) }) + +#' @details +#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType} +#' with the specified \code{schema}. +#' If the string is unparseable, the Column will contain the value NA. +#' +#' @rdname column_collection_functions +#' @param schema a DDL-formatted string +#' @aliases from_csv from_csv,Column,character-method +#' +#' @note from_csv since 3.0.0 +setMethod("from_csv", signature(x = "Column", schema = "character"), + function(x, schema, ...) { --- End diff -- no no, this will break - I am referring to finding the original doc `@rdname column_collection_functions` that has `...` already documented, and then adding this in --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
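A hedged sketch of what that would look like: the shared `@rdname column_collection_functions` block already documents `...` once, and the `from_csv` note would be folded into it. The wording below is hypothetical, not the actual doc text.

```r
#' @param ... additional argument(s). In \code{to_json}, \code{from_json} and
#'   \code{from_csv}, this contains additional named properties to control
#'   how the column is converted.
```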
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22376 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21710 I think we missed the window before the branch, I'll review in a few days --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22192 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216539767 --- Diff: R/pkg/R/DataFrame.R --- @@ -3939,7 +3929,15 @@ setMethod("hint", signature(x = "SparkDataFrame", name = "character"), function(x, name, ...) { parameters <- list(...) -stopifnot(all(sapply(parameters, isTypeAllowedForSqlHint))) +stopifnot(all(sapply(parameters, function(x) { --- End diff -- If I recall, let's not have an inner scope with the same variable name `x` as in the outer scope? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
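A sketch of the rename: same check, but the inner function's argument no longer shadows the outer `x` (the SparkDataFrame). The allowed types (character, numeric, or lists of those) follow the discussion elsewhere in this PR, so treat the exact predicate as an assumption.

```r
stopifnot(all(sapply(parameters, function(p) {
  # a hint parameter may be a scalar, or a list of scalars
  is.character(p) || is.numeric(p) ||
    (is.list(p) && all(sapply(p, function(q) is.character(q) || is.numeric(q))))
})))
```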
[GitHub] spark pull request #22370: don't link to deprecated function
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22370#discussion_r216539411 --- Diff: R/pkg/R/catalog.R --- @@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) { #' @param ... additional named parameters as options for the data source. #' @return A SparkDataFrame. #' @rdname createTable -#' @seealso \link{createExternalTable} --- End diff -- `registerTempTable` is because of the `@family` tag, so it's a bit different. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216538924 --- Diff: R/pkg/R/functions.R --- @@ -3720,3 +3720,22 @@ setMethod("current_timestamp", jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp") column(jc) }) + +#' @details +#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType} +#' with the specified \code{schema}. +#' If the string is unparseable, the Column will contain the value NA. +#' +#' @rdname column_collection_functions +#' @param schema a DDL-formatted string +#' @aliases from_csv from_csv,Column,character-method +#' +#' @note from_csv since 3.0.0 +setMethod("from_csv", signature(x = "Column", schema = "character"), + function(x, schema, ...) { --- End diff -- can you add to the doc for `...` (in column_collection_functions) to indicate the usable options for this function, if there is anything new? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22357 If I recall, the parquet reader can have filter pushdown? Only not so in the Spark parquet data source? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22376 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22370: don't link to deprecated function
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22370 I don't feel strongly either way. I do think this is very minor since there are still many other ways to get to the doc page for createExternalTable (eg the index page) or via ? search within R etc. I am not sure how much difference this would make and we already have a) code spewing out a warning when called b) it clearly documented as Deprecated in the doc page title. Should you find other deprecations that are not documented, we would gladly have your help documenting them. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21649: [SPARK-23648][R][SQL]Adds more types for hint in SparkR
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21649 Right - I think we could inline it or simplify it further. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22372: [SPARK-25385][BUILD] Upgrade Hadoop 3.1 jackson version ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22372 do we have jenkins tests for the 3.1 profile? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22371: [SPARK-25386][CORE] Don't need to synchronize the IndexS...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22371 + @srowen @squito @JoshRosen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotil CompressionCode...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216165218 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -398,10 +398,10 @@ object SQLConf { "`parquet.compression` is specified in the table-specific options/properties, the " + "precedence would be `compression`, `parquet.compression`, " + "`spark.sql.parquet.compression.codec`. Acceptable values include: none, uncompressed, " + - "snappy, gzip, lzo, brotli, lz4, zstd.") + "snappy, gzip, lzo, lz4.") .stringConf .transform(_.toLowerCase(Locale.ROOT)) -.checkValues(Set("none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd")) --- End diff -- I thought if you remove it from here the user would not be able to use zstd or brotli even if it is installed/enabled/available? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.memory lim...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22298 +1 for 2.4 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216122842 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", groupedData(sgd) }) +isTypeAllowedForSqlHint <- function(x) { + if (is.character(x) || is.numeric(x)) { +TRUE + } else if (is.list(x)) { +all(sapply(x, (function(y) is.character(y) || is.numeric(y --- End diff -- also, if it is a `list`, could we clarify whether it is supposed to work with multiple hints of different types in that list (this might be "unique" to R), for example ``` > x <- list("a", 3) > all(sapply(x, function(y) { is.character(y) || is.numeric(y) } )) [1] TRUE > x <- list("a", NA) > all(sapply(x, function(y) { is.character(y) || is.numeric(y) } )) [1] FALSE ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r216122804 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", groupedData(sgd) }) +isTypeAllowedForSqlHint <- function(x) { + if (is.character(x) || is.numeric(x)) { +TRUE + } else if (is.list(x)) { +all(sapply(x, (function(y) is.character(y) || is.numeric(y --- End diff -- I looked into this more deeply; I think this style seems a bit odd. As a nit, I think this should be `all(sapply(x, function(y) { is.character(y) || is.numeric(y) } ))` - I think it's more readable this way. Also see L2458 for an example https://github.com/apache/spark/blob/aec391c9dcb6362874736e663d435f9dd8400125/R/pkg/R/DataFrame.R#L2458 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22144: [SPARK-24935][SQL] : Problem with Executing Hive UDF's f...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22144 hey, this looks important, could someone review this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22362: [SPARK-25372][YARN][K8S] Deprecate and generalize...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22362#discussion_r216122659 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -199,8 +199,8 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S numExecutors = Option(numExecutors) .getOrElse(sparkProperties.get("spark.executor.instances").orNull) queue = Option(queue).orElse(sparkProperties.get("spark.yarn.queue")).orNull -keytab = Option(keytab).orElse(sparkProperties.get("spark.yarn.keytab")).orNull -principal = Option(principal).orElse(sparkProperties.get("spark.yarn.principal")).orNull +keytab = Option(keytab).orElse(sparkProperties.get("spark.kerberos.keytab")).orNull --- End diff -- agreed, shouldn't the "old" config still work? `spark.yarn.keytab` etc --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r216122621 --- Diff: R/pkg/R/functions.R --- @@ -3404,19 +3404,24 @@ setMethod("collect_set", #' Equivalent to \code{split} SQL function. #' #' @rdname column_string_functions +#' @param limit determines the size of the returned array. If `limit` is positive, +#'size of the array will be at most `limit`. If `limit` is negative, the --- End diff -- you can't use backtick in R doc --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
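A sketch of the fix - Rd markup uses `\code{}` instead of backticks. The trailing sentence is paraphrased, since the quoted diff is truncated at that point.

```r
#' @param limit determines the size of the returned array. If \code{limit} is
#'   positive, the size of the array will be at most \code{limit}. If
#'   \code{limit} is negative, the returned array can have any size.
```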
[GitHub] spark issue #22335: [SPARK-25091][SQL] reduce the storage memory in Executor...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22335 please fix the description for this PR - the top part contains the truncated title --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22145: [SPARK-25152][K8S] Enable SparkR Integration Tests for K...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22145 what's the latest on this, btw? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22274 merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql tests (timestamp comparison)
Repository: spark Updated Branches: refs/heads/master 64bbd134e -> 39d3d6cc9 [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql tests (timestamp comparison) ## What changes were proposed in this pull request? The "date function on DataFrame" test fails consistently on my laptop. In this PR i am fixing it by changing the way we compare the two timestamp values. With this change i am able to run the tests clean. ## How was this patch tested? Fixed the failing test. Author: Dilip Biswal Closes #22274 from dilipbiswal/r-sql-test-fix2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/39d3d6cc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/39d3d6cc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/39d3d6cc Branch: refs/heads/master Commit: 39d3d6cc965bd09b1719d245e672b013b8cee6f7 Parents: 64bbd13 Author: Dilip Biswal Authored: Mon Sep 3 00:38:08 2018 -0700 Committer: Felix Cheung Committed: Mon Sep 3 00:38:08 2018 -0700 -- R/pkg/tests/fulltests/test_sparkSQL.R | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/39d3d6cc/R/pkg/tests/fulltests/test_sparkSQL.R -- diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R index 17e4a97..5c07a02 100644 --- a/R/pkg/tests/fulltests/test_sparkSQL.R +++ b/R/pkg/tests/fulltests/test_sparkSQL.R @@ -1870,9 +1870,9 @@ test_that("date functions on a DataFrame", { expect_equal(collect(select(df2, minute(df2$b)))[, 1], c(34, 24)) expect_equal(collect(select(df2, second(df2$b)))[, 1], c(0, 34)) expect_equal(collect(select(df2, from_utc_timestamp(df2$b, "JST")))[, 1], - c(as.POSIXlt("2012-12-13 21:34:00 UTC"), as.POSIXlt("2014-12-15 10:24:34 UTC"))) + c(as.POSIXct("2012-12-13 21:34:00 UTC"), as.POSIXct("2014-12-15 10:24:34 UTC"))) expect_equal(collect(select(df2, to_utc_timestamp(df2$b, "JST")))[, 1], - c(as.POSIXlt("2012-12-13 03:34:00 UTC"), as.POSIXlt("2014-12-14 16:24:34 UTC"))) + c(as.POSIXct("2012-12-13 03:34:00 UTC"), as.POSIXct("2014-12-14 16:24:34 UTC"))) expect_gt(collect(select(df2, unix_timestamp()))[1, 1], 0) expect_gt(collect(select(df2, unix_timestamp(df2$b)))[1, 1], 0) expect_gt(collect(select(df2, unix_timestamp(lit("2015-01-01"), "yyyy-MM-dd")))[1, 1], 0) @@ -3652,7 +3652,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { expect_equal(currentDatabase(), "default") expect_error(setCurrentDatabase("default"), NA) expect_error(setCurrentDatabase("zxwtyswklpf"), -"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist") + paste0("Error in setCurrentDatabase : analysis error - Database ", + "'zxwtyswklpf' does not exist")) dbs <- collect(listDatabases()) expect_equal(names(dbs), c("name", "description", "locationUri")) expect_equal(which(dbs[, 1] == "default"), 1) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
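For context on the fix, a minimal R sketch: `collect()` returns timestamps as `POSIXct`, while `as.POSIXlt()` builds a list-based object, so the two sides of the old comparison had different classes.

```r
t <- "2012-12-13 21:34:00"
class(as.POSIXct(t, tz = "UTC"))  # "POSIXct" "POSIXt"
class(as.POSIXlt(t, tz = "UTC"))  # "POSIXlt" "POSIXt"
```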
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22274 possible - but since this passes for you and in jenkins/appveyor, your change seems to work both ways, which is good enough for me --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214550394 --- Diff: examples/src/main/python/worker_memory_check.py --- @@ -0,0 +1,47 @@ +# --- End diff -- I think the concern here is shipping a test as an example - this is the place where devs will be looking for examples on how to use pyspark, and having a memory test there is a bit strange. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22274 Interesting. Maybe something to do with a newer R release - I scanned through the release notes but didn't find anything that might be related. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22291: [SPARK-25007][R]Add array_intersect/array_except/array_u...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22291 merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [SPARK-25007][R] Add array_intersect/array_except/array_union/shuffle to SparkR
Repository: spark Updated Branches: refs/heads/master a3dccd24c -> a481794ca [SPARK-25007][R] Add array_intersect/array_except/array_union/shuffle to SparkR ## What changes were proposed in this pull request? Add the R version of array_intersect/array_except/array_union/shuffle ## How was this patch tested? Add test in test_sparkSQL.R Author: Huaxin Gao Closes #22291 from huaxingao/spark-25007. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a481794c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a481794c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a481794c Branch: refs/heads/master Commit: a481794ca9a5edb87982679cd0e95146f668fe78 Parents: a3dccd2 Author: Huaxin Gao Authored: Sun Sep 2 00:06:19 2018 -0700 Committer: Felix Cheung Committed: Sun Sep 2 00:06:19 2018 -0700 -- R/pkg/NAMESPACE | 4 ++ R/pkg/R/functions.R | 59 +- R/pkg/R/generics.R| 16 R/pkg/tests/fulltests/test_sparkSQL.R | 19 ++ 4 files changed, 97 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a481794c/R/pkg/NAMESPACE -- diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE index 0fd0848..96ff389 100644 --- a/R/pkg/NAMESPACE +++ b/R/pkg/NAMESPACE @@ -204,6 +204,8 @@ exportMethods("%<=>%", "approxQuantile", "array_contains", "array_distinct", + "array_except", + "array_intersect", "array_join", "array_max", "array_min", @@ -212,6 +214,7 @@ exportMethods("%<=>%", "array_repeat", "array_sort", "arrays_overlap", + "array_union", "arrays_zip", "asc", "ascii", @@ -355,6 +358,7 @@ exportMethods("%<=>%", "shiftLeft", "shiftRight", "shiftRightUnsigned", + "shuffle", "sd", "sign", "signum", http://git-wip-us.apache.org/repos/asf/spark/blob/a481794c/R/pkg/R/functions.R -- diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R index 2929a00..d157acc 100644 --- a/R/pkg/R/functions.R +++ b/R/pkg/R/functions.R @@ -208,7 +208,7 @@ NULL #' # Dataframe used throughout this doc #' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) #' tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp)) -#' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1))) +#' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1), shuffle(tmp$v1))) #' head(select(tmp, array_max(tmp$v1), array_min(tmp$v1), array_distinct(tmp$v1))) #' head(select(tmp, array_position(tmp$v1, 21), array_repeat(df$mpg, 3), array_sort(tmp$v1))) #' head(select(tmp, flatten(tmp$v1), reverse(tmp$v1), array_remove(tmp$v1, 21))) @@ -223,6 +223,8 @@ NULL #' head(select(tmp3, element_at(tmp3$v3, "Valiant"))) #' tmp4 <- mutate(df, v4 = create_array(df$mpg, df$cyl), v5 = create_array(df$cyl, df$hp)) #' head(select(tmp4, concat(tmp4$v4, tmp4$v5), arrays_overlap(tmp4$v4, tmp4$v5))) +#' head(select(tmp4, array_except(tmp4$v4, tmp4$v5), array_intersect(tmp4$v4, tmp4$v5))) +#' head(select(tmp4, array_union(tmp4$v4, tmp4$v5))) #' head(select(tmp4, arrays_zip(tmp4$v4, tmp4$v5), map_from_arrays(tmp4$v4, tmp4$v5))) #' head(select(tmp, concat(df$mpg, df$cyl, df$hp))) #' tmp5 <- mutate(df, v6 = create_array(df$model, df$model)) @@ -3025,6 +3027,34 @@ setMethod("array_distinct", }) #' @details +#' \code{array_except}: Returns an array of the elements in the first array but not in the second +#' array, without duplicates. The order of elements in the result is not determined. 
+#' +#' @rdname column_collection_functions +#' @aliases array_except array_except,Column-method +#' @note array_except since 2.4.0 +setMethod("array_except", + signature(x = "Column", y = "Column"), + function(x, y) { +jc <- callJStatic("org.apache.spark.sql.functions", "array_except", x@jc, y@jc) +column(jc) + }) + +#' @details +#' \code{array_intersect}: Returns an array of the elements in the intersection of the given two +#' arrays, without duplicates. +#' +#' @rdname column_collection_functions +#' @aliases array_intersect array_intersect,Column-method +#' @note array_intersect since 2.4.0 +setMethod("array_intersect", + signature(x = "Column", y = "Column"), + function(x, y) { +jc <- callJStatic("org.apache.spark.sql.functions", "array_intersect", x@jc, y@jc) +column(jc)
zeppelin git commit: [ZEPPELIN-3753] Fix indent with TAB
Repository: zeppelin Updated Branches: refs/heads/master 26b554d64 -> 57601f819 [ZEPPELIN-3753] Fix indent with TAB ### What is this PR for? Now when you select multiline text and press TAB, text replaces with "\t" char. With this PR text just shift right if TAB have been pressed. ### What type of PR is it? Bug Fix ### What is the Jira issue? [ZEPPELIN-3753](https://issues.apache.org/jira/projects/ZEPPELIN/issues/ZEPPELIN-3753) ### Questions: * Does the licenses files need update? No * Is there breaking changes for older versions? No * Does this needs documentation? No Author: oxygen311 Closes #3168 from oxygen311/DW-18011 and squashes the following commits: 941b832 [oxygen311] Fix indent with TAB Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/57601f81 Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/57601f81 Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/57601f81 Branch: refs/heads/master Commit: 57601f819977063d622e3acbcc2f2b8710087697 Parents: 26b554d Author: oxygen311 Authored: Wed Aug 29 17:33:51 2018 +0300 Committer: Felix Cheung Committed: Sat Sep 1 23:50:32 2018 -0700 -- zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/zeppelin/blob/57601f81/zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js -- diff --git a/zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js b/zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js index 9a766de..1a1569a 100644 --- a/zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js +++ b/zeppelin-web/src/app/notebook/paragraph/paragraph.controller.js @@ -930,7 +930,7 @@ function ParagraphCtrl($scope, $rootScope, $route, $window, $routeParams, $locat $scope.editor.execCommand('startAutocomplete'); } else { ace.config.loadModule('ace/ext/language_tools', function() { - $scope.editor.insertSnippet('\t'); + $scope.editor.indent(); }); } },
[GitHub] spark issue #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes for R sql...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22274 maybe also your laptop's system time zone? could you also check that? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22295: [SPARK-25255][PYTHON]Add getActiveSession to Spar...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22295#discussion_r214530177 --- Diff: python/pyspark/sql/session.py --- @@ -252,6 +252,16 @@ def newSession(self): """ return self.__class__(self._sc, self._jsparkSession.newSession()) +@since(2.4) +def getActiveSession(self): +""" +Returns the active SparkSession for the current thread, returned by the builder. +>>> s = spark.getActiveSession() +>>> spark._jsparkSession.getDefaultSession().get().equals(s.get()) --- End diff -- ..and probably shouldn't access `_jsparkSession` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22298: [SPARK-25021][K8S] Add spark.executor.pyspark.mem...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22298#discussion_r214530079 --- Diff: examples/src/main/python/worker_memory_check.py --- @@ -0,0 +1,47 @@ +# --- End diff -- shouldn't this be in python tests (and get it to run only on certain cluster managers)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214529571 --- Diff: python/pyspark/sql/functions.py --- @@ -1669,20 +1669,36 @@ def repeat(col, n): return Column(sc._jvm.functions.repeat(_to_java_column(col), n)) -@since(1.5) +@since(2.4) @ignore_unicode_prefix -def split(str, pattern): -""" -Splits str around pattern (pattern is a regular expression). - -.. note:: pattern is a string represent the regular expression. - ->>> df = spark.createDataFrame([('ab12cd',)], ['s',]) ->>> df.select(split(df.s, '[0-9]+').alias('s')).collect() -[Row(s=[u'ab', u'cd'])] -""" -sc = SparkContext._active_spark_context -return Column(sc._jvm.functions.split(_to_java_column(str), pattern)) +def split(str, regex, limit=-1): --- End diff -- yes, `regex` is the part breaking.. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] zeppelin issue #3168: [ZEPPELIN-3753] Fix indent with TAB
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3168 merging if no more comments ---
[GitHub] spark issue #21743: [SPARK-24767][Launcher] Propagate MDC to spark-submit th...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21743 also, I don't recall anywhere in spark that depends/sets MDC... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18877: [SPARK-17742][core] Handle child process exit in SparkLa...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18877 yes @danelkotev `asfgit closed this in cba826d on Aug 15, 2017` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244981 --- Diff: python/pyspark/sql/functions.py --- @@ -1669,20 +1669,36 @@ def repeat(col, n): return Column(sc._jvm.functions.repeat(_to_java_column(col), n)) -@since(1.5) +@since(2.4) @ignore_unicode_prefix -def split(str, pattern): -""" -Splits str around pattern (pattern is a regular expression). - -.. note:: pattern is a string represent the regular expression. - ->>> df = spark.createDataFrame([('ab12cd',)], ['s',]) ->>> df.select(split(df.s, '[0-9]+').alias('s')).collect() -[Row(s=[u'ab', u'cd'])] -""" -sc = SparkContext._active_spark_context -return Column(sc._jvm.functions.split(_to_java_column(str), pattern)) +def split(str, regex, limit=-1): --- End diff -- this would be a breaking API change I believe for python --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214244918 --- Diff: R/pkg/R/functions.R --- @@ -3410,13 +3410,14 @@ setMethod("collect_set", #' \dontrun{ #' head(select(df, split_string(df$Sex, "a"))) #' head(select(df, split_string(df$Class, "\\d"))) +#' head(select(df, split_string(df$Class, "\\d", 2))) #' # This is equivalent to the following SQL expression #' head(selectExpr(df, "split(Class, 'd')"))} #' @note split_string 2.3.0 setMethod("split_string", signature(x = "Column", pattern = "character"), - function(x, pattern) { -jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern) + function(x, pattern, limit = -1) { +jc <- callJStatic("org.apache.spark.sql.functions", "split", x@jc, pattern, limit) --- End diff -- you should have `as.integer(limit)` instead. Could we add a test in R? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
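A sketch folding in both suggestions - coercing `limit` before the JVM call, plus an R-side test (the expected values are taken from this PR's test diff quoted earlier):

```r
setMethod("split_string",
          signature(x = "Column", pattern = "character"),
          function(x, pattern, limit = -1) {
            # coerce so the JVM-side split(Column, String, Int) overload matches
            jc <- callJStatic("org.apache.spark.sql.functions", "split",
                              x@jc, pattern, as.integer(limit))
            column(jc)
          })

# and in test_sparkSQL.R:
expect_equal(
  collect(select(df4, split_string(df4$a, "\\.", 2)))[1, 1],
  list(list("a", "b@c.d 1\\b"))
)
```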
[GitHub] spark pull request #22274: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22274#discussion_r214244580 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -3633,7 +3633,8 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { expect_equal(currentDatabase(), "default") expect_error(setCurrentDatabase("default"), NA) expect_error(setCurrentDatabase("zxwtyswklpf"), -"Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist") + paste("Error in setCurrentDatabase : analysis error - Database", --- End diff -- I'd use paste0 instead to make clear about the implicit space that should be after `Database`, ie. `paste0("Error in setCurrentDatabase : analysis error - Database ", "'zxwtyswklpf' does not exist")` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22291: [SPARK-25007][R]Add array_intersect/array_except/...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22291#discussion_r214244359 --- Diff: R/pkg/R/generics.R --- @@ -799,10 +807,18 @@ setGeneric("array_sort", function(x) { standardGeneric("array_sort") }) #' @name NULL setGeneric("arrays_overlap", function(x, y) { standardGeneric("arrays_overlap") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("array_union", function(x, y) { standardGeneric("array_union") }) + #' @rdname column_collection_functions #' @name NULL setGeneric("arrays_zip", function(x, ...) { standardGeneric("arrays_zip") }) +#' @rdname column_collection_functions +#' @name NULL +setGeneric("shuffle", function(x) { standardGeneric("shuffle") }) --- End diff -- this should go below - this part of the list should be sorted alphabetically --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214243115 --- Diff: R/pkg/R/functions.R --- @@ -1697,8 +1697,8 @@ setMethod("to_date", }) #' @details -#' \code{to_json}: Converts a column containing a \code{structType}, array of \code{structType}, -#' a \code{mapType} or array of \code{mapType} into a Column of JSON string. +#' \code{to_json}: Converts a column containing a \code{structType}, a \code{mapType} +#' or an array into a Column of JSON string. --- End diff -- it should. Could we add some tests for this in R? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
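A minimal sketch of such a test; the expected JSON string is an assumption based on the new array support, not an output taken from the PR.

```r
df <- createDataFrame(data.frame(id = 1))
tmp <- mutate(df, v = create_array(lit("a"), lit("b")))
expect_equal(collect(select(tmp, to_json(tmp$v)))[[1]], "[\"a\",\"b\"]")
```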
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20146 seems like this was a thumbs-up from @WeichenXu123 @jkbradley? @dbtsai? ---
[GitHub] zeppelin issue #3158: [ZEPPELIN-3740] Adopt `google-java-format` and `fmt-ma...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3158 ok ---
[GitHub] spark issue #22192: [SPARK-24918] Executor Plugin API
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22192 Jenkins, ok to test ---
[GitHub] zeppelin issue #3158: [ZEPPELIN-3740] Adopt `google-java-format` and `fmt-ma...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3158 I see. Might be good to get some consensus first - we seem to be doing quite a few style changes in the last few months, which would make maintenance or backporting harder, for example. ---
[GitHub] zeppelin issue #3158: [ZEPPELIN-3740] Adopt `google-java-format` and `fmt-ma...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3158 what's wrong with `maven-checkstyle-plugin`? ---
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20838 Or that Bryan opens a PR on your branch? That usually would be easier to get *this* PR through, just my 2c. ---
[GitHub] spark pull request #22161: [SPARK-25167][SPARKR][TEST][MINOR] Minor fixes fo...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22161#discussion_r211487544

--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -3613,11 +3613,11 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column
 test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
   expect_equal(currentDatabase(), "default")
   expect_error(setCurrentDatabase("default"), NA)
-  expect_error(setCurrentDatabase("foo"),
-               "Error in setCurrentDatabase : analysis error - Database 'foo' does not exist")
+  expect_error(setCurrentDatabase("zxwtyswklpf"),
+               "Error in setCurrentDatabase : analysis error - Database 'zxwtyswklpf' does not exist")
   dbs <- collect(listDatabases())
   expect_equal(names(dbs), c("name", "description", "locationUri"))
-  expect_equal(dbs[[1]], "default")
+  expect_equal(which(dbs[, 1] == "default"), 1)
--- End diff --

I wonder if there is a better way to ensure the default database is named "default", perhaps? This checks that exactly one database is named 'default' - I guess that's ok...

---
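One alternative along the lines the comment hints at - asserting membership rather than position, so the check also passes when extra databases exist (a sketch, not the committed code):

```r
dbs <- collect(listDatabases())
# Assert that a database named "default" exists, without assuming it is
# the only match or that it appears in the first row
expect_true("default" %in% dbs$name)
```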
[GitHub] zeppelin issue #3153: [ZEPPELIN-3738] Fix enabling JMX in ZeppelinServer
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3153 LGTM ---
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21584 LGTM ---
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22107 merged to master ---
spark git commit: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support in R
Repository: spark
Updated Branches:
  refs/heads/master c1ffb3c10 -> 162326c0e

[SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL support in R

## What changes were proposed in this pull request?

[SPARK-21274](https://issues.apache.org/jira/browse/SPARK-21274) added support for EXCEPT ALL and INTERSECT ALL. This PR adds the support in R.

## How was this patch tested?

Added test in test_sparkSQL.R

Author: Dilip Biswal

Closes #22107 from dilipbiswal/SPARK-25117.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/162326c0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/162326c0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/162326c0

Branch: refs/heads/master
Commit: 162326c0ee8419083ebd1669796abd234773e9b6
Parents: c1ffb3c
Author: Dilip Biswal
Authored: Fri Aug 17 00:04:04 2018 -0700
Committer: Felix Cheung
Committed: Fri Aug 17 00:04:04 2018 -0700

--
 R/pkg/NAMESPACE                       |  2 +
 R/pkg/R/DataFrame.R                   | 59 +-
 R/pkg/R/generics.R                    |  6 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 19 ++
 4 files changed, 85 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/162326c0/R/pkg/NAMESPACE
--
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index adfd387..0fd0848 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -117,6 +117,7 @@ exportMethods("arrange",
               "dropna",
               "dtypes",
               "except",
+              "exceptAll",
               "explain",
               "fillna",
               "filter",
@@ -131,6 +132,7 @@ exportMethods("arrange",
               "hint",
               "insertInto",
               "intersect",
+              "intersectAll",
               "isLocal",
               "isStreaming",
               "join",

http://git-wip-us.apache.org/repos/asf/spark/blob/162326c0/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 471ada1..4f2d4c7 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -2848,6 +2848,35 @@ setMethod("intersect",
             dataFrame(intersected)
           })
 
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @note intersectAll since 2.4.0
+setMethod("intersectAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
+            dataFrame(intersected)
+          })
+
 #' except
 #'
 #' Return a new SparkDataFrame containing rows in this SparkDataFrame
 #'
@@ -2867,7 +2896,6 @@ setMethod("intersect",
 #' df2 <- read.json(path2)
 #' exceptDF <- except(df, df2)
 #' }
-#' @rdname except
 #' @note except since 1.4.0
 setMethod("except",
           signature(x = "SparkDataFrame", y = "SparkDataFrame"),
@@ -2876,6 +2904,35 @@ setMethod("except",
             dataFrame(excepted)
           })
 
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @note exceptAll since 2.4.0
+setMethod("exceptAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
+            dataFrame(excepted)
+          })
+
 #' Save the contents of SparkDataFrame to a data source.
 #'
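For context, a small usage sketch of the two new verbs (toy data; row order in the collected result is not guaranteed):

```r
df1 <- createDataFrame(data.frame(x = c(1, 1, 2, 3)))
df2 <- createDataFrame(data.frame(x = c(1, 2)))

# EXCEPT ALL is a multiset difference: one of the two 1s survives, plus 3
collect(exceptAll(df1, df2))      # rows: 1, 3

# INTERSECT ALL keeps each value up to its minimum count on both sides
collect(intersectAll(df1, df2))   # rows: 1, 2
```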
zeppelin git commit: [ZEPPELIN-3701].Missing first several '0' and losing digital accuracy in result table
Repository: zeppelin
Updated Branches:
  refs/heads/master 09d44d504 -> 1267e33a0

[ZEPPELIN-3701].Missing first several '0' and losing digital accuracy in result table

### What is this PR for?
Improvements:
- Data like '00058806' will be displayed correctly instead of '58806'.
- Data like '5880658806' will be displayed correctly instead of '5.880659E9'.

### What type of PR is it?
[Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3701

### How should this be tested?
* CI pass

### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: heguozi

Closes #3132 from Deegue/master and squashes the following commits:

f539a9a [heguozi] add '+' validation
09fc45d [heguozi] hardcoding fixed
a5f9a8a [heguozi] [ZEPPELIN-3701].Missing first several '0' and losing digital accuracy in result table

Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo
Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/1267e33a
Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/1267e33a
Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/1267e33a

Branch: refs/heads/master
Commit: 1267e33a0ce1bfc7b38bddaa066f89a5f98e8857
Parents: 09d44d5
Author: heguozi
Authored: Mon Aug 13 18:52:50 2018 +0800
Committer: Felix Cheung
Committed: Thu Aug 16 23:49:03 2018 -0700

--
 zeppelin-web/src/app/tabledata/tabledata.js | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/zeppelin/blob/1267e33a/zeppelin-web/src/app/tabledata/tabledata.js
--
diff --git a/zeppelin-web/src/app/tabledata/tabledata.js b/zeppelin-web/src/app/tabledata/tabledata.js
index 1f01bca..67c47be 100644
--- a/zeppelin-web/src/app/tabledata/tabledata.js
+++ b/zeppelin-web/src/app/tabledata/tabledata.js
@@ -36,6 +36,7 @@ export default class TableData extends Dataset {
     let textRows = paragraphResult.msg.split('\n');
     let comment = '';
     let commentRow = false;
+    const float64MaxDigits = 16;
 
     for (let i = 0; i < textRows.length; i++) {
       let textRow = textRows[i];
@@ -60,8 +61,10 @@
           columnNames.push({name: col, index: j, aggr: 'sum'});
         } else {
           let valueOfCol;
-          if (!isNaN(valueOfCol = parseFloat(col)) && isFinite(col)) {
-            col = valueOfCol;
+          if (!(col[0] === '0' || col[0] === '+' || col.length > float64MaxDigits)) {
+            if (!isNaN(valueOfCol = parseFloat(col)) && isFinite(col)) {
+              col = valueOfCol;
+            }
           }
           cols.push(col);
           cols2.push({key: (columnNames[i]) ? columnNames[i].name : undefined, value: col});
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21221 Jenkins, retest this please ---
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r210492311

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -216,8 +217,7 @@ private[spark] class Executor(
 
   def stop(): Unit = {
     env.metricsSystem.report()
-    heartbeater.shutdown()
-    heartbeater.awaitTermination(10, TimeUnit.SECONDS)
+    heartbeater.stop()
--- End diff --

future: wrap in `try { ... } catch { case NonFatal(e) => ... }`?

---
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r210492513

--- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -69,6 +69,11 @@ package object config {
       .bytesConf(ByteUnit.KiB)
       .createWithDefaultString("100k")
 
+  private[spark] val EVENT_LOG_STAGE_EXECUTOR_METRICS =
+    ConfigBuilder("spark.eventLog.logStageExecutorMetrics.enabled")
+      .booleanConf
+      .createWithDefault(true)
--- End diff --

should this be "false" for now until we can test this out more, just to be on the safe side?

---
[GitHub] spark pull request #21835: [SPARK-24779]Add sequence / map_concat / map_from...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21835#discussion_r210489980

--- Diff: R/pkg/R/functions.R ---
@@ -3320,7 +3321,7 @@ setMethod("explode",
 #' @aliases sequence sequence,Column-method
 #' @note sequence since 2.4.0
 setMethod("sequence",
-          signature(x = "Column", y = "Column"),
+          signature(),
--- End diff --

sorry, I didn't see the reply. Yes, we should try to keep `sequence` callable, but we shouldn't have to dispatch to it manually; it is better to rely on R's internal type/call routing. It's a bit hard to explain, but check out `attach` (`setGeneric("attach")`) or `str` (`setGeneric("str")`) if you see what I mean. We should also avoid an empty `signature()`.

---
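A minimal sketch of the pattern being referenced, using `str` as the example base function (the method body below is purely illustrative): calling `setGeneric` on an existing function promotes it to an S4 generic whose default method is the base implementation, so dispatch routes Spark objects to the Spark method and everything else to base R, with no empty `signature()` and no manual routing.

```r
# Promote base::str to an S4 generic; the base implementation becomes
# the default method, so plain R objects keep their old behavior
setGeneric("str")

# Register a class-specific method; S4 dispatch picks it for
# SparkDataFrame arguments automatically
setMethod("str", signature(object = "SparkDataFrame"),
          function(object) {
            cat("SparkDataFrame with columns:",
                paste(columns(object), collapse = ", "), "\n")
          })
```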
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210488842

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
             dataFrame(intersected)
           })
 
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
+#' @note intersectAll since 2.4.0
+setMethod("intersectAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
+            dataFrame(intersected)
+          })
--- End diff --

add extra empty line after code

---
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210488890

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
             dataFrame(excepted)
           })
 
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
+#' @note exceptAll since 2.4.0
+setMethod("exceptAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
+            dataFrame(excepted)
+          })
+
--- End diff --

nit: remove one of the two empty lines

---
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210488754

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2848,6 +2848,35 @@ setMethod("intersect",
             dataFrame(intersected)
           })
 
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @rdname intersectAll
--- End diff --

ditto here

---
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXEPT ALL and INTERSECT ALL ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210488641

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2876,6 +2905,37 @@ setMethod("except",
             dataFrame(excepted)
           })
 
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @rdname exceptAll
--- End diff --

this is a bug in `except`; there should only be one `@rdname` for each

---
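For illustration, a corrected roxygen header with a single `@rdname` tag - a sketch of the requested fix, not the merged patch:

```r
# Roxygen header for exceptAll with exactly one @rdname tag;
# duplicate tags are redundant and can confuse roxygen's Rd grouping
#' @family SparkDataFrame functions
#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
#' @rdname exceptAll
#' @name exceptAll
#' @note exceptAll since 2.4.0
setMethod("exceptAll",
          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
          function(x, y) {
            dataFrame(callJMethod(x@sdf, "exceptAll", y@sdf))
          })
```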
[GitHub] zeppelin issue #3139: [ZEPPELIN-3712] Add `maxConnLifetime` parameter to JDB...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3139 LGTM ---
[GitHub] spark issue #22095: [SPARK-23984][K8S] Changed Python Version config to be c...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22095 @mccheah btw, please add a comment (say "merged to master") after you merge a PR - just a convention in this project. FYI. thx. ---
[GitHub] spark issue #22095: [SPARK-23984][K8S] Changed Python Version config to be c...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22095 @mccheah @foxish ---
[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22071 In this case maybe OK. Perhaps just release-note this, and only if there's another 2.2.x or 2.1.x release? ---
[GitHub] zeppelin issue #3087: [ZEPPELIN-3644]: Adding SPARQL query language support ...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3087 This is just for syntax highlighting; there is no interpreter code here. Also, even for syntax highlighting, the ACE editor should be set with the language of choice - this PR has neither of those. ---
[GitHub] zeppelin issue #3132: [ZEPPELIN-3701].Missing first several '0' and losing d...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3132 merging if no more comments ---
[GitHub] zeppelin issue #3136: ZEPPELIN-3699. Remove the logic of converting single r...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3136 Paragraph or REST API. Though it looks like it will break all existing saved notebooks, since it changes the persisted JSON. Is there a way to make them compatible? ---
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22109 @vanzin @squito ---
[GitHub] spark pull request #22084: [SPARK-25026][BUILD] Binary releases should conta...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22084#discussion_r209507960

--- Diff: dev/make-distribution.sh ---
@@ -188,6 +190,23 @@ if [ -f "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar
   cp "$SPARK_HOME"/common/network-yarn/target/scala*/spark-*-yarn-shuffle.jar "$DISTDIR/yarn"
 fi
 
+# Only copy external jars if built
+if [ -f "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar ]; then
+  cp "$SPARK_HOME"/external/avro/target/spark-avro_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
+fi
+if [ -f "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar ]; then
+  cp "$SPARK_HOME"/external/kafka-0-10/target/spark-streaming-kafka-0-10_${SCALA_VERSION}-${VERSION}.jar "$DISTDIR/external/jars/"
--- End diff --

agree - not kinesis or ganglia

---
[GitHub] spark pull request #22081: [SPARK-23654][BUILD] remove jets3t as a dependenc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22081#discussion_r209443568

--- Diff: pom.xml ---
@@ -984,24 +987,15 @@
-        <groupId>net.java.dev.jets3t</groupId>
-        <artifactId>jets3t</artifactId>
-        <version>${jets3t.version}</version>
+        <groupId>javax.activation</groupId>
+        <artifactId>activation</artifactId>
+        <version>1.1.1</version>
--- End diff --

this changes from `<jets3t.version>0.9.4</jets3t.version>`?

---
[GitHub] zeppelin issue #3118: [zeppelin-3693] Option to toggle chart settings of par...
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3118 I'd agree, this seems like the intent of the report mode. Maybe you can add an option to report mode instead, to keep the frame for the chart? ---
[GitHub] spark issue #21027: [SPARK-23943][MESOS][DEPLOY] Improve observability of Me...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21027 ok to test ---
[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22071 @tnachen ---
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22072 jenkins retest this please ---
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22072

```
* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) :
  dims [product 26] do not match the length of object [0]
```

---
[GitHub] zeppelin issue #3107: [ZEPPELIN-3646] Add note for updating user permissions
Github user felixcheung commented on the issue: https://github.com/apache/zeppelin/pull/3107 I think there is significant risk that some users are just running all "sample" notebooks to check them out, not fully aware that some might be modifying system state. Agreed with the suggestions above. ---
[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21927#discussion_r208123913 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1946,4 +1990,11 @@ private[spark] object DAGScheduler { // Number of consecutive stage attempts allowed before a stage is aborted val DEFAULT_MAX_CONSECUTIVE_STAGE_ATTEMPTS = 4 + + // Error message when running a barrier stage that have unsupported RDD chain pattern. + val ERROR_MESSAGE_RUN_BARRIER_WITH_UNSUPPORTED_RDD_CHAIN_PATTERN = +"[SPARK-24820][SPARK-24821]: Barrier execution mode does not allow the following pattern of " + + "RDD chain within a barrier stage:\n1. Ancestor RDDs that have different number of " + + "partitions from the resulting RDD (eg. union()/coalesce()/first()/PartitionPruningRDD);\n" + --- End diff -- collect() is expensive though? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org