from:"HyukjinKwon"

[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

2018-11-09 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22979
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-09 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232208151
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", 
schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in 
DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", 
x)
+} else {
+  col <- x@jc
--- End diff --

BTW, `lit` usage already works in many APIs although it looks a bit odd .. 
should be okay.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-09 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r232207412
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", 
schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in 
DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", 
x)
+} else {
+  col <- x@jc
--- End diff --

It's just left as column for now .. in case we allow other more cases .. 
Yea, it's a bit odd that we should `schema_of_csv(lit("Amsterdam,2018")))`. 
Maybe later columns or other expressions might have to be supported - in that 
case we can just fix the documentation in R side ..


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-09 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232202773
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL, 
samplingRatio = 1.0,
   x
 }
   }
+  data[] <- lapply(data, cleanCols)
 
-  # drop factors and wrap lists
-  data <- setNames(lapply(data, cleanCols), NULL)
+  args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
+  if (arrowEnabled) {
+shouldUseArrow <- tryCatch({
+  stopifnot(length(data) > 0)
+  dataHead <- head(data, 1)
+  # Currenty Arrow optimization does not support POSIXct and raw 
for now.
+  # Also, it does not support explicit float type set by users. It 
leads to
+  # incorrect conversion. We will fall back to the path without 
Arrow optimization.
+  if (any(sapply(dataHead, function(x) is(x, "POSIXct" {
+stop("Arrow optimization with R DataFrame does not support 
POSIXct type yet.")
+  }
+  if (any(sapply(dataHead, is.raw))) {
+stop("Arrow optimization with R DataFrame does not support raw 
type yet.")
+  }
+  if (inherits(schema, "structType")) {
+if (any(sapply(schema$fields(), function(x) 
x$dataType.toString() == "FloatType"))) {
+  stop("Arrow optimization with R DataFrame does not support 
FloatType type yet.")
+}
+  }
+  firstRow <- do.call(mapply, append(args, dataHead))[[1]]
+  fileName <- writeToTempFileInArrow(data, numPartitions)
+  tryCatch(
+jrddInArrow <- 
callJStatic("org.apache.spark.sql.api.r.SQLUtils",
+   "readArrowStreamFromFile",
+   sparkSession,
+   fileName),
+  finally = {
+file.remove(fileName)
+  })
+  TRUE
+},
+error = function(e) {
+  message(paste0("WARN: createDataFrame attempted Arrow 
optimization because ",
--- End diff --

ooops


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22880
  
Let me take a look on this weekends.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22985: [SPARK-25510][SQL][TEST][FOLLOW-UP] Remove BenchmarkWith...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22985
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22939: [SPARK-25446][R] Add schema_of_json() and schema_of_csv(...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22939
  
Hey @felixcheung, it should be ready for another look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22979
  
Also let me leave a cc for @srowen.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
Actually let me leave a cc for @srowen. I remember we talked about it 
before.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r232115966
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,55 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  # R API in Arrow is not yet released. CRAN requires to add the package 
in requireNamespace
+  # at DESCRIPTION. Later, CRAN checks if the package is available or not. 
Therefore, it works
+  # around by avoiding direct requireNamespace.
+  requireNamespace1 <- requireNamespace
+  if (requireNamespace1("arrow", quietly = TRUE)) {
+record_batch <- get("record_batch", envir = asNamespace("arrow"), 
inherits = FALSE)
+record_batch_stream_writer <- get(
+  "record_batch_stream_writer", envir = asNamespace("arrow"), inherits 
= FALSE)
+file_output_stream <- get(
+  "file_output_stream", envir = asNamespace("arrow"), inherits = FALSE)
+write_record_batch <- get(
+  "write_record_batch", envir = asNamespace("arrow"), inherits = FALSE)
+
+# Currently arrow requires withr; otherwise, write APIs don't work.
+# Direct 'require' is not recommended by CRAN. Here's a workaround.
+require1 <- require
+if (require1("withr", quietly = TRUE)) {
--- End diff --

It's actually a bit odd that I need to manually require this package. 
Otherwise, it complains, for instance, here 
https://github.com/apache/arrow/blob/d3ec69069649013229366ebe01e22f389597dc19/r/R/on_exit.R#L20


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21363
  
Looks difficult because the behaviours themselves are different. One 
possibility is a fallback and the other possibility is configuration.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Hey all ~ could this get in by any change maybe?


---

[GitHub] spark issue #22954: [SPARK-25981][R] Enables Arrow optimization from R DataF...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
I have finished most of todos except waiting for R API of Arrow 0.12.0 and 
fixing some changes accordingly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [WIP] Enables Arrow optimization from R DataFrame...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r231992763
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,55 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  # R API in Arrow is not yet released. CRAN requires to add the package 
in requireNamespace
+  # at DESCRIPTION. Later, CRAN checks if the package is available or not. 
Therefore, it works
+  # around by avoiding direct requireNamespace.
+  requireNamespace1 <- requireNamespace
+  if (requireNamespace1("arrow", quietly = TRUE)) {
+record_batch <- get("record_batch", envir = asNamespace("arrow"), 
inherits = FALSE)
+record_batch_stream_writer <- get(
+  "record_batch_stream_writer", envir = asNamespace("arrow"), inherits 
= FALSE)
+file_output_stream <- get(
+  "file_output_stream", envir = asNamespace("arrow"), inherits = FALSE)
+write_record_batch <- get(
+  "write_record_batch", envir = asNamespace("arrow"), inherits = FALSE)
--- End diff --

These workarounds will be removed when Arrow 0.12.0 is released. I did it 
to make CARN passed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22979: [SPARK-25977][SQL] Parsing decimals from CSV using local...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22979
  
Looks good. Will take a closer look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22977: [BUILD] Bump previousSparkVersion in MimaBuild.scala to ...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22977
  
Actually, similar changes were being made at 
https://github.com/apache/spark/pull/22967 FYI. cc @dbtsai 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22973
  
Yea .. actually that's documented in 
https://spark.apache.org/contributing.html . Strictly it should be` PYTHON`

> The PR title should be of the form [SPARK-][COMPONENT] Title, where 
SPARK- is the relevant JIRA number, COMPONENT is one of the PR categories 
shown at spark-prs.appspot.com and Title may be the JIRAâs title or a more 
specific title describing the PR itself.

![screen shot 2018-11-08 at 7 19 07 
pm](https://user-images.githubusercontent.com/6477701/48195647-2e353b80-e38b-11e8-8ac0-73a458ee00c0.png)

I tried to be clear for myself so I took a look before but .. not a big 
deal at all :D.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
adding @falaki and @mengxr as well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r231847272
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -215,14 +278,16 @@ createDataFrame <- function(data, schema = NULL, 
samplingRatio = 1.0,
   }
 
   if (is.null(schema) || (!inherits(schema, "structType") && 
is.null(names(schema {
-row <- firstRDD(rdd)
+if (is.null(firstRow)) {
+  firstRow <- firstRDD(rdd)
--- End diff --

Note that this PR optimizes the original code path as well here - when the 
input is local R DataFrame, here we avoid `firstRDD` operation.

In the master branch, the benchmark shows:

```
Exception in thread "dispatcher-event-loop-6" java.lang.OutOfMemoryError: 
Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at 
org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
```

So, technically this PR improves

If I try this with `10.csv` (79MB) record, it takes longer to cut it 
short:

**Current master**:

```
Time difference of 8.502607 secs
```

**With this PR, but without Arrow**

```
Time difference of 5.143395 secs
```

**With this PR, but with Arrow**


```
Time difference of 0.6981369 secs
```

So, technically this PR improves more **1200%**


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
@felixcheung! performance improvement was **955%** ! I described the 
benchmark I took in PR description.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r231815154
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
--- End diff --

hmmm .. yea I will add the version check later when R API of Arrow is 
officially released to CRAN.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r231814143
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
--- End diff --

Hmhmhmhm for checking arrow version itself, it will be but I think it's 
fine. R API for arrow will likely be from 0.12.0 and we will support 0.12.0+ 
from now on.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22973
  
Oh, let's name it `[PYTHON]` BTW.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-08 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
Let's revert this then if this only targeted to fix the test. We can bring 
this back later when it's needed - tho, yea . This caused a specific case 
failure in Livy' when restarting Hive enabled Spark session.

@gatorsmile, I will revert this but I don't mind getting this back again if 
it actually fixes a usecase since I either don't know how exactly this fixes 
the case above and how it causes to make the Livy case failed - one thing clear 
is this specific commit is the cause.

Please revert me action if I missed an actual usecase fixed by this, I am 
okay.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231783302
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

Let me try to take a look as well this weekends.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231783339
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

adding @JoshRosen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231783212
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

I am not an expert but just know a bit. The mima change look right from a 
cursory look.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231781880
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

I mean it's related with using external package, it looks so but Avro is 
kind of internal source now .. so it's out of date.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231781676
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

oh, but the problem is other packages probably wouldn't have _2.12 
distribution. hm, I think this can be left as was for now.

At least I am going to release spark-xml before Spark 3.0.0 anyway. I can 
try to include 2.12 distribution as well and fix it here later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22967: [SPARK-25956] Make Scala 2.12 as default Scala ve...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22967#discussion_r231780839
  
--- Diff: docs/sparkr.md ---
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` 
commands, or if initiali
 
 
 {% highlight r %}
-sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
+sparkR.session()
--- End diff --

Eh, @dbtsai, I think you can just switch this to other datasources like 
`spark-redshift` or `spark-xml`, and fix the description above `you can find 
data source connectors for popular file formats like Avro`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22960: [SPARK-25955][TEST] Porting JSON tests for CSV functions

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22960
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22958: [SPARK-25952][SQL] Passing actual schema to JacksonParse...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22958
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22958: [SPARK-25952][SQL] Passing actual schema to JacksonParse...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22958
  
@MaxGekk, BTW, can you call `verifyColumnNameOfCorruptRecord` here and 
datasource as well for JSON and CSV? Of course in a separate PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22958: [SPARK-25952][SQL] Passing actual schema to JacksonParse...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22958
  
For CSV, looks we are already doing so:


https://github.com/apache/spark/blob/76813cfa1e2607ea3b669a79e59b568e96395b2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala#L109-L111


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r231777190
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out
 ---
@@ -93,7 +93,7 @@ Partition Values  [ds=2017-08-01, hr=10]
 Location [not included in 
comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10   
 
 Created Time [not included in comparison]
 Last Access [not included in comparison]
-Partition Statistics   1121 bytes, 3 rows  
+Partition Statistics   1229 bytes, 3 rows  
--- End diff --

Hmmm .. yea, I think we should avoid ..


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22951#discussion_r231776739
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None, 
encoding=None, quote=None, escape=Non
 negativeInf=None, dateFormat=None, timestampFormat=None, 
maxColumns=None,
 maxCharsPerColumn=None, maxMalformedLogPerPartition=None, 
mode=None,
 columnNameOfCorruptRecord=None, multiLine=None, 
charToEscapeQuoteEscaping=None,
-samplingRatio=None, enforceSchema=None, emptyValue=None):
+samplingRatio=None, enforceSchema=None, emptyValue=None, 
locale=None):
--- End diff --

Let's add `emptyValue` in `streaming.py` in the same separate PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22951#discussion_r231776568
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 mode=mode, 
columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
 timestampFormat=timestampFormat, multiLine=multiLine,
 allowUnquotedControlChars=allowUnquotedControlChars, 
lineSep=lineSep,
-samplingRatio=samplingRatio, 
dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
+samplingRatio=samplingRatio, 
dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
--- End diff --

@MaxGekk, let's also add `dropFieldIfAllNull` and `encoding` in 
`sql/streaming.py` in a separate PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22951#discussion_r231776396
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 mode=mode, 
columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
 timestampFormat=timestampFormat, multiLine=multiLine,
 allowUnquotedControlChars=allowUnquotedControlChars, 
lineSep=lineSep,
-samplingRatio=samplingRatio, 
dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
+samplingRatio=samplingRatio, 
dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
+locale=locale)
--- End diff --

@MaxGekk, looks `sql/streaming.py` is missed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22951#discussion_r231775987
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, 
encoding=None, quote=None, escape=Non
   If None is set, it uses the default value, 
``1.0``.
 :param emptyValue: sets the string representation of an empty 
value. If None is set, it uses
the default value, empty string.
+:param locale: sets a locale as language tag in IETF BCP 47 
format. If None is set,
+   it uses the default value, ``en-US``. For instance, 
``locale`` is used while
+   parsing dates and timestamps.
--- End diff --

I think ideally we should apply to decimal parsing too actually. But yea we 
can leave it separate.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
OMG, what does `Ð½Ð¾Ñ 2018` mean BTW? haha


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22932: [SPARK-25102][SQL] Write Spark version to ORC/Par...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22932#discussion_r231775020
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out
 ---
@@ -93,7 +93,7 @@ Partition Values  [ds=2017-08-01, hr=10]
 Location [not included in 
comparison]sql/core/spark-warehouse/t/ds=2017-08-01/hr=10   
 
 Created Time [not included in comparison]
 Last Access [not included in comparison]
-Partition Statistics   1121 bytes, 3 rows  
+Partition Statistics   1229 bytes, 3 rows  
--- End diff --

Hm, does it mean that basically the tests will be failed or fixed for 
official releases (since it doesn't have `-SNAPSHOT`)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21679: [SPARK-24695] [SQL]: To add support to return Calendar i...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21679
  
I think we should close this for now then.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22948: [SPARK-25944][R][BUILD] AppVeyor change to latest...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22948#discussion_r231762531
  
--- Diff: dev/appveyor-install-dependencies.ps1 ---
@@ -115,7 +115,7 @@ $env:Path += ";$env:HADOOP_HOME\bin"
 Pop-Location
 
 # == R
-$rVer = "3.4.1"
+$rVer = "3.5.1"
 $rToolsVer = "3.4.0"
--- End diff --

Oh~ latest looks 3.5. Let me try to update aswell shortly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum versions fo...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22963
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum versions fo...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22963
  
Thanks, @srowen and @dongjoon-hyun.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
To me, it sounds we made a fix because it was difficult to figure out 
exactly what's going on internally. It's okay if it's difficult to reproduce 
but it can be reproduce in production; however, the problem is that the fix of 
this caused another problem.

I am asking those to see why and how important this fix was.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
I understood the reproducer step in JIRA but how and why it matters? Did it 
cause an actual problem in your production environment?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] incubator-livy issue #121: [LIVY-518][BUILD] Support Spark 2.4

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/incubator-livy/pull/121
  
Thanks guys!


---

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
ping @wangyum, I'm going to revert this today if there's no response today.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22938
  
@attilapiros, mind showing rough small test codes for it please? just want 
to see if this is something we should fix or not.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22938: [SPARK-25935][SQL] Prevent null rows from JSON parser

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22938
  
Yea, looks fine in general. Will take a look within this week or weekends.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231476884
  
--- Diff: R/pkg/R/functions.R ---
@@ -205,11 +205,18 @@ NULL
 #'  also supported for the schema.
 #'  \item \code{from_csv}: a DDL-formatted string
 #'  }
-#' @param ... additional argument(s). In \code{to_json}, \code{to_csv} and 
\code{from_json},
-#'this contains additional named properties to control how it 
is converted, accepts
-#'the same options as the JSON/CSV data source. Additionally 
\code{to_json} supports
-#'the "pretty" option which enables pretty JSON generation. In 
\code{arrays_zip},
-#'this contains additional Columns of arrays to be merged.
+#' @param ... additional argument(s).
+#'  \itemize{
+#'  \item \code{to_json}, \code{from_json} and 
\code{schema_of_json}: this contains
+#'  additional named properties to control how it is converted 
and accepts the
+#'  same options as the JSON data source.
+#'  \item \code{to_json}: it supports the "pretty" option which 
enables pretty
--- End diff --

OK. I added an example 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum versions fo...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22963
  
cc @srowen, @holdenk and @rekhajoshm


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum vers...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22963#discussion_r231463118
  
--- Diff: dev/lint-python ---
@@ -87,27 +91,46 @@ else
 rm "$PYCODESTYLE_REPORT_PATH"
 fi
 
-# stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E901,E999,F821,F822,F823 --max-line-length=100 
--show-source --statistics
--- End diff --

BTW, it was never a good idea to use external executable program directly 
without any check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum vers...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22963#discussion_r231462182
  
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@ 
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
 PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt"
 PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
 PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
+
 PYDOCSTYLEBUILD="pydocstyle"
-EXPECTED_PYDOCSTYLEVERSION="3.0.0"
-PYDOCSTYLEVERSION=$(python -c 'import pkg_resources; 
print(pkg_resources.get_distribution("pydocstyle").version)' 2> /dev/null)
+MINIMUM_PYDOCSTYLEVERSION="3.0.0"
+
+FLAKE8BUILD="flake8"
+MINIMUM_FLAKE8="3.4.0"
+
 SPHINXBUILD=${SPHINXBUILD:=sphinx-build}
--- End diff --

BTW, i tested flake8 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 and 3.6.0 
while I am here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum vers...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22963#discussion_r231461879
  
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@ 
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
 PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt"
 PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
 PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
+
 PYDOCSTYLEBUILD="pydocstyle"
-EXPECTED_PYDOCSTYLEVERSION="3.0.0"
-PYDOCSTYLEVERSION=$(python -c 'import pkg_resources; 
print(pkg_resources.get_distribution("pydocstyle").version)' 2> /dev/null)
+MINIMUM_PYDOCSTYLEVERSION="3.0.0"
+
+FLAKE8BUILD="flake8"
+MINIMUM_FLAKE8="3.4.0"
+
 SPHINXBUILD=${SPHINXBUILD:=sphinx-build}
--- End diff --

@shaneknapp and @cloud-fan, it seems flake8 does work but indeed the 
problem was compatibility with pycodestyle within flake8 itself.

Our script uses pycodestyle 2.4.0 but it's manually downloaded. So, 
Jenkins's pycodestyle might be different or old and this might be the actual 
root cause.

I locally tested with latest pycodestyle and flake 3.6.0 and it worked.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum vers...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22963#discussion_r231460594
  
--- Diff: dev/lint-python ---
@@ -26,9 +26,13 @@ 
PYCODESTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pycodestyle-report.txt"
 PYDOCSTYLE_REPORT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-report.txt"
 PYLINT_REPORT_PATH="$SPARK_ROOT_DIR/dev/pylint-report.txt"
 PYLINT_INSTALL_INFO="$SPARK_ROOT_DIR/dev/pylint-info.txt"
+
 PYDOCSTYLEBUILD="pydocstyle"
-EXPECTED_PYDOCSTYLEVERSION="3.0.0"
-PYDOCSTYLEVERSION=$(python -c 'import pkg_resources; 
print(pkg_resources.get_distribution("pydocstyle").version)' 2> /dev/null)
+MINIMUM_PYDOCSTYLEVERSION="3.0.0"
+
+FLAKE8BUILD="flake8"
+MINIMUM_FLAKE8="3.5.0"
--- End diff --

@shaneknapp and @cloud-fan, it seems flake8 does work but indeed the 
problem was compatibility with pycodestyle within flake8 itself.

Our script uses pycodestyle 2.4.0 but it's manually downloaded. So, 
Jenkins's pycodestyle might be different or old and this might be the actual 
root cause.

I locally tested with latest pycodestyle and flake 3.6.0 and it worked.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON] Specify minimum vers...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22963#discussion_r231460041
  
--- Diff: dev/lint-python ---
@@ -87,27 +91,46 @@ else
 rm "$PYCODESTYLE_REPORT_PATH"
 fi
 
-# stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E901,E999,F821,F822,F823 --max-line-length=100 
--show-source --statistics
-flake8_status="${PIPESTATUS[0]}"
+# Check by flake8
+if hash "$FLAKE8BUILD" 2> /dev/null; then
--- End diff --

I manually tested each branch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22963: [SPARK-25962][BUILD][PYTHON]Specify minimum versi...

2018-11-07 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22963

[SPARK-25962][BUILD][PYTHON]Specify minimum versions for both pydocstyle 
and flake8 in 'lint-python' script

## What changes were proposed in this pull request?

This PR explicitly specifies `flake8` and `pydocstyle` versions.

## How was this patch tested?

Manually tested.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25962

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22963


commit e59724316ef5a37b0c9b8cb3311c4867d0e5e7af
Author: hyukjinkwon 
Date:   2018-11-07T10:47:22Z

Specify minimum versions for both pydocstyle and flake8 in 'lint-python' 
script




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
For encryption stuff, I will try to handle that as well (maybe as a 
followup(?)) so that we support it even when that's enabled.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
Thanks, @felixcheung. I will address those comments during cleaning up.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22954#discussion_r231414561
  
--- Diff: R/pkg/R/SQLContext.R ---
@@ -147,6 +147,30 @@ getDefaultSqlSource <- function() {
   l[["spark.sql.sources.default"]]
 }
 
+writeToTempFileInArrow <- function(rdf, numPartitions) {
+  stopifnot(require("arrow", quietly = TRUE))
+  stopifnot(require("withr", quietly = TRUE))
--- End diff --

It's actually dependent on Arrow's API calls .. Those APIs were only added 
in few weeks ago. I will make it clean when R API of Arrow is close to release 
(at 0.12.0)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231413853
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with 
SharedSQLContext {
 
 checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") 
:: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+val df = Seq("""1,"haa"""").toDS()
+checkAnswer(
+  df.select(
+from_csv($"value", lit("a INT, b STRING"), new 
java.util.HashMap[String, String]())),
+  Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+val schema = df.schema(0).dataType.asInstanceOf[StructType]
+val options = Map.empty[String, String]
+val readback = df.select(to_csv($"struct").as("csv"))
+  .select(from_csv($"csv", schema, options).as("struct"))
+
+checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+val df = Seq(Some("1"), None).toDF("csv")
+val schema = new StructType().add("a", IntegerType)
+val options = Map.empty[String, String]
+val readback = df.select(from_csv($"csv", schema, 
options).as("struct"))
+  .select(to_csv($"struct").as("csv"))
+
+checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to to from_csv") {
+val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+val options = Map.empty[String, String].asJava
+val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), 
options) as "parsed")
+val expected = StructType(Seq(StructField(
+  "parsed",
+  StructType(Seq(
+StructField("_c0", DoubleType, true),
+StructField("_c1", IntegerType, true),
+StructField("_c2", StringType, true))
+
+assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

I think we can just get rid of it. I can't imagine both functions are 
specifically broken alone in `selectExpr`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-07 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Thank you guys!


---

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
Please describe manual tests and how it relates to actual usecase.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231404180
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", 
schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in 
DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", 
x)
+} else {
+  col <- x@jc
--- End diff --

Yup.. only literal works but columns don't work.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/incubator-livy/pull/121
  
@vanzin, looks 
https://github.com/apache/spark/commit/a75571b46f813005a6d4b076ec39081ffab11844#diff-f697551d2f00bfb336406b6fe6b516fe
 causes the test failure. At the very least, I can see the unittests pass 
without that specific commit. I suspect somehow different derby is picked up 
and it behaves differently, which causes a test failure related - which is 
pretty a Spark issue or misuse of restarting Hive session.

The test case fixes are actually more correct because it enables what it 
needs.


---

[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20944
  
Sorry, why was this change required? I don't see 
https://github.com/apache/spark/pull/20944#issuecomment-379525776 is addressed 
Can you elaborate please? Why do we make `org.apache.derby` as shared?

Ideally, minor or maintenance versions of `derby` can be dumped up, and 
they shouldn't be shared unless there's a strong reason to keep it shared, for 
instance, making class resolution failed. How did you reproduce this and why 
the unit test is not added?

I found an actual issue while working on Apache Livy Spark 2.4 support. I 
am still investigating how it relates with the test failures but at the very 
list I see this specific commit matters since Apache Livy unittests pass 
without this specific commit.

Adding @vanzin


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15899: [SPARK-18466] added withFilter method to RDD

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15899
  
+1 for the decision and closing it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/incubator-livy/pull/121
  
@vanzin, weird.

```
$ ./bin/spark-shell
scala> sql("CREATE TABLE tblA(a int)")
scala> spark.stop()
```

```
$ rm -fr metastore_db
$ rm -fr spark-warehouse
$ rm -fr derby.log
```

```
scala> import org.apache.spark.sql.SparkSession
scala> val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
scala> sql("CREATE TABLE tblA(a int)")
```

These steps reproduce the same error all in Spark 2.3.0, 2.3.1 and 2.4.0 - 
I remember we never properly supported to stop and start Hive enabled support 
Spark session in this way. SparkR tests were failed due to the same reason. Hmm 
.. I am taking a look.


---

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22951
  
Looks good. I or someone else should take a closer look before getting this 
in.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22956: [SPARK-25950][SQL] from_csv should respect to spark.sql....

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22956
  
Merged to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22956: [SPARK-25950][SQL] from_csv should respect to spa...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22956#discussion_r231370599
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala
 ---
@@ -92,8 +93,14 @@ case class CsvToStructs(
 }
   }
 
+  val nameOfCorruptRecord = 
SQLConf.get.getConf(SQLConf.COLUMN_NAME_OF_CORRUPT_RECORD)
--- End diff --

Yea, I think so.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22590
  
I wonder how important it is. I know `spark-csv` at Databricks supported 
different quote modes and that's gone when we ported that into Spark - the root 
cause was due to replacing the library from apache-common into univocity.

After few years, I only saw one request about reviving the quote mode 
proposed here - so I suspect how important it is. 

Basically, @MaxGekk described my stand correctly. Can we investigate a way 
to set the arbitrary parse settings options?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22590
  
They should be documented in API doc like `DataFrameReader.scala`. For 
site, we should avoid doc duplication - It's a general issue to document 
options.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
So far, the regressions tests are passed and newly added test for R 
optimization is verified locally. Let me fix CRAN test and some nits.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22956: [SPARK-25950][SQL] from_csv should respect to spark.sql....

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22956
  
Looks good. I or someone else should take a closer look before getting this 
in.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231344120
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with 
SharedSQLContext {
 
 checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") 
:: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+val df = Seq("""1,"haa"""").toDS()
+checkAnswer(
+  df.select(
+from_csv($"value", lit("a INT, b STRING"), new 
java.util.HashMap[String, String]())),
+  Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+val schema = df.schema(0).dataType.asInstanceOf[StructType]
+val options = Map.empty[String, String]
+val readback = df.select(to_csv($"struct").as("csv"))
+  .select(from_csv($"csv", schema, options).as("struct"))
+
+checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+val df = Seq(Some("1"), None).toDF("csv")
+val schema = new StructType().add("a", IntegerType)
+val options = Map.empty[String, String]
+val readback = df.select(from_csv($"csv", schema, 
options).as("struct"))
+  .select(to_csv($"struct").as("csv"))
+
+checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to to from_csv") {
+val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+val options = Map.empty[String, String].asJava
+val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), 
options) as "parsed")
+val expected = StructType(Seq(StructField(
+  "parsed",
+  StructType(Seq(
+StructField("_c0", DoubleType, true),
+StructField("_c1", IntegerType, true),
+StructField("_c2", StringType, true))
+
+assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

@MaxGekk, wouldn't the tests in `csv-functions.sql` be enough for SQL 
support test?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/incubator-livy/pull/121#discussion_r23134
  
--- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
@@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
 
   @Test
   public void testSparkSQLJob() throws Exception {
-runTest(true, new TestFunction() {
+runTest(true, false, new TestFunction() {
--- End diff --

Okay, let me take a look and get back to you soon. 


---

[GitHub] incubator-livy pull request #121: [LIVY-518][BUILD] Support Spark 2.4

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/incubator-livy/pull/121#discussion_r231342303
  
--- Diff: rsc/src/test/java/org/apache/livy/rsc/TestSparkClient.java ---
@@ -271,7 +275,7 @@ public void call(LivyClient client) throws Exception {
 
   @Test
   public void testSparkSQLJob() throws Exception {
-runTest(true, new TestFunction() {
+runTest(true, false, new TestFunction() {
--- End diff --

@vanzin, thanks. I will investigate it within few days ... but would you 
mind if I do that as a followup job? I still suspect if it's specific to tests 
and shouldn't affect the functionailities within Livy.


---

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
Adding @yanghaogn as well fyi


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization from R Da...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22954
  
Let me leave a cc @felixcheung, @BryanCutler FYI.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22954: [DO-NOT-MERGE][POC] Enables Arrow optimization fr...

2018-11-06 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22954

[DO-NOT-MERGE][POC] Enables Arrow optimization from R DataFrame to Spark 
DataFrame

## What changes were proposed in this pull request?

This PR is not for merging it but targets to demonstrates the feasibility 
(with reusing PyArrow code path at its best) and performance improvement when 
converting R dataframes to Spark's dataframe. This can be tested as below:

```bash
$ ./bin/sparkR --conf spark.sql.execution.arrow.enabled=true
```

```r
collect(createDataFrame(mtcars))
```

**Requirements:**
  - R 3.5.x 
  - Arrow package 0.12+ (not released yet)
  - CRAN released (ARROW-3204)
  - withr package

**TODOs:**
- [ ] Performance measurement
- [ ] TDB

## How was this patch tested?

Small test was added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark r-arrow-createdataframe

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22954.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22954


commit 90011a5ff48f2c5fa5fae0e2573fcdaa85d44976
Author: hyukjinkwon 
Date:   2018-11-06T02:38:37Z

[POC] Enables Arrow optimization from R DataFrame to Spark DataFrame




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22939#discussion_r231029629
  
--- Diff: R/pkg/R/functions.R ---
@@ -2230,6 +2237,32 @@ setMethod("from_json", signature(x = "Column", 
schema = "characterOrstructType")
 column(jc)
   })
 
+#' @details
+#' \code{schema_of_json}: Parses a JSON string and infers its schema in 
DDL format.
+#'
+#' @rdname column_collection_functions
+#' @aliases schema_of_json schema_of_json,characterOrColumn-method
+#' @examples
+#'
+#' \dontrun{
+#' json <- '{"name":"Bob"}'
+#' df <- sql("SELECT * FROM range(1)")
+#' head(select(df, schema_of_json(json)))}
+#' @note schema_of_json since 3.0.0
+setMethod("schema_of_json", signature(x = "characterOrColumn"),
+  function(x, ...) {
+if (class(x) == "character") {
+  col <- callJStatic("org.apache.spark.sql.functions", "lit", 
x)
+} else {
+  col <- x@jc
--- End diff --

Yea .. that was discussed at https://github.com/apache/spark/pull/22775. 
The usecase of `schema_of_csv` or `schema_of_json` will usually be like .. 
copying and pasting one record from the actual data manually. That's disallowed 
for now conservatively.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22940: [MINOR][R] Rename SQLUtils name to RSQLUtils in R

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22940
  
Hm .. okay. let me close this for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22940: [MINOR][R] Rename SQLUtils name to RSQLUtils in R

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/22940


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22940: [MINOR][R] Rename SQLUtils name to RSQLUtils in R

2018-11-06 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22940
  
Hmmm yea but like .. some of classes similar with this case have been 
renamed time to time, for instancem `json  InferSchema` -> `json  
JSONInferSchema` when CSV datasource was added (because there was `csv  
InferSchema`, which renamed `csv ... CSVInferSchema` as well).
 
Every time I see `SQLUtils`, it's pretty confusing and somehow a bit 
annoying. The only reason I didn't change this so far was that it needs to 
change here and there .. This is somewhat what I guess we should eventually 
change tho. I thought it's better to do given it's 3.0.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22305
  
Let me try to take a look this weekends. Sorry it's been delayed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
Ah, yeap. sure.


---

[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/zeppelin/pull/3206
  
This should be ready for a look as is. I already tested Spark 2.4.0.


---

[GitHub] incubator-livy issue #121: [WIP][LIVY-518][BUILD] Support Spark 2.4

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/incubator-livy/pull/121
  
Yup, let me rebase and get rid of [WIP] tag.


---

[GitHub] spark issue #22953: [SPARK-25946] [BUILD] Upgrade ASM to 7.x to support JDK1...

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22953
  
Looks good to me.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22919: [SPARK-25906][SHELL] Documents '-I' option (from Scala R...

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22919
  
Merged to master and branch-2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22940: [MINOR][R] Rename SQLUtils name to RSQLUtils in R

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22940
  
hmhm .. it's trivial and yea it is a logical change. I happened to take a 
look some codes around here lately, and the name `SQLUtils` actually annoyed me 
few times :(. I will leave it to @felixcheung. I don;t mind closing this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22948: [SPARK-25944][R][BUILD] AppVeyor change to latest R vers...

2018-11-05 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22948
  
cc @felixcheung 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22948: [SPARK-25944][R][BUILD] AppVeyor change to latest...

2018-11-05 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22948

[SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.5.1)

## What changes were proposed in this pull request?

R 3.5.1 is released 2018-07-02. This PR targets to changes R version from 
3.4.1 to 3.5.1.

## How was this patch tested?

AppVeyor

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-25944

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22948


commit 472d339254e86d14ed420320821bec26abe3062e
Author: hyukjinkwon 
Date:   2018-11-05T12:19:54Z

AppVeyor change to latest R version (3.5.1)




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 8 9 10 >

501 - 600 of 12680 matches

Mail list logo