[GitHub] spark issue #21004: [SPARK-23896][SQL] Improve PartitioningAwareFileIndex

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89049 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89049/testReport)** for PR 21004 at commit [`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21001: [SPARK-19724][SQL][FOLLOW-UP] Check location of managed t...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21001
  
**[Test build #89050 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89050/testReport)** for PR 21001 at commit [`3fe648f`](https://github.com/apache/spark/commit/3fe648fa03e81b8a2f5ec23182cae3b977164646).


---




[GitHub] spark issue #21004: [SPARK-23896][SQL] Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2089/
Test PASSed.


---




[GitHub] spark issue #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpark as a...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21007
  
**[Test build #89053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89053/testReport)** for PR 21007 at commit [`db1987f`](https://github.com/apache/spark/commit/db1987f63370c6c2f9434aea76da7d326565be5a).


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21005
  
**[Test build #89052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89052/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).


---




[GitHub] spark issue #21004: [SPARK-23896][SQL] Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89048/
Test PASSed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89048/testReport)** for PR 20981 at commit [`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-04-09 Thread attilapiros
Github user attilapiros commented on the issue:

https://github.com/apache/spark/pull/20925
  
I have finished my review and have not found any additional issues.

LGTM


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20937
  
> I don't know about you but I used to think if something doesn't work it means it doesn't work in ALL cases.

I agree that there is always room for improvement. Trust me, I don't usually push back this hard.

I would like to avoid documenting that auto-detection is supported, particularly in this case. The support is incomplete: we found many holes, and we also found them pretty tricky to fix. For example, IIRC there was a case where `DROPMALFORMED` was required too. I think you already know which cases don't work, @MaxGekk, given our discussions so far. If you really don't know, I will test, look back, and list the cases that don't work.

I really want to avoid complaints about why auto-detection doesn't work, and just want to clarify _the current status as is_, because this PR targets adding the explicit encoding.



---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180050283
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
@@ -92,26 +93,30 @@ object TextInputJsonDataSource extends JsonDataSource {
      sparkSession: SparkSession,
      inputPaths: Seq[FileStatus],
      parsedOptions: JSONOptions): StructType = {
-    val json: Dataset[String] = createBaseDataset(
-      sparkSession, inputPaths, parsedOptions.lineSeparator)
+    val json: Dataset[String] = createBaseDataset(sparkSession, inputPaths, parsedOptions)
+
     inferFromDataset(json, parsedOptions)
   }

   def inferFromDataset(json: Dataset[String], parsedOptions: JSONOptions): StructType = {
     val sampled: Dataset[String] = JsonUtils.sample(json, parsedOptions)
-    val rdd: RDD[UTF8String] = sampled.queryExecution.toRdd.map(_.getUTF8String(0))
-    JsonInferSchema.infer(rdd, parsedOptions, CreateJacksonParser.utf8String)
+    val rdd: RDD[InternalRow] = sampled.queryExecution.toRdd
+    val rowParser = parsedOptions.encoding.map { enc =>
+      CreateJacksonParser.internalRow(enc, _: JsonFactory, _: InternalRow, 0)
--- End diff --

I tried that originally but rejected the solution because the overhead of wrapping the array in a `ByteArrayInputStream` per row is very high. It increases execution time by up to 20% in some cases.
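As a pure-Python illustration (not the Spark code path - the names and the "parser" calls are stand-ins), the rejected shape allocates a fresh in-memory stream object for every record, while the alternative hands the record's bytes to the parser directly. The 20% figure above is the author's own measurement, not something this sketch reproduces:

```python
import io

# Rows as raw bytes, standing in for the binary column of each InternalRow.
rows = [b'{"a": %d}' % i for i in range(1000)]

def parse_via_stream(row: bytes) -> bytes:
    # Rejected shape: wrap each row in a fresh stream (a ByteArrayInputStream
    # analogue) before parsing. Correct, but one extra allocation per record.
    return io.BytesIO(row).read()  # stand-in for parser.parse(stream)

def parse_direct(row: bytes) -> bytes:
    # Alternative shape: give the parser the row's bytes directly.
    return row  # stand-in for parser.parse(bytes)

# Both shapes produce identical results; they differ only in per-row cost.
assert [parse_via_stream(r) for r in rows] == [parse_direct(r) for r in rows]
```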


---




[GitHub] spark pull request #21007: [SPARK-23942][PYTHON][SQL] Makes collect in PySpa...

2018-04-09 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/21007

[SPARK-23942][PYTHON][SQL] Makes collect in PySpark as action for a query 
executor listener

## What changes were proposed in this pull request?

This PR proposes to make `collect` in PySpark be recognised as an action by a query execution listener.

It seems that `collect`, and `collect` with Arrow, are not recognised as actions via `QueryExecutionListener`. For example, if we have a custom listener as below:

```scala
package org.apache.spark.sql

import org.apache.spark.internal.Logging
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener


class TestQueryExecutionListener extends QueryExecutionListener with Logging {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    logError("Look at me! I'm 'onSuccess'")
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = { }
}
```
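For the listener above to fire at all, it has to be registered with the session. A hedged sketch: the `spark.sql.queryExecutionListeners` static configuration exists in Spark 2.3, but the exact launch command and jar packaging below are illustrative assumptions, not part of this PR:

```shell
# Config fragment (illustrative): register the listener when launching PySpark.
# Assumes TestQueryExecutionListener from above is compiled into a jar that is
# on the driver classpath.
./bin/pyspark \
  --conf spark.sql.queryExecutionListeners=org.apache.spark.sql.TestQueryExecutionListener
```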

**Before**

```python
>>> sql("SELECT * FROM range(1)").collect()
```
```
[Row(id=0)]
```

```python
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> sql("SELECT * FROM range(1)").toPandas()
```
```
   id
0   0
```

**After**

```python
>>> sql("SELECT * FROM range(1)").collect()
```
```
18/04/09 16:57:58 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
[Row(id=0)]
```

```python
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> sql("SELECT * FROM range(1)").toPandas()
```
```
18/04/09 17:53:26 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
   id
0   0
```


Other operations on the PySpark and Scala sides seem fine:

```python
>>> sql("SELECT * FROM range(1)").show()
```
```
18/04/09 17:02:04 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
+---+
| id|
+---+
|  0|
+---+
```

```scala
scala> sql("SELECT * FROM range(1)").collect()
```
```
18/04/09 16:58:41 ERROR TestQueryExecutionListener: Look at me! I'm 'onSuccess'
res1: Array[org.apache.spark.sql.Row] = Array([0])
```

## How was this patch tested?

I have manually tested as described above.

It's possible to add a test, but it would require a mock `QueryExecutionListener`, a static object with a variable updated by the mock listener, and checking that variable via Py4J. It would also need a manual skip condition on the PySpark side.

I can add this test, but I usually try to avoid tests that need JVM access. Let me know if anyone feels it is required.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-23942

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21007.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21007


commit db1987f63370c6c2f9434aea76da7d326565be5a
Author: hyukjinkwon 
Date:   2018-04-09T09:54:44Z

Makes collect in PySpark as action for a query executor listener




---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21005
  
LGTM


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89046/
Test PASSed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89046/testReport)** for PR 20944 at commit [`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/21005
  
retest this please


---




[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/20904
  
Jenkins, test this please.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89047/
Test FAILed.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21005
  
**[Test build #89047 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89047/testReport)** for PR 21005 at commit [`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL] Improve PartitioningAwareFileIndex

2018-04-09 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/21004
  
retest this please.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/20937
  
@HyukjinKwon Let's sync.

> Automatic encoding detection doesn't work for newlines and schema inference when multiLine is disabled

I don't know about you, but I used to think that if something doesn't work, it means it doesn't work in ALL cases. You make statements that are only partially correct, or incorrect. For this statement, here are counterexamples:
1. File in UTF-8, multiline is disabled - will newlines and the schema be inferred correctly? Yes.
2. File in ISO 8859-1, multiline is disabled. Does it work? Yes.
3. Encoding is CP1251 - the same.

All these examples show that your statement is wrong in the mathematical sense.
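The counterexamples above can be sanity-checked outside Spark. A hedged pure-Python sketch (it models only naive per-line splitting, not Spark's actual reader): for charsets where `\n` is a single byte (UTF-8, ISO 8859-1, CP1251), splitting on the byte `b"\n"` still yields decodable records, whereas for UTF-16LE the bytes after the split are misaligned:

```python
text = '{"a": 1}\n{"a": 2}'

# Charsets where '\n' encodes as the single byte 0x0A and no other character
# in the text contains it: naive per-line splitting keeps records decodable.
for enc in ["utf-8", "iso-8859-1", "cp1251"]:
    first, second = text.encode(enc).split(b"\n")
    assert second.decode(enc) == '{"a": 2}'

# UTF-16LE encodes '\n' as b'\x0a\x00': splitting on the single byte b'\n'
# strands a NUL byte and misaligns every byte pair that follows.
first, second = text.encode("utf-16-le").split(b"\n")
try:
    recovered = second.decode("utf-16-le")
except UnicodeDecodeError:
    recovered = None
print(recovered == '{"a": 2}')  # False
```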

> I thought this PR targets to add the **explicit encoding** support mainly

EXACTLY. I don't know why you push me to do something about auto-detection. The PR doesn't change behavior when `encoding` is not specified. The PR is not about supporting any encoding in all cases. It is about the case when the `encoding` is specified explicitly by a user.


---




[GitHub] spark pull request #20235: [Spark-22887][ML][TESTS][WIP] ML test for Structu...

2018-04-09 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20235#discussion_r180027926
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/fpm/FPGrowthSuite.scala ---
@@ -34,86 +35,122 @@ class FPGrowthSuite extends SparkFunSuite with MLlibTestSparkContext with Defaul
   }

   test("FPGrowth fit and transform with different data types") {
-    Array(IntegerType, StringType, ShortType, LongType, ByteType).foreach { dt =>
-      val data = dataset.withColumn("items", col("items").cast(ArrayType(dt)))
-      val model = new FPGrowth().setMinSupport(0.5).fit(data)
-      val generatedRules = model.setMinConfidence(0.5).associationRules
-      val expectedRules = spark.createDataFrame(Seq(
-        (Array("2"), Array("1"), 1.0),
-        (Array("1"), Array("2"), 0.75)
-      )).toDF("antecedent", "consequent", "confidence")
-        .withColumn("antecedent", col("antecedent").cast(ArrayType(dt)))
-        .withColumn("consequent", col("consequent").cast(ArrayType(dt)))
-      assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
-        generatedRules.sort("antecedent").rdd.collect()))
-
-      val transformed = model.transform(data)
-      val expectedTransformed = spark.createDataFrame(Seq(
-        (0, Array("1", "2"), Array.emptyIntArray),
-        (0, Array("1", "2"), Array.emptyIntArray),
-        (0, Array("1", "2"), Array.emptyIntArray),
-        (0, Array("1", "3"), Array(2))
-      )).toDF("id", "items", "prediction")
-        .withColumn("items", col("items").cast(ArrayType(dt)))
-        .withColumn("prediction", col("prediction").cast(ArrayType(dt)))
-      assert(expectedTransformed.collect().toSet.equals(
-        transformed.collect().toSet))
+      class DataTypeWithEncoder[A](val a: DataType)
+          (implicit val encoder: Encoder[(Int, Array[A], Array[A])])
+
+      Array(
+        new DataTypeWithEncoder[Int](IntegerType),
+        new DataTypeWithEncoder[String](StringType),
+        new DataTypeWithEncoder[Short](ShortType),
+        new DataTypeWithEncoder[Long](LongType)
+        // , new DataTypeWithEncoder[Byte](ByteType)
+        // TODO: using ByteType produces error, as Array[Byte] is handled as Binary
+        // cannot resolve 'CAST(`items` AS BINARY)' due to data type mismatch:
+        // cannot cast array to binary;
+      ).foreach { dt => {
+        val data = dataset.withColumn("items", col("items").cast(ArrayType(dt.a)))
+        val model = new FPGrowth().setMinSupport(0.5).fit(data)
+        val generatedRules = model.setMinConfidence(0.5).associationRules
+        val expectedRules = Seq(
+          (Array("2"), Array("1"), 1.0),
+          (Array("1"), Array("2"), 0.75)
+        ).toDF("antecedent", "consequent", "confidence")
+          .withColumn("antecedent", col("antecedent").cast(ArrayType(dt.a)))
+          .withColumn("consequent", col("consequent").cast(ArrayType(dt.a)))
+        assert(expectedRules.sort("antecedent").rdd.collect().sameElements(
+          generatedRules.sort("antecedent").rdd.collect()))
+
+        val expectedTransformed = Seq(
+          (0, Array("1", "2"), Array.emptyIntArray),
--- End diff --

I think the "id" column should be of values "0, 1, 2, 3".
Here id column is useless, we can remove it.


---




[GitHub] spark issue #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driver.memo...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21006
  
Can one of the admins verify this patch?


---







[GitHub] spark pull request #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driv...

2018-04-09 Thread pmackles
GitHub user pmackles opened a pull request:

https://github.com/apache/spark/pull/21006

[SPARK-22256][MESOS] - Introduce spark.mesos.driver.memoryOverhead

When running the spark driver in a container, such as when using the Mesos dispatcher service, we need to apply the same rules as for executors in order to avoid the JVM going over the allotted limit and being killed.

Tested manually on the spark 2.3 branch
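The executor rule being mirrored here is, roughly: request the JVM memory plus an overhead that defaults to 10% of it, floored at 384 MB. A hedged sketch of that arithmetic - the constants match Spark's Mesos scheduler defaults at the time, but treat this as illustrative rather than the patch itself:

```python
MEMORY_OVERHEAD_FRACTION = 0.10   # default overhead as a fraction of JVM memory
MEMORY_OVERHEAD_MINIMUM = 384     # MB, floor applied for small JVMs

def container_memory_mb(jvm_memory_mb: int, overhead_mb: int = None) -> int:
    """Memory to request from Mesos so the JVM's native allocations
    (off-heap buffers, thread stacks, metaspace) don't push the container
    over its limit. `overhead_mb` models an explicit
    spark.mesos.driver.memoryOverhead setting."""
    if overhead_mb is None:  # overhead not set explicitly: use the default rule
        overhead_mb = max(MEMORY_OVERHEAD_MINIMUM,
                          int(MEMORY_OVERHEAD_FRACTION * jvm_memory_mb))
    return jvm_memory_mb + overhead_mb

print(container_memory_mb(1024))  # 1408: the 384 MB floor applies
print(container_memory_mb(8192))  # 9011: the 10% rule applies
```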



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pmackles/spark paul-SPARK-22256

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21006


commit 1197c0b2e4ae72c1353ab4cd132285da4cfed61e
Author: Paul Mackles 
Date:   2018-04-06T17:44:38Z

[SPARK-22256] - Introduce spark.mesos.driver.memoryOverhead




---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20937
  
Wait, @MaxGekk, I think we should be synced first.

Automatic encoding detection doesn't work for newlines and schema inference when `multiLine` is disabled, and I want to clarify this in the documentation and error messages. I thought this PR mainly targets adding the explicit encoding support, as I discussed with @cloud-fan and you, if I haven't missed anything. Did I maybe misread the discussion?


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21005
  
Yea, I think so, and I just suggested that we'd better file a new JIRA for that. Thanks!


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180016246
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -361,6 +361,15 @@ class JacksonParser(
        // For such records, all fields other than the field configured by
        // `columnNameOfCorruptRecord` are set to `null`.
        throw BadRecordException(() => recordLiteral(record), () => None, e)
+      case e: CharConversionException if options.encoding.isEmpty =>
+        val msg =
+          """Failed to parse a character. Encoding was detected automatically.
--- End diff --

> I don't think `Encoding was detected automatically` is quite correct.

It is absolutely correct. If `encoding` is not set, it is detected automatically by Jackson. Look at the condition `if options.encoding.isEmpty =>`.

> It might not help user solve the issue but it gives less correct information.

It gives absolutely correct information.

> They could thought it detects encoding correctly regardless of multiline option.

The message DOESN'T say that the `encoding` was detected correctly.

> Think about this scenario: users somehow get this exception and read Failed to parse a character. Encoding was detected automatically. What would they think?

They will look at the proposed solution, `You might want to set it explicitly via the encoding option like`, and will set `encoding`.

> I would think somehow the file is somehow failed to read

That could be true even if `encoding` is set correctly.

> but it looks detecting the encoding in the file correctly automatically

I don't know why you decided that. I see nothing about the correctness of the `encoding` in the message.

> It's annoying to debug encoding related stuff in my experience. It would be nicer if we give the correct information as much as we can.

What is your suggestion for the error message?

> I am saying let's document the automatic encoding detection feature only for multiLine officially, which is true.

I agree, let's document that, though it is not related to this PR. This PR doesn't change the behavior of encoding auto-detection. And it must not change that behavior, from my point of view. If you want to restrict the encoding auto-detection mechanism somehow, please create a separate PR. We will discuss separately what kinds of customers' apps it would break.
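For readers unfamiliar with what "detected automatically by Jackson" involves: Jackson sniffs the leading bytes of the stream (a BOM, then the byte pattern of the first JSON characters). A hedged sketch of the BOM part only, in Python for brevity - Jackson's actual logic lives in `ByteSourceJsonBootstrapper` and covers more cases:

```python
import codecs

# Order matters: the UTF-32LE BOM (ff fe 00 00) starts with the UTF-16LE
# BOM (ff fe), so wider encodings must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def detect_by_bom(data: bytes):
    """Return the charset name implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None  # no BOM: fall back to content sniffing or a default

print(detect_by_bom(codecs.BOM_UTF8 + b'{"a":1}'))  # UTF-8
print(detect_by_bom(b'{"a":1}'))                    # None (no BOM present)
```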


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180014636
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -86,14 +85,34 @@ private[sql] class JSONOptions(

   val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)

-  val lineSeparator: Option[String] = parameters.get("lineSep").map { sep =>
-    require(sep.nonEmpty, "'lineSep' cannot be an empty string.")
-    sep
+  /**
+   * A string between two consecutive JSON records.
+   */
+  val lineSeparator: Option[String] = parameters.get("lineSep")
+
+  /**
+   * Standard encoding (charset) name. For example UTF-8, UTF-16LE and UTF-32BE.
+   * If the encoding is not specified (None), it will be detected automatically.
+   */
+  val encoding: Option[String] = parameters.get("encoding")
+    .orElse(parameters.get("charset")).map { enc =>
+      val blacklist = List("UTF16", "UTF32")
--- End diff --

Not important, but it's the more usual way, and I was thinking of doing it unless there is a specific reason to deviate from the norm.


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180014167
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -366,6 +366,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `java.text.SimpleDateFormat`. This applies to timestamp type.
    * `multiLine` (default `false`): parse one record, which may span multiple lines, per file
+   * `encoding` (by default it is not set): allows to forcibly set one of standard basic
+   * or extended charsets for input jsons. For example UTF-8, UTF-16BE, UTF-32. If the encoding
+   * is not specified (by default), it will be detected automatically.
--- End diff --

> If encoding is not set, it will be detected by Jackson independently from multiline.

Jackson detects it, but Spark doesn't handle it correctly when `multiLine` is disabled, even with this PR, as we discussed. We found many holes. Why did you bring this up again?


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180013348
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
@@ -92,26 +93,30 @@ object TextInputJsonDataSource extends JsonDataSource {
      sparkSession: SparkSession,
      inputPaths: Seq[FileStatus],
      parsedOptions: JSONOptions): StructType = {
-    val json: Dataset[String] = createBaseDataset(
-      sparkSession, inputPaths, parsedOptions.lineSeparator)
+    val json: Dataset[String] = createBaseDataset(sparkSession, inputPaths, parsedOptions)
+
     inferFromDataset(json, parsedOptions)
   }

   def inferFromDataset(json: Dataset[String], parsedOptions: JSONOptions): StructType = {
     val sampled: Dataset[String] = JsonUtils.sample(json, parsedOptions)
-    val rdd: RDD[UTF8String] = sampled.queryExecution.toRdd.map(_.getUTF8String(0))
-    JsonInferSchema.infer(rdd, parsedOptions, CreateJacksonParser.utf8String)
+    val rdd: RDD[InternalRow] = sampled.queryExecution.toRdd
+    val rowParser = parsedOptions.encoding.map { enc =>
+      CreateJacksonParser.internalRow(enc, _: JsonFactory, _: InternalRow, 0)
--- End diff --

Can we do something like

```scala
(factory: JsonFactory, row: InternalRow) =>
  val bais = new ByteArrayInputStream(row.getBinary(0))
  CreateJacksonParser.inputStream(enc, factory, bais)
```
?

It looks like `internalRow` doesn't actually deduplicate code.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/21005
  
@maropu it seems a bit of overkill to add a separate trait for this; it also somewhat nullifies the effect of this PR.

As for `CalendarInterval`'s support for `divide` and `multiply`: these operations have not been implemented yet, and - correct me if I am wrong - they involve a `CalendarInterval` on the left side and an `Integral` on the right side, which violates the contract of `BinaryArithmetic`. Anyway, I am not opposed to this, but I think we should do it as part of a separate JIRA/PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180009422
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -361,6 +361,15 @@ class JacksonParser(
 // For such records, all fields other than the field configured by
 // `columnNameOfCorruptRecord` are set to `null`.
 throw BadRecordException(() => recordLiteral(record), () => None, e)
+  case e: CharConversionException if options.encoding.isEmpty =>
+    val msg =
+      """Failed to parse a character. Encoding was detected automatically.
--- End diff --

I am saying let's document the automatic encoding detection feature only 
for `multiLine` officially, which is true.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21005
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2088/
Test PASSed.


---




[GitHub] spark pull request #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Supp...

2018-04-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20937#discussion_r180009312
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -361,6 +361,15 @@ class JacksonParser(
 // For such records, all fields other than the field configured by
 // `columnNameOfCorruptRecord` are set to `null`.
 throw BadRecordException(() => recordLiteral(record), () => None, e)
+  case e: CharConversionException if options.encoding.isEmpty =>
+    val msg =
+      """Failed to parse a character. Encoding was detected automatically.
--- End diff --

I don't think `Encoding was detected automatically` is quite correct. 
It might not help users solve the issue, and it gives less accurate information: 
they could think it detects the encoding correctly regardless of the `multiLine` 
option.

Think about this scenario: users somehow get this exception and read  
`Failed to parse a character. Encoding was detected automatically.`. What would 
they think? I would think the file somehow failed to be read, but that the 
encoding was detected correctly and automatically regardless of other options.

Debugging encoding-related issues is annoying in my experience. It would be 
nicer if we gave information that is as accurate as we can make it.
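A hedged sketch of the kind of handling being discussed (names simplified; Spark's actual code paths differ):

```scala
import java.io.CharConversionException

// Illustrative options holder, not Spark's JSONOptions.
final case class Options(encoding: Option[String], multiLine: Boolean)

def withBetterError[T](options: Options)(parse: => T): T =
  try parse
  catch {
    case e: CharConversionException if options.encoding.isEmpty =>
      // Only rewrap when the charset was auto-detected: point the user at
      // setting the 'encoding' option explicitly instead of surfacing a
      // bare decoder error.
      throw new RuntimeException(
        "Failed to parse a character; the charset was auto-detected. " +
        "Consider setting the 'encoding' option explicitly.", e)
  }
```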



---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008583
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala ---
@@ -119,4 +119,25 @@ object InternalRow {
 case v: MapData => v.copy()
 case _ => value
   }
+
+  /**
+   * Returns an accessor for an InternalRow with given data type and ordinal.
+   */
+  def getAccessor(dataType: DataType, ordinal: Int): (InternalRow) => Any = dataType match {
+    case BooleanType => (input) => input.getBoolean(ordinal)
+    case ByteType => (input) => input.getByte(ordinal)
+    case ShortType => (input) => input.getShort(ordinal)
+    case IntegerType | DateType => (input) => input.getInt(ordinal)
+    case LongType | TimestampType => (input) => input.getLong(ordinal)
+    case FloatType => (input) => input.getFloat(ordinal)
+    case DoubleType => (input) => input.getDouble(ordinal)
+    case StringType => (input) => input.getUTF8String(ordinal)
+    case BinaryType => (input) => input.getBinary(ordinal)
+    case CalendarIntervalType => (input) => input.getInterval(ordinal)
+    case t: DecimalType => (input) => input.getDecimal(ordinal, t.precision, t.scale)
+    case t: StructType => (input) => input.getStruct(ordinal, t.size)
+    case _: ArrayType => (input) => input.getArray(ordinal)
+    case _: MapType => (input) => input.getMap(ordinal)
+    case _ => (input) => input.get(ordinal, dataType)
--- End diff --

Handle `UDT`?
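The accessor pattern under review, reduced to a self-contained sketch (toy `DataType` and `Row` types standing in for Spark's):

```scala
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

trait Row {
  def getInt(ordinal: Int): Int
  def getString(ordinal: Int): String
  def get(ordinal: Int, dataType: DataType): Any
}

// Resolve the accessor once per (dataType, ordinal); every subsequent row
// reuses the returned function instead of re-dispatching on the data type.
def getAccessor(dataType: DataType, ordinal: Int): Row => Any = dataType match {
  case IntType    => row => row.getInt(ordinal)
  case StringType => row => row.getString(ordinal)
  case other      => row => row.get(ordinal, other)
}
```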


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21005: [SPARK-23898][SQL] Simplify add & subtract code generati...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21005
  
**[Test build #89047 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89047/testReport)**
 for PR 21005 at commit 
[`433`](https://github.com/apache/spark/commit/43314b1d443fac5ca27ecef80677dbe70ab7).


---




[GitHub] spark pull request #20981: [SPARK-23873][SQL] Use accessors in interpreted L...

2018-04-09 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/20981#discussion_r180008527
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala
 ---
@@ -33,28 +33,14 @@ case class BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)
 
   override def toString: String = s"input[$ordinal, ${dataType.simpleString}, $nullable]"
 
+  private lazy val accessor: InternalRow => Any = InternalRow.getAccessor(dataType, ordinal)
--- End diff --

Do we need to be lazy?


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89048 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89048/testReport)**
 for PR 20981 at commit 
[`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2087/
Test PASSed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20981
  
retest this please.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89046 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89046/testReport)**
 for PR 20944 at commit 
[`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2086/
Test PASSed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20944
  
retest this please.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89042/
Test FAILed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89040/
Test FAILed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20904
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89039/
Test FAILed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20981
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20904
  
Build finished. Test FAILed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89040/testReport)**
 for PR 20981 at commit 
[`a8cdbe8`](https://github.com/apache/spark/commit/a8cdbe8baf2d508fb2583862042f1213cf0eae7b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20904: [SPARK-23751][ML][PySpark] Kolmogorov-Smirnoff test Pyth...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20904
  
**[Test build #89039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89039/testReport)**
 for PR 20904 at commit 
[`49a7ddb`](https://github.com/apache/spark/commit/49a7ddb45cb9a0035e3faed5906ecd37890333e1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21004
  
**[Test build #89044 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89044/testReport)**
 for PR 21004 at commit 
[`10536a6`](https://github.com/apache/spark/commit/10536a6dbf2ab37d7066915223a64e914cf53b5f).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20937
  
**[Test build #89045 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89045/testReport)**
 for PR 20937 at commit 
[`b817184`](https://github.com/apache/spark/commit/b817184d35d0e2589682f1dcd88b9f29b2063f5b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89044/
Test FAILed.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20944
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89043/
Test FAILed.


---




[GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...

2018-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20937
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89045/
Test FAILed.


---




[GitHub] spark issue #20981: [SPARK-23873][SQL] Use accessors in interpreted LambdaVa...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20981
  
**[Test build #89042 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89042/testReport)**
 for PR 20981 at commit 
[`2eb2bf1`](https://github.com/apache/spark/commit/2eb2bf1853a0ba4de8f4a3adfe8407d04a075b22).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...

2018-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20944
  
**[Test build #89043 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89043/testReport)**
 for PR 20944 at commit 
[`1c801f1`](https://github.com/apache/spark/commit/1c801f1e673b3d6f9e94eeade08d5b309a105061).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



