[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21496
  
**[Test build #91688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91688/testReport)**
 for PR 21496 at commit 
[`fea9616`](https://github.com/apache/spark/commit/fea9616fb35e3fcf886073767da040aef3a408e0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21496
  
retest this please


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread jainaks
Github user jainaks commented on the issue:

https://github.com/apache/spark/pull/21320
  
@mallman It does work fine with "name.First".


---




[GitHub] spark pull request #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerialize...

2018-06-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21518


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3925/
Test PASSed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/35/
Test PASSed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21535
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21518: [SPARK-24502][SQL] flaky test: UnsafeRowSerializerSuite

2018-06-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21518
  
thanks, merging to master/2.3!


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21535
  
**[Test build #91687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91687/testReport)**
 for PR 21535 at commit 
[`b8c7238`](https://github.com/apache/spark/commit/b8c7238aec9d6d79b8528eb3f47c0de7a48d23e8).


---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21496
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91683/
Test FAILed.


---




[GitHub] spark issue #21535: [SPARK-23596][SQL][WIP] Test interpreted path on Dataset...

2018-06-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21535
  
cc @cloud-fan @hvanhovell @kiszk @mgaido91 @maropu 


---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21496
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21535: [SPARK-23596][SQL][WIP] Test interpreted path on ...

2018-06-11 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/21535

[SPARK-23596][SQL][WIP] Test interpreted path on Dataset and DataFrame test 
suites

## What changes were proposed in this pull request?

We have completed a significant subset of the object-related Expressions to 
provide an interpreted fallback. This PR modifies the Dataset tests 
to also exercise the interpreted code paths.

One concern is that testing the interpreted code paths as well will at least 
double the current test time. Alternatively, we can test the interpreted 
code paths for just a few test suites, such as DatasetSuite and 
DataFrameSuite.

This PR is in WIP status for now, to discuss the approach and the test 
scope for the interpreted code paths.
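
To picture the test-time trade-off, here is a minimal Python sketch of a helper that runs the same test body once per evaluation mode, so the cost grows with the number of modes. `MODES`, `run_in_all_modes` and `sample_test` are hypothetical names for illustration only; they are not Spark's test utilities.

```python
from typing import Callable, Dict, List

# Hypothetical mode names; Spark's own suites use their own configuration.
MODES: List[str] = ["codegen", "interpreted"]


def run_in_all_modes(test_body: Callable[[str], None]) -> Dict[str, bool]:
    """Run the same test body once per evaluation mode and record pass/fail."""
    results: Dict[str, bool] = {}
    for mode in MODES:
        try:
            test_body(mode)
            results[mode] = True
        except AssertionError:
            results[mode] = False
    return results


def sample_test(mode: str) -> None:
    # Stand-in for a Dataset test: both paths must produce the same answer.
    evaluate = {"codegen": lambda x: x + 1, "interpreted": lambda x: x + 1}[mode]
    assert evaluate(41) == 42


outcome = run_in_all_modes(sample_test)
```

Every test body runs `len(MODES)` times, which is why restricting the second mode to a few suites is the cheaper option.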

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-23596

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21535.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21535


commit b8c7238aec9d6d79b8528eb3f47c0de7a48d23e8
Author: Liang-Chi Hsieh 
Date:   2018-06-12T05:00:06Z

Test interpreted path on Dataset and DataFrame test suites.




---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21496
  
**[Test build #91683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91683/testReport)**
 for PR 21496 at commit 
[`fea9616`](https://github.com/apache/spark/commit/fea9616fb35e3fcf886073767da040aef3a408e0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21357
  
**[Test build #91686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91686/testReport)**
 for PR 21357 at commit 
[`8ad2a3f`](https://github.com/apache/spark/commit/8ad2a3f8112662a865ee1dbaf7c5269197c3ee4f).


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21357
  
retest this, please


---




[GitHub] spark issue #21534: [SPARK-24526][build] Spaces in the build dir causes fail...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21534
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21534: [SPARK-24526][build] Spaces in the build dir causes fail...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21534
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21534: [SPARK-24526][build] Spaces in the build dir causes fail...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21534
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21534: [SPARK-24526][build] Spaces in the build dir caus...

2018-06-11 Thread trystanleftwich
GitHub user trystanleftwich opened a pull request:

https://github.com/apache/spark/pull/21534

[SPARK-24526][build] Spaces in the build dir causes failures in the 
build/mvn script


## What changes were proposed in this pull request?

Fix the call to ${MVN_BIN} to be wrapped in quotes so it will handle having 
spaces in the path.
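
As an illustration of why the quotes matter, the sketch below uses Python's `shlex.split`, which approximates POSIX shell word splitting. The path in `mvn_bin` is a made-up example, not the value the build/mvn script actually computes.

```python
import shlex

# Hypothetical install location containing a space.
mvn_bin = "/tmp/test spaces/spark/build/apache-maven/bin/mvn"

# Unquoted expansion: the path is split at the space, so the first word
# (the command to run) is only "/tmp/test".
unquoted = shlex.split(f"{mvn_bin} -DskipTests clean package")

# Quoted expansion: the whole path stays a single word.
quoted = shlex.split(f'"{mvn_bin}" -DskipTests clean package')
```

Here `unquoted[0]` is `/tmp/test`, while `quoted[0]` is the full path.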

## How was this patch tested?

Ran the following to confirm that the build/mvn tool now works without error 
when the build dir contains a space:

```
mkdir /tmp/test\ spaces
cd /tmp/test\ spaces
git clone https://github.com/apache/spark.git
cd spark
# Remove all mvn references in PATH so the script will download mvn to the local dir
./build/mvn -DskipTests clean package
```

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/trystanleftwich/spark SPARK-24526

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21534


commit bb12f3e2ad74f9d4c89e1c7adab4d306fa87b101
Author: trystanleftwich 
Date:   2018-06-12T04:44:33Z

[SPARK-24526][build] Spaces in the build dir causes failures in the 
build/mvn script




---




[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/spark/pull/21469#discussion_r194613720
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
 val storeMetrics = store.metrics
 longMetric("numTotalStateRows") += storeMetrics.numKeys
 longMetric("stateMemory") += storeMetrics.memoryUsedBytes
-storeMetrics.customMetrics.foreach { case (metric, value) =>
-  longMetric(metric.name) += value
+storeMetrics.customMetrics.foreach {
+  case (metric: StateStoreCustomAverageMetric, value) =>
+longMetric(metric.name).set(value * 1.0d)
--- End diff --

We'd better think about the actual benefit of exposing the value, rather 
than about how to expose it. If we define it as a count and aggregate by 
summation, the aggregated value will be `(partition count * versions)`, 
whose meaning might be hard for end users to work out.

I'm afraid that exposing this to StreamingQuery as an average is not trivial, 
especially since SQLMetric is defined as `AccumulatorV2[Long, Long]`, so only a 
single Long value can be passed. Under that restriction, we couldn't define a 
`merge` operation for an `average metric`.
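
The restriction can be sketched in Python under simplifying assumptions: each partition reports one number, and partial results must be merged pairwise. `AverageState` is a hypothetical helper, not Spark's `SQLMetric`; it only shows that a mergeable average needs two values (sum and count) rather than a single one.

```python
from dataclasses import dataclass
from typing import List

per_partition_averages: List[int] = [10, 20, 30]

# With a single Long per partition, the only generic merge is addition,
# which does not produce an average.
merged_as_sum = sum(per_partition_averages)


@dataclass
class AverageState:
    """Carries both pieces needed to merge averages correctly."""
    total: int = 0
    count: int = 0

    def add(self, value: int) -> None:
        self.total += value
        self.count += 1

    def merge(self, other: "AverageState") -> "AverageState":
        return AverageState(self.total + other.total, self.count + other.count)

    @property
    def value(self) -> float:
        return self.total / self.count if self.count else 0.0


merged = AverageState()
for partition_value in per_partition_averages:
    partial = AverageState()
    partial.add(partition_value)
    merged = merged.merge(partial)
```

Summation gives 60 for these inputs, while the two-field state gives the average 20.0.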


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21427
  
> But we marked this as experimental ...

That's also special in this case: we marked it as experimental in 2.3.1.

Not many behavior changes are similar to this one. To highlight:
1. it was not marked as experimental in the first release.
2. it missed 2.3.1, so the old behavior will be there for some time, until 
the next release (2.3.2 or 2.4.0).
3. it turns runnable code into a failure, and the old behavior is kind of 
self-consistent (by-position match). It's not like turning failures into 
runnable code or fixing a correctness bug.

To summarize:
1. I agree the new behavior makes more sense; we should have done that in 
the first place.
2. This is a special case, as I mentioned above. We should be a little 
more conservative here.
3. Adding a config is not hard. Maybe @ueshin can build the framework first 
for passing configs to the Python worker?
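
The difference between the two matching rules can be sketched in plain Python, without pandas or Spark. The helper names and sample data below are hypothetical; they only model how returned columns get attached to the declared schema.

```python
from typing import Dict, List

schema: List[str] = ["id", "v"]

# Columns returned by a hypothetical user function, in a different order
# than the declared schema (dicts keep insertion order in Python 3.7+).
returned: Dict[str, list] = {"v": [1.0, 2.0], "id": [1, 2]}


def assign_by_position(schema, returned):
    # Old behavior: the i-th returned column gets the i-th schema name.
    return dict(zip(schema, returned.values()))


def assign_by_name(schema, returned):
    # New behavior: look each schema name up; unknown names become an error.
    missing = [name for name in schema if name not in returned]
    if missing:
        raise KeyError(f"columns not found in result: {missing}")
    return {name: returned[name] for name in schema}


by_position = assign_by_position(schema, returned)
by_name = assign_by_name(schema, returned)
```

By-position matching never fails but can silently swap columns, whereas by-name matching fixes the order and raises when the returned names do not match the schema, which is how previously runnable code can start failing.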


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91684/
Test FAILed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21320
  
**[Test build #91684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91684/testReport)**
 for PR 21320 at commit 
[`7f67ec0`](https://github.com/apache/spark/commit/7f67ec0a82dd09dd867d5882dda0965fcab28974).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21482#discussion_r194611249
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
 return Column(sc._jvm.functions.input_file_name())
 
 
+@since(2.4)
+def isinf(col):
--- End diff --

Shall we expose this to column.py too?


---




[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21482#discussion_r194610745
  
--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
   "initcap",
   "input_file_name",
   "instr",
+  "isInf",
+  "isinf",
--- End diff --

Ah, I got it now. I believe we should match it to one side, though. I 
roughly remember that we keep functions in this_naming_style in 
functions[.py|.R|.scala], e.g. 
[SPARK-10621](https://issues.apache.org/jira/browse/SPARK-10621). Shall 
we stick to `isinf` then?


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3924/
Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21319
  
**[Test build #91685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91685/testReport)**
 for PR 21319 at commit 
[`91fdedc`](https://github.com/apache/spark/commit/91fdedc4d91a7abde5f6b64dbfcf354b67d89a48).


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/34/
Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21319
  
retest this please


---




[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-11 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194606928
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, 
HasInputCol, HasOutputCol, JavaMLReadabl
   typeConverter=TypeConverters.toListString)
 caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do 
a case sensitive " +
   "comparison over the stop words", 
typeConverter=TypeConverters.toBoolean)
+locale = Param(Params._dummy(), "locale", "locale of the input. 
ignored when case sensitive " +
+   "is true", typeConverter=TypeConverters.toString)
 
 @keyword_only
-def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False):
+def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False,
+ locale=None):
 """
-__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false)
+__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false,
+locale=None)
--- End diff --

Please add \ to the end of L2592 and use the right indentation here. 
Unfortunately, we need this to make the doc correctly displayed. See 
https://github.com/apache/spark/blob/master/python/pyspark/ml/feature.py#L3112.
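
For context on the trailing backslash: inside a regular (non-raw) triple-quoted string, a backslash at the end of a line is a line continuation, so the wrapped signature remains one logical line in `__doc__`. The two functions below are hypothetical stand-ins, not the PySpark methods under review.

```python
def set_params_wrapped(a=None, b=None):
    """
    set_params_wrapped(self, a=None,
                       b=None)
    """


def set_params_continued(a=None, b=None):
    """
    set_params_continued(self, a=None, \
                         b=None)
    """


# Without the backslash the signature spans two docstring lines;
# with it, the newline is dropped and the signature is a single line.
wrapped_lines = set_params_wrapped.__doc__.strip().splitlines()
continued_lines = set_params_continued.__doc__.strip().splitlines()
```

The indentation of the continued line is kept as ordinary spaces, which is presumably why the review also asks for the right indentation.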


---




[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-11 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194607278
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, 
HasInputCol, HasOutputCol, JavaMLReadabl
   typeConverter=TypeConverters.toListString)
 caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do 
a case sensitive " +
   "comparison over the stop words", 
typeConverter=TypeConverters.toBoolean)
+locale = Param(Params._dummy(), "locale", "locale of the input. 
ignored when case sensitive " +
+   "is true", typeConverter=TypeConverters.toString)
 
 @keyword_only
-def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False):
+def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False,
+ locale=None):
 """
-__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false)
+__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false,
+locale=None)
 """
 super(StopWordsRemover, self).__init__()
 self._java_obj = 
self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
 self.uid)
 
self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
- caseSensitive=False)
+ caseSensitive=False, 
locale=StopWordsRemover.defaultLocale())
--- End diff --

You already have the `_java_obj`; calling `_java_obj.getLocale()` would 
give you the default locale. Then Python users only need 
`stopWordsRemover.getLocale()` to get the default value. In the param doc, we 
should make it clear that the default is the JVM default locale.


---




[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-11 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194606981
  
--- Diff: python/pyspark/ml/feature.py ---
@@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, 
HasInputCol, HasOutputCol, JavaMLReadabl
   typeConverter=TypeConverters.toListString)
 caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do 
a case sensitive " +
   "comparison over the stop words", 
typeConverter=TypeConverters.toBoolean)
+locale = Param(Params._dummy(), "locale", "locale of the input. 
ignored when case sensitive " +
+   "is true", typeConverter=TypeConverters.toString)
 
 @keyword_only
-def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False):
+def __init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False,
+ locale=None):
 """
-__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false)
+__init__(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false,
+locale=None)
 """
 super(StopWordsRemover, self).__init__()
 self._java_obj = 
self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
 self.uid)
 
self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
- caseSensitive=False)
+ caseSensitive=False, 
locale=StopWordsRemover.defaultLocale())
 kwargs = self._input_kwargs
 self.setParams(**kwargs)
 
 @keyword_only
 @since("1.6.0")
-def setParams(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False):
+def setParams(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=False,
+  locale=None):
 """
-setParams(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false)
+setParams(self, inputCol=None, outputCol=None, stopWords=None, 
caseSensitive=false,
+locale=None)
--- End diff --

ditto


---




[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-11 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21501#discussion_r194606418
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
   @Since("1.5.0")
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> 
StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
+  /**
+   * Locale of the input for case insensitive matching. Ignored when 
[[caseSensitive]]
+   * is true.
+   * Default: Locale.getDefault.toString
+   * @see `StopWordsRemover.getDefaultLocale()`
--- End diff --

I feel it is unnecessary to expose it as a public API. This is the same as 
`Locale.getDefault.toString` or `stopWordsRemover.getLocale` when nothing is 
set. See my comments on the Python API.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21357
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21357
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91681/
Test FAILed.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21357
  
**[Test build #91681 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91681/testReport)**
 for PR 21357 at commit 
[`8ad2a3f`](https://github.com/apache/spark/commit/8ad2a3f8112662a865ee1dbaf7c5269197c3ee4f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21486: [SPARK-24387][Core] Heartbeat-timeout executor is...

2018-06-11 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21486#discussion_r194606075
  
--- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ---
@@ -197,14 +197,14 @@ private[spark] class HeartbeatReceiver(sc: 
SparkContext, clock: Clock)
   if (now - lastSeenMs > executorTimeoutMs) {
 logWarning(s"Removing executor $executorId with no recent 
heartbeats: " +
   s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " 
+
-  s"timed out after ${now - lastSeenMs} ms"))
   // Asynchronously kill the executor to avoid blocking the 
current thread
 killExecutorThread.submit(new Runnable {
   override def run(): Unit = Utils.tryLogNonFatalError {
 // Note: we want to get an executor back after expiring this 
one,
 // so do not simply call `sc.killExecutor` here (SPARK-8119)
 sc.killAndReplaceExecutor(executorId)
--- End diff --

To be more specific, `killAndReplaceExecutor#killExecutors` will block 
until we get a response from the cluster manager or time out after 120s (the 
default RPC timeout config).
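
The reason for submitting the call to a separate thread can be sketched in Python with `concurrent.futures`. The function `kill_and_replace_executor` and the event below are stand-ins for the blocking RPC, not Spark code; the 120 second value mirrors the timeout mentioned above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Set when the (simulated) cluster manager answers.
cluster_manager_replied = threading.Event()


def kill_and_replace_executor(executor_id: str) -> str:
    # Stand-in for the blocking call: waits for a reply, or gives up
    # after the timeout.
    replied = cluster_manager_replied.wait(timeout=120)
    return f"{executor_id}: {'replaced' if replied else 'timed out'}"


# Single dedicated thread, so the caller never waits on the RPC itself.
kill_executor_thread = ThreadPoolExecutor(max_workers=1)
future = kill_executor_thread.submit(kill_and_replace_executor, "exec-1")

# submit() returns immediately; the call is still waiting for its reply.
still_pending = not future.done()

# Simulate the reply arriving, then collect the result.
cluster_manager_replied.set()
result = future.result(timeout=5)
kill_executor_thread.shutdown()
```

The submitting thread stays free while the worker waits, which is the behavior the heartbeat receiver relies on.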


---




[GitHub] spark issue #21045: [SPARK-23931][SQL] Adds arrays_zip function to sparksql

2018-06-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21045
  
LGTM. I'm fine with the function name `arrays_zip`, but I'm wondering whether 
others agree on it too.


---




[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21370
  
@xuanyuanking Just for your reference, for this PR, the PR description can 
be improved to something like 


> This PR adds eager evaluation to the `__repr__` and `_repr_html_` of 
the DataFrame APIs in PySpark. When eager evaluation is enabled, `_repr_html_` 
returns a rich HTML version of the top-K rows of the DataFrame output. If 
`_repr_html_` is not called by the REPL, `__repr__` will return the plain text of 
the top-K rows.

> This PR adds three new external SQL confs for controlling the behavior of 
eager evaluation:

> - spark.sql.repl.eagerEval.enabled: Whether to enable eager evaluation. When 
true, the top K rows of a Dataset will be displayed if and only if the REPL 
supports eager evaluation. Currently, eager evaluation is only 
supported in PySpark. For notebooks like Jupyter, the HTML table (generated 
by `_repr_html_`) will be returned. For the plain Python REPL, the returned outputs 
are formatted like dataframe.show().

> - spark.sql.repl.eagerEval.maxNumRows: The max number of rows that are 
returned by eager evaluation. This only takes effect when 
spark.sql.repl.eagerEval.enabled is set to true.

> - spark.sql.repl.eagerEval.truncate: The max number of characters of each 
row that is returned by eager evaluation. This only takes effect when 
spark.sql.repl.eagerEval.enabled is set to true.
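
As a rough illustration of how the three confs listed above might be passed 
when launching a PySpark shell, here is a minimal Scala sketch. The conf keys 
are taken from the description; the values, the object name 
`EagerEvalConfSketch`, and the helper `toConfArgs` are assumptions made up 
for this example.

```scala
object EagerEvalConfSketch {
  // The conf keys come from the description above; the values are only
  // illustrative assumptions, not recommended settings.
  val settings: Seq[(String, String)] = Seq(
    "spark.sql.repl.eagerEval.enabled"    -> "true",
    "spark.sql.repl.eagerEval.maxNumRows" -> "20",
    "spark.sql.repl.eagerEval.truncate"   -> "20"
  )

  // Hypothetical helper: render the settings as `--conf key=value` arguments,
  // the form accepted by the pyspark and spark-submit launchers.
  def toConfArgs(confs: Seq[(String, String)]): Seq[String] =
    confs.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }

  def main(args: Array[String]): Unit =
    println(("pyspark" +: toConfArgs(settings)).mkString(" "))
}
```

Since the last two confs only take effect when 
spark.sql.repl.eagerEval.enabled is true, the sketch always sets all three 
together.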



---




[GitHub] spark pull request #21096: [SPARK-24011][CORE][WIP] cache rdd's immediate pa...

2018-06-11 Thread Ngone51
Github user Ngone51 closed the pull request at:

https://github.com/apache/spark/pull/21096


---




[GitHub] spark pull request #20996: [SPARK-23884][CORE] hasLaunchedTask should be tru...

2018-06-11 Thread Ngone51
Github user Ngone51 closed the pull request at:

https://github.com/apache/spark/pull/20996


---




[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-06-11 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21370
  
```
Test coverage is the most critical when we refactor the existing code and 
add new features. Hopefully, when you submit new PRs in the future, could you 
also improve this part?
```
Of course, I'll do this in a follow-up PR and answer all of Xiao's questions 
tonight. Thanks for all your comments.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91680/
Test FAILed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21319
  
**[Test build #91680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91680/testReport)**
 for PR 21319 at commit 
[`91fdedc`](https://github.com/apache/spark/commit/91fdedc4d91a7abde5f6b64dbfcf354b67d89a48).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21320
  
**[Test build #91684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91684/testReport)**
 for PR 21320 at commit 
[`7f67ec0`](https://github.com/apache/spark/commit/7f67ec0a82dd09dd867d5882dda0965fcab28974).


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/33/
Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3923/
Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91679/
Test FAILed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21320
  
**[Test build #91679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91679/testReport)**
 for PR 21320 at commit 
[`89febc8`](https://github.com/apache/spark/commit/89febc8e978d606e32911088e9589462805b8697).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3922/
Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/32/
Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21496
  
**[Test build #91683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91683/testReport)**
 for PR 21496 at commit 
[`fea9616`](https://github.com/apache/spark/commit/fea9616fb35e3fcf886073767da040aef3a408e0).


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21533
  
**[Test build #91682 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91682/testReport)**
 for PR 21533 at commit 
[`f922fd8`](https://github.com/apache/spark/commit/f922fd8c995164cada4a8b72e92c369a827def16).


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-06-11 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/21533
  
cc @felixcheung. Please take a look at this when you have time. Thanks.


---




[GitHub] spark pull request #21533: [SPARK-24195][Core] Bug fix for local:/ path in S...

2018-06-11 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request:

https://github.com/apache/spark/pull/21533

[SPARK-24195][Core] Bug fix for local:/ path in SparkContext.addFile

## What changes were proposed in this pull request?

The changes in 
[SPARK-6300](https://issues.apache.org/jira/browse/SPARK-6300) essentially 
changed schemePath to
```
new File(path).getCanonicalFile.toURI.toString
```
This is a problem when the path uses the `local:` scheme, because 
`java.io.File` does not handle it.

For example:

new File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
res1: String = file:/user/anotheruser/local:/home/user/demo/logger.config
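
The behaviour described above can be reproduced without Spark. The following 
is a minimal sketch, not the code from this pull request: it uses 
`java.net.URI` to look at the scheme before falling back to `java.io.File`. 
The object name `LocalSchemeSketch` and the helper `schemeCorrectedPath` are 
hypothetical, and mapping `local:` to `file:` is an assumption about the 
intended result. Note that `new URI(...)` throws for strings that are not 
valid URIs (for example, paths containing spaces), which a real fix would 
have to handle.

```scala
import java.io.File
import java.net.URI

object LocalSchemeSketch {
  // Hypothetical helper: only scheme-less paths go through java.io.File.
  def schemeCorrectedPath(path: String): String = {
    val uri = new URI(path)
    uri.getScheme match {
      case null    => new File(path).getCanonicalFile.toURI.toString
      case "local" => "file:" + uri.getPath // assumed mapping for local: paths
      case _       => path                  // other schemes are left untouched
    }
  }

  def main(args: Array[String]): Unit = {
    val p = "local:///home/user/demo/logger.config"
    // On a Unix-like system java.io.File treats the whole string as a relative
    // path, so the result is prefixed with the current working directory.
    println(new File(p).getCanonicalFile.toURI.toString)
    // The scheme-aware variant keeps only the path component.
    println(schemeCorrectedPath(p))
  }
}
```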

## How was this patch tested?

Added a test in `SparkContextSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuanyuanking/spark SPARK-24195

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21533.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21533


commit f922fd8c995164cada4a8b72e92c369a827def16
Author: Yuanjian Li 
Date:   2018-06-12T01:51:44Z

bug fix for local:/ path in sc.addFile




---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21496
  
retest this please


---




[GitHub] spark issue #21496: docs: fix typo

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21496
  
**[Test build #4199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4199/testReport)**
 for PR 21496 at commit 
[`fea9616`](https://github.com/apache/spark/commit/fea9616fb35e3fcf886073767da040aef3a408e0).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21521: [SPARK-23732][docs] Fix source links in generated...

2018-06-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21521


---




[GitHub] spark issue #21258: [SPARK-23933][SQL] Add map_from_arrays function

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21258
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91677/
Test PASSed.


---




[GitHub] spark issue #21258: [SPARK-23933][SQL] Add map_from_arrays function

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21258
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21357
  
**[Test build #91681 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91681/testReport)**
 for PR 21357 at commit 
[`8ad2a3f`](https://github.com/apache/spark/commit/8ad2a3f8112662a865ee1dbaf7c5269197c3ee4f).


---




[GitHub] spark issue #21258: [SPARK-23933][SQL] Add map_from_arrays function

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21258
  
**[Test build #91677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91677/testReport)**
 for PR 21258 at commit 
[`38d0868`](https://github.com/apache/spark/commit/38d086877385324ae872652e9dbeb484a0915557).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21521: [SPARK-23732][docs] Fix source links in generated scalad...

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21521
  
Merged to master and branch-2.3.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21357
  
ok to test


---




[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21469#discussion_r194592510
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala ---
@@ -112,14 +122,19 @@ trait StateStoreWriter extends StatefulOperator { self: SparkPlan =>
 val storeMetrics = store.metrics
 longMetric("numTotalStateRows") += storeMetrics.numKeys
 longMetric("stateMemory") += storeMetrics.memoryUsedBytes
-storeMetrics.customMetrics.foreach { case (metric, value) =>
-  longMetric(metric.name) += value
+storeMetrics.customMetrics.foreach {
+  case (metric: StateStoreCustomAverageMetric, value) =>
+longMetric(metric.name).set(value * 1.0d)
--- End diff --

Not sure if SQLAppStatusListener comes into play for reporting query 
progress (e.g. StreamingQueryWrapper.lastProgress).


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L193

Based on my understanding, SQLMetric is an Accumulator, so the merged value 
of the accumulators across all tasks is returned. The merge operation in 
SQLMetric just adds the values, so it only makes sense for count or size 
values. For now we would be able to display the (min, med, max) values in the 
UI but not in the "query status". I was thinking that if we make it a count 
metric, it may work (similar to the number of state rows). I am fine either 
way.
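
To make the point about additive merging concrete, here is a small 
stand-alone sketch. `SumMetric` is a hypothetical stand-in written for this 
example, not Spark's `SQLMetric`; it only assumes, as stated above, that 
merging adds values. Summing per-task counts gives a meaningful total, while 
summing per-task averages does not.

```scala
object SumMergeSketch {
  // Hypothetical accumulator whose merge simply adds the other value.
  final class SumMetric(var value: Long = 0L) {
    def add(v: Long): Unit = value += v
    def merge(other: SumMetric): Unit = value += other.value
  }

  // Simulate one accumulator per task, merged into a single total.
  def mergeAll(perTask: Seq[Long]): Long = {
    val total = new SumMetric()
    perTask.foreach { v =>
      val taskMetric = new SumMetric()
      taskMetric.add(v)
      total.merge(taskMetric)
    }
    total.value
  }

  def main(args: Array[String]): Unit = {
    val rowsPerTask = Seq(100L, 200L, 300L)
    println(mergeAll(rowsPerTask)) // 600: a valid total row count

    val avgPerTask = Seq(10L, 20L, 30L)
    println(mergeAll(avgPerTask))  // 60: a sum of averages, not an average
  }
}
```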


---




[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21532
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91676/
Test PASSed.


---




[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21532
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21532
  
**[Test build #91676 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91676/testReport)**
 for PR 21532 at commit 
[`f58b944`](https://github.com/apache/spark/commit/f58b94411d6564d66338f97b9e753cd3267dd0cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21527
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21527
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91675/
Test FAILed.


---




[GitHub] spark issue #21527: [SPARK-24519] MapStatus has 2000 hardcoded

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21527
  
**[Test build #91675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91675/testReport)**
 for PR 21527 at commit 
[`4c8acfa`](https://github.com/apache/spark/commit/4c8acfa5899ccbdafeb630f38ce44b23332b80f2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21319
  
**[Test build #91680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91680/testReport)**
 for PR 21319 at commit 
[`91fdedc`](https://github.com/apache/spark/commit/91fdedc4d91a7abde5f6b64dbfcf354b67d89a48).


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/31/
Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21319
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3921/
Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
> we can't just change the behavior. We think the old behavior doesn't make 
sense and users should change their code, but users may not think in this way.

I think this basically means we would need a configuration for every 
behaviour change, whether it's a bug fix or not.

If we have failed to explain why users might think the old behaviour makes 
sense, how about elaborating on it rather than assuming hypothetically that 
such users might exist?


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Okay, but I get that it would make it smoother to go ahead. I am okay with it.


---




[GitHub] spark pull request #19364: [SPARK-22144][SQL] ExchangeCoordinator combine th...

2018-06-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19364


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
But we marked this as experimental. If we treat the old API and the new 
experimental API in the same way, I wonder why we have them. One thing I am 
less clear about is what kind of scenario we are worried about. I reread the 
discussion here and I still don't know which case we are worried about 
breaking.


---




[GitHub] spark issue #21523: [SPARK-24506][UI] Add UI filters also to thriftserver ta...

2018-06-11 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21523
  
@mgaido91 Please fix the PR title and description to reflect the new 
changes you made.


---




[GitHub] spark issue #19364: [SPARK-22144][SQL] ExchangeCoordinator combine the parti...

2018-06-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19364
  
thanks, merging to master!


---




[GitHub] spark issue #21517: Testing k9s change - please ignore (13)

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21517
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/29/
Test PASSed.


---




[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21357
  
Kindly ping again to @tdas 

And cc. to @jose-torres @jerryshao @HyukjinKwon @arunmahadevan for 
reviewing.


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21517: Testing k9s change - please ignore (13)

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21517
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21222
  
Kindly ping again to @tdas


---




[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/30/
Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21469
  
Kindly ping again to @tdas 


---



