[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23227
  
**[Test build #99719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99719/testReport)**
 for PR 23227 at commit 
[`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23159
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99715/
Test FAILed.


---




[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...

2018-12-05 Thread seancxmao
GitHub user seancxmao opened a pull request:

https://github.com/apache/spark/pull/23238

[SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-insensitive field 
resolution when reading from Parquet

## What changes were proposed in this pull request?
#22148 introduces a behavior change. Following the discussion at #22184, 
this PR updates the migration guide for upgrading from Spark 2.3 to 2.4.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/seancxmao/spark SPARK-25132-doc-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23238


commit 5bbcf41f34f2ca160da7ef4ebe4c54d15a2d09b5
Author: seancxmao 
Date:   2018-12-05T15:05:38Z

[SPARK-25132][SQL][FOLLOWUP] Update migration doc for case-insensitive 
field resolution when reading from Parquet




---




[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23233
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99717/
Test PASSed.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23159
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22957
  
**[Test build #99713 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99713/testReport)**
 for PR 22957 at commit 
[`e4f617f`](https://github.com/apache/spark/commit/e4f617fc7e47d7c49f3d773ac2d91c5508c0a239).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23159
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5774/
Test PASSed.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23159
  
**[Test build #99735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99735/testReport)**
 for PR 23159 at commit 
[`e0aa626`](https://github.com/apache/spark/commit/e0aa626c886976489348a6c0179d160bbe3252da).


---




[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-05 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23196
  
jenkins, retest this, please


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23223
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99733/
Test PASSed.


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23223
  
**[Test build #99733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99733/testReport)**
 for PR 23223 at commit 
[`65a70dc`](https://github.com/apache/spark/commit/65a70dcbb7993731104deab2592a5b969a31414e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5778/
Test PASSed.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-05 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/23218
  
shouldn't be too hard, but it will require some downtime.

On Wed, Dec 5, 2018 at 5:41 AM Sean Owen wrote:

> Ah OK, so all of them were a JVM crash. It would probably be a good idea
> to update the JVM on all the workers as _60 is over 3 years old. It's
> probably not as simple as it sounds but WDYT @shaneknapp?


---




[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...

2018-12-05 Thread ptkool
Github user ptkool commented on the issue:

https://github.com/apache/spark/pull/23222
  
I'm not sure how this ended up being omitted. `TransposeWindowSuite` will 
be fine since it creates a simple optimizer from this rule and a few others. 
The new test added to `DataFrameWindowFunctionsSuite` should pass with or 
without this rule added to the optimizer batch.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99736/testReport)**
 for PR 23207 at commit 
[`76d1ca0`](https://github.com/apache/spark/commit/76d1ca0036bbb50a005e9d12f8b22bf21697af7f).


---




[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...

2018-12-05 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/23227
  
thanks,@cloud-fan,@HyukjinKwon 


---




[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23235
  
thanks, merging to master!


---




[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99712/
Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5777/
Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23239
  
cc @adoron @kiszk @viirya @gatorsmile 


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23239
  
Yes, the 3 cases I pointed out that need to handle NaN and -0.0 do not change 
the value in `UnsafeRow`.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5776/
Test PASSed.


---




[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23223
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23231
  
**[Test build #99712 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99712/testReport)**
 for PR 23231 at commit 
[`453d60f`](https://github.com/apache/spark/commit/453d60f42b99de621a7ee3fab6bc6138fc20ed05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial resul...

2018-12-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23235


---




[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-12-05 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/22305
  
Hi @BryanCutler @HyukjinKwon @ueshin, mind taking another look? I think 
this is in good shape. Thanks!


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23239
  
**[Test build #99737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99737/testReport)**
 for PR 23239 at commit 
[`797ade3`](https://github.com/apache/spark/commit/797ade3eb175c41866efbffa3cb4c30f90e49ca7).


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23239
  
**[Test build #99738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99738/testReport)**
 for PR 23239 at commit 
[`7d5ff06`](https://github.com/apache/spark/commit/7d5ff06969cd1fe8d05adcf8c79d6b56d0865f44).


---




[GitHub] spark issue #23231: [SPARK-26273][ML] Add OneHotEncoderEstimator as alias to...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23231
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23239

[SPARK-26021][SQL][followup] only deal with NaN and -0.0 in UnsafeWriter

## What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/23043

There are 4 places where we need to deal with NaN and -0.0:
1. Range partitioner (the sorter). `-0.0` and `0.0` should be assigned to 
the same partition. Different NaNs should be assigned to the same partition.
2. Join keys. `-0.0` and `0.0` should be treated as the same. Different NaNs 
should be treated as the same.
3. Grouping keys. `-0.0` and `0.0` should be assigned to the same group. 
Different NaNs should be assigned to the same group.
4. Comparison expressions. `-0.0` and `0.0` should be treated as the same. 
Different NaNs should be treated as the same.

Case 4 is OK. Our comparison already handles NaN and -0.0, and for 
struct/array/map, we will recursively compare the fields/elements.

Cases 1, 2 and 3 are problematic, as they compare the `UnsafeRow` binary 
directly. Different NaNs have different binary representations, and the same 
is true for -0.0 and 0.0.

To fix it, a simple solution is to let `UnsafeProjection` always produce 
`UnsafeRow`s with NaN and -0.0 normalized (use the standard NaN and replace -0.0 
with 0.0). The `UnsafeRow`s in cases 1, 2 and 3 are all created by 
`UnsafeProjection`.

Following this direction, this PR moves the handling of NaN and -0.0 from 
`Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not 
handle them, which reduces the perf overhead. It's also easier to add comments 
explaining why we do it in `UnsafeWriter`.
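
As an illustration of why a byte-wise comparison is sensitive to this, here is a minimal sketch in plain Python. It is not Spark code: `float_bits` and `normalize` are hypothetical helpers, and the two NaN bit patterns are arbitrary quiet NaNs chosen for the example. It only shows that values which compare as equal (or are both NaN) can still have different raw bits until they are normalized.

```python
import math
import struct

# One canonical NaN object reused for every normalized NaN (assumption of this sketch).
CANONICAL_NAN = float("nan")


def float_bits(value: float) -> int:
    """Return the raw IEEE 754 bits of a 64-bit double as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_float(bits: int) -> float:
    """Build a double from raw IEEE 754 bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def normalize(value: float) -> float:
    """Map every NaN to one canonical NaN and -0.0 to 0.0; leave other values alone."""
    if math.isnan(value):
        return CANONICAL_NAN
    if value == 0.0:  # true for both 0.0 and -0.0
        return 0.0
    return value


# Two quiet NaNs that differ only in their payload bits.
nan_a = bits_to_float(0x7FF8000000000000)
nan_b = bits_to_float(0x7FF8000000000001)

# Raw bits before and after normalization.
raw_nan_differs = float_bits(nan_a) != float_bits(nan_b)
raw_zero_differs = float_bits(-0.0) != float_bits(0.0)
normalized_nan_equal = float_bits(normalize(nan_a)) == float_bits(normalize(nan_b))
normalized_zero_equal = float_bits(normalize(-0.0)) == float_bits(normalize(0.0))
```

In this sketch the raw bit patterns differ before normalization and agree afterwards, which is the property a binary comparison of rows would rely on.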

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23239


commit 797ade3eb175c41866efbffa3cb4c30f90e49ca7
Author: Wenchen Fan 
Date:   2018-12-05T15:05:39Z

only deal with NaN and -0.0 in UnsafeWriter




---




[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23232
  
**[Test build #99716 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99716/testReport)**
 for PR 23232 at commit 
[`821db48`](https://github.com/apache/spark/commit/821db4854c0e685aac3168da75a1c839681dbfc4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23232
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread adoron
Github user adoron commented on the issue:

https://github.com/apache/spark/pull/23239
  
@cloud-fan what about UnsafeRow::setDouble/Float? It doesn't go through the 
same flow. Is it not used?


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23239
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99738/
Test FAILed.


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239228923
  
--- Diff: examples/src/main/r/ml/powerIterationClustering.R ---
@@ -0,0 +1,37 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# To run this example use
+# ./bin/spark-submit examples/src/main/r/ml/powerIterationClustering.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-ML-powerIterationCLustering-example")
+
+# $example on$
+df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+   list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#assign clusters
--- End diff --

nit. `#assign` -> `# assign`.


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239238873
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
+From the abstract: PIC finds a very low-dimensional embedding of a dataset
+using truncated power iteration on a normalized pair-wise similarity matrix of the data.
+
+`spark.ml`'s PowerIterationClustering implementation takes the following parameters:
+
+* `k`: the number of clusters to create
+* `initMode`: param for the initialization algorithm
+* `maxIter`: param for maximum number of iterations
+* `srcCol`: param for the name of the input column for source vertex IDs
+* `dstCol`: name of the input column for destination vertex IDs
+* `weightCol`: Param for weight column name
+
+**Examples**
+
+
+
+
+Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering) for more details.
+
+{% include_example scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
+
+
+
+Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) for more details.
+
+{% include_example java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
+
+
+
--- End diff --

@dongjoon-hyun 
https://github.com/apache/spark/pull/22996
I will add the python example in the doc once the above PR is merged in. 
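
For readers following the thread, the idea in the quoted doc text (truncated power iteration on a normalized pairwise similarity matrix) can be sketched in plain Python. This is an illustrative sketch only, not the `spark.ml` implementation and not the example planned for the docs: `power_iteration_embedding` is a hypothetical name, the start vector is a simple degree-based choice, and the final clustering step that PIC runs on the embedding is left out. The edge list reuses the toy graph from the R example in this PR.

```python
def power_iteration_embedding(edges, num_vertices, max_iter=20):
    """Run a fixed number of power-iteration steps on a row-normalized affinity matrix.

    `edges` is a list of (src, dst, weight) triples describing an undirected graph.
    Every vertex is assumed to have at least one edge, so no degree is zero.
    """
    # Build a symmetric affinity matrix from the edge list.
    affinity = [[0.0] * num_vertices for _ in range(num_vertices)]
    for src, dst, weight in edges:
        affinity[src][dst] = weight
        affinity[dst][src] = weight

    # Row-normalize: W = D^-1 A, where D holds the row sums (degrees).
    degrees = [sum(row) for row in affinity]
    normalized = [[a / degrees[i] for a in row] for i, row in enumerate(affinity)]

    # Degree-based start vector: each vertex's share of the total degree.
    total_degree = sum(degrees)
    vector = [d / total_degree for d in degrees]

    # Truncated power iteration: multiply by W and rescale to unit L1 norm.
    for _ in range(max_iter):
        product = [sum(w * v for w, v in zip(row, vector)) for row in normalized]
        norm = sum(abs(x) for x in product)
        vector = [x / norm for x in product]
    return vector


# Toy graph from the R example: (src, dst, weight).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 0, 0.1)]
embedding = power_iteration_embedding(edges, num_vertices=5, max_iter=20)
```

The result is a one-dimensional embedding with one value per vertex. A full PIC run would then cluster these values (for instance with k-means) to assign each vertex to one of `k` clusters.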


---




[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

2018-12-05 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21465#discussion_r239243683
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
 return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
 
 
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+"""
+Private class to track supported GBTClassifier params.
+
+.. versionadded:: 3.0.0
+"""
+
+supportedLossTypes = ["logistic"]
+
+lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize (case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+@since("3.0.0")
+def setLossType(self, value):
--- End diff --

please address the above comment


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239250376
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
--- End diff --

I normally check the md file on GitHub. The link works OK. Is there a 
better way to check? @dongjoon-hyun @felixcheung 

https://github.com/apache/spark/blob/9158da8cb76cc13f3011deaa7ac2c290eef62389/docs/ml-clustering.md
I guess I will still remove the ```a href=``` since no other places in the 
doc uses ``


---




[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/23072
  
@dongjoon-hyun Thank you very much for your review. I will make the changes 
soon. 


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239250335
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data A SparkDataFrame.
+#' @param k The number of clusters to create.
+#' @param initMode Param for the initialization algorithm.
+#' @param maxIter Param for maximum number of iterations.
+#' @param sourceCol Param for the name of the input column for source vertex IDs.
+#' @param destinationCol Name of the input column for destination vertex IDs.
+#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'   list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+  signature(data = "SparkDataFrame"),
+  function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
+sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
+if (!is.numeric(k) || k < 1) {
+  stop("k should be a number with value >= 1.")
+}
+if (!is.integer(maxIter) || maxIter <= 0) {
+  stop("maxIter should be a number with value > 0.")
+}
--- End diff --

@dongjoon-hyun ```src``` and ```dst``` are character columns. I have the 
check for character type. 
```
as.character(sourceCol),
as.character(destinationCol)
```


---




[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23239#discussion_r239256853
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
@@ -174,11 +174,6 @@ public static float getFloat(Object object, long offset) {
   }
 
   public static void putFloat(Object object, long offset, float value) {
-if (Float.isNaN(value)) {
-  value = Float.NaN;
-} else if (value == -0.0f) {
-  value = 0.0f;
-}
--- End diff --

These changes are expected to make the following test case in 
`PlatformUtilSuite` fail, but that seems to have been missed. Could you fix the test case or 
remove it together with this change, @cloud-fan?
- 
https://github.com/apache/spark/blob/master/common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java#L162-L163
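
For context, the removed branch normalized float values whose bit patterns are not unique. The following Python sketch is not Spark code, and the helper names `float_bits` and `normalize` are made up for illustration. It only shows why such values matter when floats are compared byte by byte: `-0.0` and `0.0` are equal as numbers but differ in their sign bit.

```python
import math
import struct


def float_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def normalize(value: float) -> float:
    """Mirror the removed branch: one canonical NaN, and -0.0 mapped to 0.0."""
    if math.isnan(value):
        return float("nan")
    if value == 0.0:  # true for both 0.0 and -0.0
        return 0.0
    return value


# -0.0 and 0.0 are equal as numbers but differ in the sign bit.
assert -0.0 == 0.0
assert float_bits(-0.0) != float_bits(0.0)

# After normalization both zeros share a single bit pattern.
assert float_bits(normalize(-0.0)) == float_bits(normalize(0.0))
```

A test that asserts on the normalized values after a write would stop passing once the write path no longer normalizes, which is presumably the failure referred to above.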


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21465
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5784/
Test PASSed.


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266908
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -186,6 +186,82 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
 }
   }
 
+  protected def testORCTableLocation(isConvertMetastore: Boolean): Unit = {
--- End diff --

OK, I moved it.


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266747
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala 
---
@@ -190,4 +190,12 @@ class HiveOrcSourceSuite extends OrcSuite with 
TestHiveSingleton {
   }
 }
   }
+
+  test("SPARK-25993 Add test cases for resolution of ORC table location") {
--- End diff --

Changed.


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266791
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
   ))
 }
   }
+
+  test("SPARK-25993 Add test cases for resolution of Parquet table 
location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", 
"c2", "c3").repartition(1)
+withTable("tbl1", "tbl2", "tbl3") {
+val dataDir = s"${path.getCanonicalPath}/l3/l2/l1/"
--- End diff --

fixed


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266720
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
   ))
 }
   }
+
+  test("SPARK-25993 Add test cases for resolution of Parquet table 
location") {
--- End diff --

sure, changed.


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266816
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
   ))
 }
   }
+
+  test("SPARK-25993 Add test cases for resolution of Parquet table 
location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", 
"c2", "c3").repartition(1)
+withTable("tbl1", "tbl2", "tbl3") {
+val dataDir = s"${path.getCanonicalPath}/l3/l2/l1/"
+val parentDir = s"${path.getCanonicalPath}/l3/l2/"
+val l3Dir = s"${path.getCanonicalPath}/l3/"
+val wildcardParentDir = new File(s"${path}/l3/l2/*").toURI
+val wildcardL3Dir = new File(s"${path}/l3/*").toURI
+someDF1.write.parquet(dataDir)
+val parentDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${parentDir}'""".stripMargin
+sql(parentDirStatement)
+checkAnswer(sql("select * from tbl1"), Nil)
+
+val wildcardStatement =
+  s"""
+ |CREATE EXTERNAL TABLE tbl2(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS parquet
+ |LOCATION '${wildcardParentDir}'""".stripMargin
+sql(wildcardStatement)
+checkAnswer(sql("select * from tbl2"),
+  (1 to 2).map(i => Row(i, i, s"parq$i")))
+
+val wildcardL3Statement =
+s"""
--- End diff --

fixed


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266775
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
   ))
 }
   }
+
+  test("SPARK-25993 Add test cases for resolution of Parquet table 
location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", 
"c2", "c3").repartition(1)
--- End diff --

fixed.


---




[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-05 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r239266673
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
 ---
@@ -186,6 +186,54 @@ abstract class OrcSuite extends OrcTest with 
BeforeAndAfterAll {
 }
   }
 
+  protected def testORCTableLocation(isConvertMetastore: Boolean): Unit = {
+val tableName1 = "spark_orc1"
+val tableName2 = "spark_orc2"
+
+withTempDir { dir =>
+  val someDF1 = Seq((1, 1, "orc1"), (2, 2, "orc2")).toDF("c1", "c2", 
"c3").repartition(1)
+  withTable(tableName1, tableName2) {
+val dataDir = s"${dir.getCanonicalPath}/dir1/"
+val parentDir = s"${dir.getCanonicalPath}/"
+val wildCardDir = new File(s"${dir}/*").toURI
+someDF1.write.orc(dataDir)
+val parentDirStatement =
+  s"""
+ |CREATE EXTERNAL TABLE $tableName1(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS orc
+ |LOCATION '${parentDir}'""".stripMargin
+sql(parentDirStatement)
+val parentDirSqlStatement = s"select * from ${tableName1}"
+if (isConvertMetastore) {
+  checkAnswer(sql(parentDirSqlStatement), Nil)
+} else {
+ checkAnswer(sql(parentDirSqlStatement),
+   (1 to 2).map(i => Row(i, i, s"orc$i")))
+}
+
+val wildCardStatement =
+  s"""
+ |CREATE EXTERNAL TABLE $tableName2(
+ |  c1 int,
+ |  c2 int,
+ |  c3 string)
+ |STORED AS orc
+ |LOCATION '$wildCardDir'""".stripMargin
--- End diff --

done


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-05 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r239272037
  
--- Diff: 
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala
 ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
 val dispatcher = new Dispatcher(
   ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-// TODO [SPARK-25887] Create builder in a way that respects 
configurable context
-val config = new ConfigBuilder()
+// Allow for specifying a context used to auto-configure from the 
users K8S config file
+val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+logInfo(s"Auto-configuring K8S client using " +
+  s"${if (kubeContext.isDefined) s"context 
${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

I think using `kubeContext.map("context " + _).getOrElse("current context")` 
would make this cleaner.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99746 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99746/testReport)**
 for PR 23108 at commit 
[`51d1d78`](https://github.com/apache/spark/commit/51d1d78d1e1c4f56f5f07dc18bc9fcbe9a00fbbf).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99746/
Test FAILed.


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99742/
Test PASSed.


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23241
  
**[Test build #99742 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99742/testReport)**
 for PR 23241 at commit 
[`7604eab`](https://github.com/apache/spark/commit/7604eab9be72a857e01e24f19ce38db85509a4a3).


---




[GitHub] spark pull request #23228: [MINOR][DOC]The condition description of serializ...

2018-12-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23228#discussion_r239215561
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
@@ -33,10 +33,10 @@ import org.apache.spark.shuffle._
  * Sort-based shuffle has two different write paths for producing its map 
output files:
  *
  *  - Serialized sorting: used when all three of the following conditions 
hold:
- *1. The shuffle dependency specifies no aggregation or output 
ordering.
+ *1. The shuffle dependency specifies no map-side combine.
--- End diff --

Does this sound right @JoshRosen ?


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239258564
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,44 @@ Refer to the [R API 
docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and 
Cohen.
--- End diff --

Actually, I built this PR on my Mac, and found that the hyperlink is not 
generated.


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23240
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99739/
Test PASSed.


---




[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22612
  
**[Test build #4454 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4454/testReport)**
 for PR 22612 at commit 
[`0a7402e`](https://github.com/apache/spark/commit/0a7402e92e87aef13d2f91043083b92abfa80233).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23240
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239258366
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,44 @@ Refer to the [R API 
docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and 
Cohen.
--- End diff --

You need to build from the Spark repository because Jekyll handles it 
differently from GitHub. Please try building in the `docs` directory. There is a 
`README.md` that explains how.


---




[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r239264207
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -95,9 +77,116 @@ case class CreateHiveTableAsSelectCommand(
 Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table 
doesn't exist.
+  def writingCommandForNewTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
 s"[Database:${tableDesc.database}, " +
 s"TableName: ${tableDesc.identifier.table}, " +
 s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, 
storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+tableDesc: CatalogTable,
+query: LogicalPlan,
+outputColumnNames: Seq[String],
+mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+InsertIntoHiveTable(
+  tableDesc,
+  Map.empty,
--- End diff --

Why is this empty? Do we have a test case for partitioned tables?




---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23108
  
**[Test build #99746 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99746/testReport)**
 for PR 23108 at commit 
[`51d1d78`](https://github.com/apache/spark/commit/51d1d78d1e1c4f56f5f07dc18bc9fcbe9a00fbbf).


---




[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5786/
Test PASSed.


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22904
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5785/



---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5785/
Test PASSed.


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22904
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239218209
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

Yeah, so this new "partial file" method can call the existing method by 
default, and do something different for zstd. Then this one call site can ask 
for the 'partial file' stream. Some comments about the difference here would be 
helpful.
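
The design suggested here can be sketched roughly as follows. This is a Python illustration only, not the Spark `CompressionCodec` API: `zlib` stands in for zstd, and the class and method names (`ToyCodec`, `TolerantCodec`, `compressed_input_stream_for_partial_file`) are hypothetical. The base class forwards the "partial file" method to the existing one, and only the codec that needs special handling overrides it.

```python
import io
import zlib


class ToyCodec:
    """Toy base codec; zlib is an assumption standing in for a real codec."""

    def compressed_input_stream(self, stream):
        # Strict read: the input must be one complete compressed stream.
        return io.BytesIO(zlib.decompress(stream.read()))

    def compressed_input_stream_for_partial_file(self, stream):
        # Default: a partial file needs no special handling.
        return self.compressed_input_stream(stream)


class TolerantCodec(ToyCodec):
    """Hypothetical codec that can read a file that is still being written."""

    def compressed_input_stream_for_partial_file(self, stream):
        # A decompressobj returns whatever could be decoded so far
        # instead of raising on a truncated stream.
        decoder = zlib.decompressobj()
        return io.BytesIO(decoder.decompress(stream.read()))


data = b"event\n" * 1000
complete = zlib.compress(data)
truncated = complete[: len(complete) // 2]

codec = TolerantCodec()
# Complete input: the existing method works as before.
assert codec.compressed_input_stream(io.BytesIO(complete)).read() == data
# Truncated input: only the partial-file method tolerates it.
partial = codec.compressed_input_stream_for_partial_file(io.BytesIO(truncated)).read()
assert data.startswith(partial)
```

With this shape, the single call site that reads in-progress files asks for the partial-file stream, and every other caller keeps the strict behavior.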


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99736/
Test PASSed.


---




[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...

2018-12-05 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21465#discussion_r239245388
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -705,12 +710,59 @@ def getNumTrees(self):
 return self.getOrDefault(self.numTrees)
 
 
-class GBTParams(TreeEnsembleParams):
+class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, 
HasValidationIndicatorCol):
 """
 Private class to track supported GBT params.
 """
+
+stepSize = Param(Params._dummy(), "stepSize",
+ "Step size (a.k.a. learning rate) in interval (0, 1] 
for shrinking " +
+ "the contribution of each estimator.",
+ typeConverter=TypeConverters.toFloat)
+
+validationTol = Param(Params._dummy(), "validationTol",
+  "Threshold for stopping early when fit with 
validation is used. " +
+  "If the error rate on the validation input 
changes by less than the " +
+  "validationTol, then learning will stop early 
(before `maxIter`). " +
+  "This parameter is ignored when fit without 
validation is used.",
+  typeConverter=TypeConverters.toFloat)
+
+@since("3.0.0")
+def setValidationTol(self, value):
+"""
+Sets the value of :py:attr:`validationTol`.
+"""
+return self._set(validationTol=value)
+
+@since("3.0.0")
+def getValidationTol(self):
+"""
+Gets the value of validationTol or its default value.
+"""
+return self.getOrDefault(self.validationTol)
+
+
+class GBTRegressorParams(GBTParams, TreeRegressorParams):
+"""
+Private class to track supported GBTRegressor params.
+
+.. versionadded:: 3.0.0
+"""
+
 supportedLossTypes = ["squared", "absolute"]
 
+lossType = Param(Params._dummy(), "lossType",
+ "Loss function which GBT tries to minimize 
(case-insensitive). " +
+ "Supported options: " + ", ".join(supportedLossTypes),
+ typeConverter=TypeConverters.toString)
+
+@since("1.4.0")
+def setLossType(self, value):
--- End diff --

`setLossType` should be in the estimator and `getLossType` should be here
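
A minimal sketch of the split being asked for, using toy classes rather than the PySpark ones (all names below are hypothetical, and the default value is arbitrary). The assumption behind it is that the params class is shared with the fitted model, so a setter defined there would also appear on the model.

```python
class ToyGBTParams:
    """Shared params: stores the value and exposes only the getter."""

    supported_loss_types = ["squared", "absolute"]

    def __init__(self):
        self._loss_type = "squared"  # arbitrary default for this sketch

    def get_loss_type(self):
        return self._loss_type


class ToyGBTRegressor(ToyGBTParams):
    """Estimator: the only place where the param can be changed."""

    def set_loss_type(self, value):
        if value.lower() not in self.supported_loss_types:
            raise ValueError("Unsupported loss type: " + value)
        self._loss_type = value.lower()
        return self  # allow chaining, as setters commonly do


class ToyGBTModel(ToyGBTParams):
    """Fitted model: can report the param but has no setter."""


estimator = ToyGBTRegressor().set_loss_type("Absolute")
assert estimator.get_loss_type() == "absolute"
assert not hasattr(ToyGBTModel(), "set_loss_type")
```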


---




[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...

2018-12-05 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/23236
  
True, the test is not that long under light resources. Locally, I saw a 
difference of a couple of seconds with the changes I mentioned. The weird thing is that the 
unmodified test completes after the 11th batch with errors
```
['0.67', '0.71', '0.78', '0.7', '0.75', '0.74', '0.73', '0.69', '0.62', 
'0.71', '0.31']
```
Compared with the error values from the test failures above, they match up 
to the 10th batch, but the failing runs then continue until the 16th batch, where 
the test times out
```
0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 
0.72, 0.77, 0.71, 0.74
```
I would expect the seed to produce the same values (or all different ones if the 
random function is different), which makes me think something else is going on.

Anyway, I think it's fine to increase the timeout and if it's still flaky, 
we can look at making changes.
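
The expectation about the seed can be illustrated with a small Python sketch. The `error_sequence` helper is hypothetical and is not the data generator used by the test. With a fixed seed, two runs produce identical values, and a shorter run is a prefix of a longer one, so sequences that agree for ten values and then diverge point at something other than the random generator.

```python
import random


def error_sequence(seed, batches):
    """Return one pseudo-random value per batch from a seeded generator."""
    rng = random.Random(seed)
    return [round(rng.random(), 2) for _ in range(batches)]


long_run = error_sequence(42, 16)
short_run = error_sequence(42, 11)

# Same seed: the runs agree on every shared batch, including the 11th.
assert long_run == error_sequence(42, 16)
assert long_run[:11] == short_run
```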


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239216282
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

Thanks @srowen for the review.
Since the CompressionCodec class is used by many classes, we need to check whether any 
other use case needs to read an open frame in the zstd case. As far as the 
eventLoggingListener class is concerned, it needs the open frame data as well. So 
I tried to change as little as possible, without impacting the other callers.


> I think that if we introduce a new method we might try to make it a 
little more general, like: compressedInputStreamForPartialFile or something. It 
would be good to avoid the isInstanceOf below.

Yeah. This is a cleaner solution. Thanks.



---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239224970
  
--- Diff: docs/ml-clustering.md ---
@@ -265,3 +265,44 @@ Refer to the [R API 
docs](api/R/spark.gaussianMixture.html) for more details.
 
 
 
+
+## Power Iteration Clustering (PIC)
+
+Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
+developed by http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and 
Cohen.
+From the abstract: PIC finds a very low-dimensional embedding of a dataset
+using truncated power iteration on a normalized pair-wise similarity 
matrix of the data.
+
+`spark.ml`'s PowerIterationClustering implementation takes the following 
parameters:
+
+* `k`: the number of clusters to create
+* `initMode`: param for the initialization algorithm
+* `maxIter`: param for maximum number of iterations
+* `srcCol`: param for the name of the input column for source vertex IDs
+* `dstCol`: name of the input column for destination vertex IDs
+* `weightCol`: Param for weight column name
+
+**Examples**
+
+
+
+
+Refer to the [Scala API 
docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering)
 for more details.
+
+{% include_example 
scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
+
+
+
+Refer to the [Java API 
docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) 
for more details.
+
+{% include_example 
java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
+
+
+
--- End diff --

It seems that `Python` is missing here. Could you check and add it?
cc @HyukjinKwon 


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239232597
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends 
CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress 
small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): 
InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), 
bufferSize)
--- End diff --

@srowen I updated the PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-12-05 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/22905
  
I think I've made my case for this patch as best I can. It does not appear 
this PR has unanimous support, but I continue to believe we should merge it to 
master. So where do we take it from here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99740/
Test PASSed.


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23240: [SPARK-26281][WebUI] Duration column of task tabl...

2018-12-05 Thread gengliangwang
Github user gengliangwang closed the pull request at:

https://github.com/apache/spark/pull/23240


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/23240
  
Oh, I see. Closing this one now.
Please change the title in #23160.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99736/testReport)**
 for PR 23207 at commit 
[`76d1ca0`](https://github.com/apache/spark/commit/76d1ca0036bbb50a005e9d12f8b22bf21697af7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23239
  
**[Test build #99737 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99737/testReport)**
 for PR 23239 at commit 
[`797ade3`](https://github.com/apache/spark/commit/797ade3eb175c41866efbffa3cb4c30f90e49ca7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23240: [SPARK-26281][WebUI] Duration column of task table shoul...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23240
  
Thanks. I will update the PR title.


---




[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...

2018-12-05 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/23232
  
Done, thanks @dongjoon-hyun


---




[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22275
  
**[Test build #99743 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99743/testReport)**
 for PR 22275 at commit 
[`00c7b8c`](https://github.com/apache/spark/commit/00c7b8cd8a724ccc34af5fc59fb965a43ae6cbe5).


---




[GitHub] spark issue #23156: [SPARK-24063][SS] Add maximum epoch queue threshold for ...

2018-12-05 Thread arunmahadevan
Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/23156
  
Rather than controlling the queue sizes, it would be better to limit the max 
epoch backlog and fail the query once that threshold is reached. There already 
seems to be a patch that attempted to address this: 
https://github.com/apache/spark/pull/21392
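
A minimal sketch of the idea described above: track the epochs that have been started but not yet committed, and fail as soon as that backlog exceeds a threshold. It is written in plain Python for brevity and is not taken from Spark's continuous-processing code; `EpochBacklogTracker`, `max_backlog`, and `EpochBacklogExceeded` are hypothetical names used only for illustration.

```python
class EpochBacklogExceeded(RuntimeError):
    """Raised when too many epochs are waiting to be committed (hypothetical)."""


class EpochBacklogTracker:
    """Tracks uncommitted epochs and fails fast past a threshold.

    Illustrative only: the names and behaviour are assumptions, not Spark's API.
    """

    def __init__(self, max_backlog):
        if max_backlog < 1:
            raise ValueError("max_backlog must be >= 1")
        self.max_backlog = max_backlog
        self.pending = set()  # epochs started but not yet committed

    def start_epoch(self, epoch):
        # Register the epoch, then fail if the backlog is now over the limit.
        self.pending.add(epoch)
        if len(self.pending) > self.max_backlog:
            raise EpochBacklogExceeded(
                f"{len(self.pending)} uncommitted epochs exceed "
                f"the limit of {self.max_backlog}"
            )

    def commit_epoch(self, epoch):
        # Committing an unknown epoch is treated as a no-op in this sketch.
        self.pending.discard(epoch)
```

Bounding the backlog rather than each individual queue gives a single threshold to tune, and the exception gives the caller one place to decide whether to stop the query.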


---




[GitHub] spark pull request #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow wh...

2018-12-05 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/23233


---




[GitHub] spark pull request #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow wh...

2018-12-05 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/23234


---




[GitHub] spark pull request #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow wh...

2018-12-05 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/23232


---




[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r239259444
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
   function(object, path, overwrite = FALSE) {
 write_internal(object, path, overwrite)
   })
+
+#' PowerIterationClustering
+#'
+#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
+#' return a cluster assignment for each input vertex.
+#'
+#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
+#' @param data A SparkDataFrame.
+#' @param k The number of clusters to create.
+#' @param initMode Param for the initialization algorithm.
+#' @param maxIter Param for maximum number of iterations.
+#' @param sourceCol Param for the name of the input column for source vertex IDs.
+#' @param destinationCol Name of the input column for destination vertex IDs.
+#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
+#'  we treat all instance weights as 1.0.
+#' @param ... additional argument(s) passed to the method.
+#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
+#' The schema of it will be:
+#' \code{id: Long}
+#' \code{cluster: Int}
+#' @rdname spark.powerIterationClustering
+#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+#'   list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+#'   list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
+#' showDF(clusters)
+#' }
+#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
+setMethod("spark.assignClusters",
+  signature(data = "SparkDataFrame"),
+  function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
+sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
+if (!is.numeric(k) || k < 1) {
+  stop("k should be a number with value >= 1.")
+}
+if (!is.integer(maxIter) || maxIter <= 0) {
+  stop("maxIter should be a number with value > 0.")
+}
--- End diff --

I mean the `data` SparkDataFrame's column types, if possible. If you remove 
'L' from '0L' in your example dataset, you can see the failure.
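
One way to surface such a failure earlier is to check the vertex ID types before running the algorithm. The sketch below uses plain Python rows rather than a SparkDataFrame, so it only illustrates the shape of the check; `check_vertex_id_types` is a hypothetical helper, and the assumption that IDs must be integers is inferred from the observation above that dropping the `L` suffix triggers the failure.

```python
def check_vertex_id_types(rows, id_positions=(0, 1)):
    """Raise TypeError if any vertex ID is not an integer.

    `rows` is an iterable of (src, dst, weight) tuples and `id_positions`
    lists the tuple positions that hold vertex IDs. Hypothetical helper,
    not part of SparkR or PySpark.
    """
    for row_number, row in enumerate(rows):
        for pos in id_positions:
            value = row[pos]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(
                    f"row {row_number}, position {pos}: expected an integer "
                    f"vertex ID, got {type(value).__name__} ({value!r})"
                )
```

Calling `check_vertex_id_types([(0, 1, 1.0), (0.0, 2, 1.0)])` would reject the second row, which mirrors writing `0` instead of `0L` in the R example.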


---




[GitHub] spark issue #21465: [SPARK-24333][ML][PYTHON]Add fit with validation set to ...

2018-12-05 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21465
  
@BryanCutler Thank you very much for your review! I will submit changes 
soon. 


---




[GitHub] spark issue #22904: [SPARK-25887][K8S] Configurable K8S context support

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22904
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5785/



---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23241
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23241: [SPARK-26283][CORE]When zstd compression enabled, Inprog...

2018-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23241
  
**[Test build #99742 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99742/testReport)**
 for PR 23241 at commit 
[`7604eab`](https://github.com/apache/spark/commit/7604eab9be72a857e01e24f19ce38db85509a4a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23241: [SPARK-26283][CORE]When zstd compression enabled,...

2018-12-05 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/23241#discussion_r239219205
  
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
   }
+
+  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --

Thanks. I will update the PR.


---



