[GitHub] spark pull request #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder p...

2018-01-11 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/20241#discussion_r161116909 --- Diff: python/pyspark/ml/feature.py --- @@ -1577,6 +1577,8 @@ class OneHotEncoder(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,

[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

2018-01-11 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/20239 @BryanCutler Yes there is no error currently. This should make the code cleaner though. --- - To unsubscribe, e-mail:

[GitHub] spark issue #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python A...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20241 **[Test build #86004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86004/testReport)** for PR 20241 at commit

[GitHub] spark issue #20236: [SPARK-23044] Error handling for jira assignment

2018-01-11 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20236 @squito thanks for the fix. I also don't have PRs to verify the changes, but I think catching the exception should be enough. ---

[GitHub] spark pull request #20241: [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder p...

2018-01-11 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/20241 [SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated ## What changes were proposed in this pull request? mark OneHotEncoder python API deprecated ## How was

[GitHub] spark pull request #20237: [SPARK-22980][PYTHON][SQL] Clarify the length of ...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20237#discussion_r161115654 --- Diff: python/pyspark/sql/functions.py --- @@ -2184,6 +2184,11 @@ def pandas_udf(f=None, returnType=None, functionType=None): |

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20222 I think we should fix: https://github.com/apache/spark/blob/7c7bc8fc0ff85fe70968b47433bb7757326a6b12/dev/run-tests-jenkins.py#L183-L185 too. cc @JoshRosen, since you

[GitHub] spark pull request #20237: [SPARK-22980][PYTHON][SQL] Clarify the length of ...

2018-01-11 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20237#discussion_r161114987 --- Diff: python/pyspark/sql/functions.py --- @@ -2184,6 +2184,11 @@ def pandas_udf(f=None, returnType=None, functionType=None): |

[GitHub] spark pull request #20237: [SPARK-22980][PYTHON][SQL] Clarify the length of ...

2018-01-11 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20237#discussion_r161114750 --- Diff: python/pyspark/sql/functions.py --- @@ -2184,6 +2184,11 @@ def pandas_udf(f=None, returnType=None, functionType=None): |

[GitHub] spark issue #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with returnType=S...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20163 **[Test build #86003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86003/testReport)** for PR 20163 at commit

[GitHub] spark pull request #20232: [SPARK-23042][ML] Use OneHotEncoderModel to encod...

2018-01-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20232#discussion_r161113685 --- Diff: R/pkg/tests/fulltests/test_mllib_classification.R --- @@ -382,10 +382,10 @@ test_that("spark.mlp", { trainidxs <- base::sample(nrow(data),

[GitHub] spark issue #20232: [SPARK-23042][ML] Use OneHotEncoderModel to encode label...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20232 **[Test build #86002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86002/testReport)** for PR 20232 at commit

[GitHub] spark issue #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with returnType=S...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20163 Merged build finished. Test FAILed.

[GitHub] spark issue #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with returnType=S...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86000/ Test FAILed. ---

[GitHub] spark pull request #20209: [SPARK-23008][ML] OnehotEncoderEstimator python A...

2018-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20209 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with returnType=S...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20163 **[Test build #86000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86000/testReport)** for PR 20163 at commit

[GitHub] spark issue #20209: [SPARK-23008][ML] OnehotEncoderEstimator python API

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20209 **[Test build #86001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86001/testReport)** for PR 20209 at commit

[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20224 > Would you (@kiszk and @maropu) agree that at least having both (1) and (2) is a good idea? Without (3), is this still useful if we only have (1) and (2)? It may not be very useful if only

[GitHub] spark issue #20209: [SPARK-23008][ML] OnehotEncoderEstimator python API

2018-01-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20209 Thanks @WeichenXu123 ! LGTM As I just mentioned in the JIRA, I'm going to backport this to branch-2.3 for release with 2.3.0 since this is arguably a bug fix for ML in streaming. ---

[GitHub] spark issue #20163: [SPARK-22966][PYTHON][SQL] Python UDFs with returnType=S...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20163 **[Test build #86000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86000/testReport)** for PR 20163 at commit

[GitHub] spark pull request #20209: [SPARK-23008][ML] OnehotEncoderEstimator python A...

2018-01-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/20209#discussion_r161110909 --- Diff: python/pyspark/ml/feature.py --- @@ -1641,6 +1642,118 @@ def getDropLast(self): return self.getOrDefault(self.dropLast)

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20225 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85990/ Test FAILed. ---

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20225 Merged build finished. Test FAILed.

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20225 **[Test build #85990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85990/testReport)** for PR 20225 at commit

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20072 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85988/ Test FAILed. ---

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20072 Merged build finished. Test FAILed.

[GitHub] spark issue #20209: [SPARK-23008][ML] OnehotEncoderEstimator python API

2018-01-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20209 I'll review this now

[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20072 **[Test build #85988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85988/testReport)** for PR 20072 at commit

[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20240 **[Test build #85999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85999/testReport)** for PR 20240 at commit

[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85989/ Test PASSed. ---

[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20239 Merged build finished. Test PASSed.

[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20239 **[Test build #85989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85989/testReport)** for PR 20239 at commit

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161108103 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala --- @@ -608,4 +609,33 @@ class OrcQuerySuite

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20225 **[Test build #85997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85997/testReport)** for PR 20225 at commit

[GitHub] spark issue #20189: [SPARK-22975][SS] MetricsReporter should not throw excep...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20189 **[Test build #85998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85998/testReport)** for PR 20189 at commit

[GitHub] spark issue #20189: [SPARK-22975][SS] MetricsReporter should not throw excep...

2018-01-11 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20189 LGTM pending tests

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161107677 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala --- @@ -50,23 +50,35 @@ object OrcUtils extends

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161107654 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala --- @@ -608,4 +609,33 @@ class OrcQuerySuite

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161107632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala --- @@ -50,23 +50,35 @@ object OrcUtils extends

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161107320 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala --- @@ -608,4 +609,33 @@ class OrcQuerySuite

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20240#discussion_r161107216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala --- @@ -608,4 +609,33 @@ class OrcQuerySuite

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20223 Merged build finished. Test PASSed.

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85986/ Test PASSed. ---

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20223 **[Test build #85986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85986/testReport)** for PR 20223 at commit

[GitHub] spark issue #20224: [SPARK-23032][SQL] Add a per-query codegenStageId to Who...

2018-01-11 Thread rednaxelafx
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20224 Thanks for your comments and questions, @kiszk and @maropu ! Let me address them in a couple of separate points. **tl;dr** On top of my original proposal in the PR

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161101942 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -258,13 +276,9 @@ class ContinuousStressSuite

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20225 Merged build finished. Test PASSed.

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20225 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85987/ Test PASSed. ---

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20225 **[Test build #85987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85987/testReport)** for PR 20225 at commit

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20222 **[Test build #85996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85996/testReport)** for PR 20222 at commit

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20222 retest this please

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-11 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20138#discussion_r161099778 --- Diff: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala --- @@ -663,6 +665,95 @@ class FsHistoryProviderSuite extends

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-11 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20138 I was actually suggesting having the "aggressive" option turned on by default, and I'm also fine with not having that config at all. I will take a closer look at this later; thank you for pinging me, @squito

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-11 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20138#discussion_r161099310 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -834,6 +906,9 @@ private[history] case class

[GitHub] spark issue #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20240 **[Test build #85995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85995/testReport)** for PR 20240 at commit

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161098563 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDDIter.scala --- @@ -52,6 +52,10 @@ class

[GitHub] spark pull request #20240: [SPARK-23049][SQL] `spark.sql.files.ignoreCorrupt...

2018-01-11 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20240 [SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` should work for ORC files ## What changes were proposed in this pull request? When `spark.sql.files.ignoreCorruptFiles=true`,

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161098483 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -219,6 +201,42 @@ class ContinuousSuite extends

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161098531 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTaskRetryException.scala --- @@ -0,0 +1,23 @@ +/* + *

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-11 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20138#discussion_r161098356 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -544,73 +621,75 @@ private[history] class

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161098109 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -280,6 +294,7 @@ class ContinuousStressSuite extends

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-11 Thread ajbozarth
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/20138 OK, no problems here on that front then. If I have time later to do a proper review and this hasn't been merged yet, I'll take a better look at the whole PR ---

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161098063 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -258,13 +276,9 @@ class ContinuousStressSuite

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161097981 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -219,6 +201,42 @@ class ContinuousSuite extends

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-11 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/20138 Well, perhaps I misrepresented this -- you still need to turn the event log cleaning on explicitly with the old option, "spark.history.fs.cleaner.enabled". This just doesn't include the

[GitHub] spark issue #20138: [SPARK-20664][core] Delete stale application data from S...

2018-01-11 Thread ajbozarth
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/20138 I haven't had a chance to read through your code, but as @squito said, I am against any default feature that deletes files from the eventLog dir. Many users, such as myself, use one log dir for

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-11 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20138#discussion_r161095561 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -405,49 +404,70 @@ private[history] class

[GitHub] spark issue #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage genera...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20204 **[Test build #85994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85994/testReport)** for PR 20204 at commit

[GitHub] spark issue #20204: [SPARK-7721][PYTHON][TESTS] Adds PySpark coverage genera...

2018-01-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20204 retest this please

[GitHub] spark pull request #20138: [SPARK-20664][core] Delete stale application data...

2018-01-11 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/20138#discussion_r161082423 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -405,49 +404,70 @@ private[history] class

[GitHub] spark issue #20189: [SPARK-22975][SS] MetricsReporter should not throw excep...

2018-01-11 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20189 @mgaido91 Ah, sorry. There is a race condition in this test since the query may make progress when we are checking the results. Please revert to your previous test. Thanks! ---

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20223 Merged build finished. Test PASSed.

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20225 **[Test build #85993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85993/testReport)** for PR 20225 at commit

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20223 **[Test build #85984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85984/testReport)** for PR 20223 at commit

[GitHub] spark issue #20223: [SPARK-23020][core] Fix races in launcher code, test.

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85984/ Test PASSed. ---

[GitHub] spark pull request #20238: [SPARK-23046][ML][SparkR] Have RFormula include V...

2018-01-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20238

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2018-01-11 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/19001 Now that https://github.com/apache/spark/pull/19080 has been merged to trunk, I am rebasing this PR. A small part of this PR is put in https://github.com/apache/spark/pull/20206 and ready for

[GitHub] spark issue #20238: [SPARK-23046][ML][SparkR] Have RFormula include VectorSi...

2018-01-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20238 LGTM I'm going to merge this with master and backport it to branch-2.3 since this fixes 1 of 2 bugs in RFormulaModel for use with Structured Streaming. Thanks @MrBago ! ---

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20222 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85980/ Test FAILed. ---

[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...

2018-01-11 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/20211#discussion_r161086349 --- Diff: python/pyspark/sql/group.py --- @@ -233,6 +233,27 @@ def apply(self, udf): | 2| 1.1094003924504583|

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20222 Merged build finished. Test FAILed.

[GitHub] spark issue #20222: [SPARK-23028] Bump master branch version to 2.4.0-SNAPSH...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20222 **[Test build #85980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85980/testReport)** for PR 20222 at commit

[GitHub] spark pull request #20211: [SPARK-23011][PYTHON][SQL] Prepend missing groupi...

2018-01-11 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/20211#discussion_r161085442 --- Diff: python/pyspark/sql/group.py --- @@ -233,6 +233,27 @@ def apply(self, udf): | 2| 1.1094003924504583|

[GitHub] spark issue #20238: [SPARK-23046][ML][SparkR] Have RFormula include VectorSi...

2018-01-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20238 Reviewing now

[GitHub] spark issue #20231: [SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suite DataS...

2018-01-11 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20231 test this please

[GitHub] spark issue #20231: [SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suite DataS...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20231 **[Test build #85992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85992/testReport)** for PR 20231 at commit

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/20146 @viirya I discussed this with @jkbradley offline. We're now busy fixing some issues (e.g. #20238) in ML structured streaming support; it looks bad after the code freeze, and we may not be able to

[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85985/ Test PASSed. ---

[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19054 Merged build finished. Test PASSed.

[GitHub] spark issue #19054: [SPARK-18067] Avoid shuffling child if join keys are sup...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19054 **[Test build #85985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85985/testReport)** for PR 19054 at commit

[GitHub] spark issue #20225: [SPARK-23033] Don't use task level retry for continuous ...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20225 **[Test build #85990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85990/testReport)** for PR 20225 at commit

[GitHub] spark issue #20203: [SPARK-22577] [core] executor page blacklist status shou...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20203 **[Test build #85991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85991/testReport)** for PR 20203 at commit

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85983/ Test PASSed. ---

[GitHub] spark pull request #20225: [SPARK-23033] Don't use task level retry for cont...

2018-01-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/20225#discussion_r161077122 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala --- @@ -219,6 +201,44 @@ class ContinuousSuite extends

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #85983 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85983/testReport)** for PR 20208 at commit

[GitHub] spark pull request #20167: Allow providing Mesos principal & secret via file...

2018-01-11 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r161075477 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -80,10 +80,27 @@ trait

[GitHub] spark pull request #20167: Allow providing Mesos principal & secret via file...

2018-01-11 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r161074854 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -80,10 +80,27 @@ trait

[GitHub] spark pull request #20167: Allow providing Mesos principal & secret via file...

2018-01-11 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r161075359 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -80,10 +80,27 @@ trait

[GitHub] spark pull request #20167: Allow providing Mesos principal & secret via file...

2018-01-11 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20167#discussion_r161075101 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -80,10 +80,27 @@ trait
