[GitHub] spark issue #23063: [SPARK-26033][PYTHON][TESTS] Break large ml/tests.py fil...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23063 Let me leave a cc for @holdenk, @MLnick, @jkbradley and @mengxr FYI. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 @zsxwing sure. Sorry that I rushed. Will do next time.

[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 I think now should be a good time to match the behaviours.

[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 Merged to master.

[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 @BryanCutler, let me merge this. Let's do the ML one and then clean up both comments throughout ML and MLlib at once

[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 Another related attempt: https://github.com/apache/spark/pull/13252

[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 One try to add some tests for reading/writing empty dataframes was here https://github.com/apache/spark/pull/13253 fyi

[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 Which should be ... this https://github.com/apache/spark/pull/12855

[GitHub] spark issue #23054: [SPARK-26085][SQL] Key attribute of primitive type under...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23054 retest this please

[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...

2018-11-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please

[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please

[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22309 adding @liancheng BTW. IIRC, he took a look at this one before and abandoned the change (correct me if I'm remembering wrongly

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234086569 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark pull request #23055: [SPARK-26080][PYTHON] Disable 'spark.executor.pys...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23055#discussion_r234081475 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner

[GitHub] spark issue #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py ...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23056 retest this please

[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080468 --- Diff: python/pyspark/mllib/tests/test_linalg.py --- @@ -0,0 +1,642 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request #23056: [SPARK-26034][PYTHON][TESTS] Break large mllib/te...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23056#discussion_r234080249 --- Diff: python/pyspark/testing/mllibutils.py --- @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234073703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -280,7 +280,7 @@ object ShuffleExchangeExec

[GitHub] spark issue #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memor...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23055 cc @rdblue, @vanzin and @haydenjeune

[GitHub] spark pull request #23055: [SPARK-26080][SQL] Disable 'spark.executor.pyspar...

2018-11-15 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23055 [SPARK-26080][SQL] Disable 'spark.executor.pyspark.memory' always on Windows ## What changes were proposed in this pull request? `resource` package is a Unix-specific package. See
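
A minimal sketch of why a Unix-only module needs a guard, assuming the check is simply whether Python's standard `resource` module can be imported. The helper name `resource_limits_available` is hypothetical and is not taken from the PR; the actual change may be implemented differently.

```python
import importlib.util


def resource_limits_available():
    """Return True if the Unix-only `resource` module can be imported."""
    # `resource` ships with CPython on Unix-like systems but not on Windows.
    return importlib.util.find_spec("resource") is not None


if resource_limits_available():
    import resource

    # Read the current address-space limit as a (soft, hard) pair.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("address-space limit:", soft, hard)
else:
    # On platforms without `resource`, skip any memory-limit logic.
    print("resource module unavailable; skipping memory limit")
```

Checking for the module up front, rather than importing it unconditionally, keeps the same script usable on Windows, where the import would otherwise fail.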

[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 Thank you @BryanCutler.

[GitHub] spark pull request #23046: [SPARK-23207][SQL][FOLLOW-UP] Use `SQLConf.get.en...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23046#discussion_r234063905 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -280,7 +280,7 @@ object ShuffleExchangeExec

[GitHub] spark pull request #23052: [SPARK-26081][SQL] Prevent empty files for empty ...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23052#discussion_r234062564 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -174,13 +174,18 @@ private[csv] class

[GitHub] spark issue #23052: [SPARK-26081][SQL] Prevent empty files for empty partiti...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23052 @MaxGekk, actually this is kind of an important behaviour change. This basically means we're unable to read the empty files back. Similar changes were proposed in Parquet a few years ago (by me

[GitHub] spark issue #23047: [BACKPORT][SPARK-25883][SQL][MINOR] Override method `pre...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23047 Merged to branch-2.4.

[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 Also, @BryanCutler, I think we can talk about locations of `testing/...util.py` later, once we have finished splitting the tests. Moving utils would probably cause fewer conflicts and should be good

[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 Merged to master.

[GitHub] spark issue #23034: [SPARK-26035][PYTHON] Break large streaming/tests.py fil...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 @BryanCutler, should be ready to work on ML and MLlib as well.

[GitHub] spark pull request #23034: [SPARK-26035][PYTHON] Break large streaming/tests...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23034#discussion_r233829511 --- Diff: python/pyspark/testing/streamingutils.py --- @@ -0,0 +1,189 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark issue #23034: [WIP][SPARK-26035][PYTHON] Break large streaming/tests.p...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 Will go and merge this tomorrow if there are no outstanding issues. cc @zsxwing and @tdas.

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 Merged to master. Thanks @felixcheung.

[GitHub] spark issue #23034: [WIP][SPARK-26035][PYTHON] Break large streaming/tests.p...

2018-11-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 retest this please

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 Merged to master.

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 I am merging this for the same reason as #23021. Let me know if there's any concern even after this got merged

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 @BryanCutler, looks like we should add `pyspark.ml.tests` at https://github.com/apache/spark/blob/master/python/run-tests.py#L252-L253 so that we can run unittests first over doc tests (because

[GitHub] spark pull request #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hi...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20788#discussion_r233678942 --- Diff: python/pyspark/sql/tests/test_dataframe.py --- @@ -375,6 +375,19 @@ def test_generic_hints(self): plan = df1.join(df2.hint

[GitHub] spark issue #20788: [SPARK-23647][PYTHON][SQL] Adds more types for hint in p...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20788 Thanks, @DylanGuedes.

[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23014 Merged to master.

[GitHub] spark issue #23039: [SPARK-26066][SQL] Moving truncatedString to sql/catalys...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23039 @MaxGekk, I think the main purpose of this PR is rather to introduce `spark.sql.debug.maxToStringFields` .. let's fix PR description and title

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 Yup, will address the other comments and update the PR accordingly.

[GitHub] spark issue #23034: [WIP][SPARK-26035][PYTHON] Break large streaming/tests.p...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 retest this please

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 Yup will do.

[GitHub] spark issue #23034: [WIP][SPARK-26035][PYTHON] Break large streaming/tests.p...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23034 I haven't tested the kinesis logic yet. I will check it via Jenkins. Line counts: ``` 751 ./test_dstream.py 89 ./test_kinesis.py 158

[GitHub] spark pull request #23034: [WIP][SPARK-26035][PYTHON] Break large streaming/...

2018-11-14 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23034 [WIP][SPARK-26035][PYTHON] Break large streaming/tests.py files into smaller files ## What changes were proposed in this pull request? This PR continues to break down a big large file

[GitHub] spark issue #21914: [SPARK-24967][SQL] Avro: Use internal.Logging instead fo...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21914 Please ask that on the mailing list.

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 Rough line distributions look like this: ``` 237 ./test_serializers.py 739 ./test_rdd.py 499 ./test_readwrite.py 69 ./test_join.py 161

[GitHub] spark issue #23033: [SPARK-26036][PYTHON] Break large tests.py files into sm...

2018-11-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23033 cc'ing @BryanCutler and @squito.

[GitHub] spark pull request #23033: [SPARK-26036][PYTHON] Break large tests.py files ...

2018-11-14 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23033 [SPARK-26036][PYTHON] Break large tests.py files into smaller files ## What changes were proposed in this pull request? This PR continues to break down a big large file into smaller

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 Merged to master.

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 I am merging this in - maybe I am rushing it but please allow me to go ahead since it's going to block other PySpark PRs. In the worst case, I am willing to revert and propose this again

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 Ah .. right makes sense to me. Thanks @shaneknapp. +1

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 adding @holdenk, @ueshin and @icexelloss as well.

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 adding @icexelloss as well.

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 > Did you test on python3 as well? Of course!

[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r233292436 --- Diff: R/pkg/R/SQLContext.R --- @@ -189,19 +238,67 @@ createDataFrame <- function(data, schema = NULL, samplingRatio = 1.0,

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 @shaneknapp, do you roughly know how difficult it is (and do you have some time shortly) to upgrade R from 3.1 to 3.4? I am asking this because I had some difficulties when I tried to manually

[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r233290797 --- Diff: docs/index.md --- @@ -31,7 +31,8 @@ Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy locally on one

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 > Could you add some descriptions to run a single test file or a single test case if exists? D
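
As a general illustration of running one test case in isolation, here is a minimal sketch using only Python's standard `unittest` module. It does not use Spark's own test runner; the class `ExampleTests` and the helper `run_single_case` are hypothetical stand-ins, not names from the PR.

```python
import io
import unittest


class ExampleTests(unittest.TestCase):
    """Hypothetical stand-in for a test class in one of the split files."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        self.assertEqual("spark".upper(), "SPARK")


def run_single_case(test_class, method_name):
    """Run exactly one test method and return the unittest result object."""
    # Instantiating a TestCase with a method name selects just that test.
    suite = unittest.TestSuite([test_class(method_name)])
    # Capture runner output in memory so only our own print is shown.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_single_case(ExampleTests, "test_addition")
print(result.testsRun, result.wasSuccessful())
```

Plain `unittest` also accepts a dotted `module.Class.method` name on the command line via `python -m unittest`, which is the usual way to select a single case once tests live in separate modules.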

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 Yup!

[GitHub] spark issue #22994: [BUILD] refactor dev/lint-python in to something readabl...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22994 I haven't taken a super close look but the idea itself looks okay. Is it urgent? If yes, I don't object to going ahead right away. Otherwise, it might be good to leave it open for a few days

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 I am going to push after testing and double checking. The line counts would look like this ``` 54 ./test_utils.py 199 ./test_catalog.py 503

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 > I'd break the pandas udf one into smaller pieces too, as you suggested. We should also investigate why the runtime didn't improve ... One suspicion from my investigat

[GitHub] spark pull request #23021: [SPARK-26032][PYTHON] Break large sql/tests.py fi...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23021#discussion_r233269827 --- Diff: python/pyspark/testing/sqlutils.py --- @@ -0,0 +1,268 @@ +# --- End diff -- Yea, similar thought. One thing is though testing

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 Yup, will break pandas one into smaller ones as well.

[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22962 Also please fix the test. The test doesn't really look clear. I actually don't quite like the test as written here now

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22962#discussion_r233131375 --- Diff: python/pyspark/taskcontext.py --- @@ -147,8 +147,8 @@ def __init__(self): @classmethod def _getOrCreate(cls

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22962#discussion_r233130494 --- Diff: python/pyspark/taskcontext.py --- @@ -147,8 +147,8 @@ def __init__(self): @classmethod def _getOrCreate(cls

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22962#discussion_r233130221 --- Diff: python/pyspark/taskcontext.py --- @@ -147,8 +147,8 @@ def __init__(self): @classmethod def _getOrCreate(cls

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 Elapsed time looks virtually the same. All tests look to be running fine. The last commit should show skipped tests fine as well. Should be ready for a look

[GitHub] spark pull request #23004: [SPARK-26004][SQL] InMemoryTable support StartsWi...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23004#discussion_r233012392 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -237,6 +237,13 @@ case class

[GitHub] spark pull request #22979: [SPARK-25977][SQL] Parsing decimals from CSV usin...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22979#discussion_r233009506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala --- @@ -104,6 +106,14 @@ class UnivocityParser

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 To all, so how about we start the fix @wangyum tried before? If we generally agree on the direction itself, upgrading Hive to 2.3 (or 3), I would like to encourage him to continue #20659

[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 The test failure itself doesn't look to be caused by this change. The tests will fail anyway with a different error message. If the goal is really just to check if the tests pass or not, you

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22962#discussion_r232991503 --- Diff: python/pyspark/taskcontext.py --- @@ -147,8 +147,8 @@ def __init__(self): @classmethod def _getOrCreate(cls

[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22962 The main code change LGTM too in any event

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22962#discussion_r232990319 --- Diff: python/pyspark/tests.py --- @@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self): """

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 For your information, here's the line counts for each file: ``` 52 ./test_utils.py 197 ./test_catalog.py 43 ./test_group.py 318 ./test_session.py

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 FWIW, I at least double checked if there are any tests missing, and if they are actually being run (via coverage

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 adding @rxin (derived from mailing list)

[GitHub] spark issue #23021: [SPARK-26032][PYTHON] Break large sql/tests.py files int...

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23021 @BryanCutler and @squito, Here is the official first attempt to break `pyspark/sql/tests.py` into multiple small files. If there are no outstanding issues (for instance, if we

[GitHub] spark pull request #23021: [SPARK-26032][PYTHON] Break large sql/tests.py fi...

2018-11-13 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23021 [SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files ## What changes were proposed in this pull request? This is the official first attempt to break huge single

[GitHub] spark issue #23020: [MINOR][BUILD] Remove *.crc from .gitignore

2018-11-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23020 retest this please

[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22954#discussion_r232895848 --- Diff: R/pkg/R/SQLContext.R --- @@ -172,36 +257,72 @@ getDefaultSqlSource <- function() { createDataFrame <- function(data, schema

[GitHub] spark issue #23006: [SPARK-26007][SQL] DataFrameReader.csv() respects to spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23006 Merged to master.

[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232893546 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -101,10 +101,11 @@ private void

[GitHub] spark issue #23014: [MINOR][SQL] Add disable bucketedRead workaround when th...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23014 > The reason is that each bucket file is too big Can you elaborate please? Is it because we don't chunk each file into multiple splits when we read bucketed ta

[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232885260 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java --- @@ -101,7 +101,8 @@ private void

[GitHub] spark issue #23018: [SPARK-26023][SQL] Dumping truncated plans and generated...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23018 Looks fine to me. Adding @cloud-fan and @hvanhovell

[GitHub] spark pull request #23018: [SPARK-26023][SQL] Dumping truncated plans and ge...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23018#discussion_r232883084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +471,21 @@ abstract class TreeNode[BaseType

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 In this way, we could postpone the R upgrade in Jenkins until after the Spark 3.0.0 release, and could still test the deprecated R version 3.1

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 Nice. Thanks! BTW Felix, are you maybe worried that if we happen to upgrade the R version in Jenkins to 3.4 .. we could break lower deprecated R version support in Spark 3.0 I guess

[GitHub] zeppelin issue #3206: [ZEPPELIN-3810] Support Spark 2.4

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/zeppelin/pull/3206 Thank you all!! ---

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 BTW, let's test them end-to-end. For instance, `spark.range(1).rdd.map(lambda blabla).count()`

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 If the perf diff is big, let's not change it but document that we can use `CloudPickleSerializer()` to avoid a breaking change. If the perf diff is rather trivial, let's check if we can

[GitHub] spark issue #23008: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23008 Nope, it should be done manually. It would be great to have it FWIW. I am not yet sure how we're going to measure the performance. I think you can show the performance diff

[GitHub] spark issue #23011: [SPARK-26013][R][BUILD] Upgrade R tools version from 3.4...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23011 Merged to master. Thanks, @srowen.

[GitHub] spark pull request #23012: [SPARK-26014][R] Deprecate R prior to version 3.4...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23012#discussion_r232742917 --- Diff: R/pkg/R/sparkR.R --- @@ -283,6 +283,10 @@ sparkR.session <- function( enableHiveSupport = TRUE, ...) { + if (ut

[GitHub] spark issue #23012: [SPARK-26014][R] Deprecate R prior to version 3.4 in Spa...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23012 Yea will take a look to address. But about documenting unsupported, if we explicitly are going to say it's unsupported and dropped, for instance, we should remove the compatibility change

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 Oops, I rushed when reading. Yea but still sounds related but orthogonal. Let's move it to the mailing list. That should be the best place to discuss further

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22429 @boy-uber, for structured streaming, let's do it out of this PR. I think the actual change of this PR can be small (1.). We can change this API for structured streaming later if needed since
