[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-07 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r239925749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -131,8 +131,20 @@ object ExtractPythonUDFs

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-07 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239922856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -144,24 +282,107 @@ case class

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239587375 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -144,24 +282,107 @@ case class

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239587065 --- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py --- @@ -87,8 +96,34 @@ def ordered_window(self): def unpartitioned_window(self

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239587136 --- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py --- @@ -245,11 +278,101 @@ def test_invalid_args(self): foo_udf

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239587089 --- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py --- @@ -231,12 +266,10 @@ def test_array_type(self): self.assertEquals(result1

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239587020 --- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py --- @@ -44,9 +44,18 @@ def python_plus_one(self): @property def

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-06 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r239565253 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -131,8 +131,20 @@ object ExtractPythonUDFs

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-12-05 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 Hi @BryanCutler @HyukjinKwon @ueshin , mind taking another look? I think this is in good shape. Thanks

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-21 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r235417425 --- Diff: python/pyspark/worker.py --- @@ -154,6 +154,47 @@ def wrapped(*series): return lambda *a: (wrapped(*a), arrow_return_type

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-20 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 @BryanCutler @HyukjinKwon @ueshin I have addressed all the comments so far. Could you please take another look? Thanks

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-20 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r235182927 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -63,7 +65,7 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-19 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r234790479 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -27,17 +27,62 @@ import

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-19 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r234790633 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -73,68 +118,151 @@ case class

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-19 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r234790364 --- Diff: python/pyspark/sql/tests.py --- @@ -89,6 +89,7 @@ from pyspark.sql.types import _merge_type from pyspark.tests import QuietTest

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-19 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r234790403 --- Diff: python/pyspark/sql/tests.py --- @@ -7064,12 +7098,104 @@ def test_invalid_args(self): foo_udf = pandas_udf(lambda x: x, 'v

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232393476 --- Diff: python/pyspark/sql/tests.py --- @@ -6323,6 +6333,33 @@ def ordered_window(self): def unpartitioned_window(self): return

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232393452 --- Diff: python/pyspark/sql/tests.py --- @@ -6323,6 +6333,33 @@ def ordered_window(self): def unpartitioned_window(self): return

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232393335 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -27,17 +27,62 @@ import

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232393305 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -73,68 +118,147 @@ case class

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232393187 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -73,68 +118,147 @@ case class

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232388369 --- Diff: python/pyspark/worker.py --- @@ -154,6 +154,47 @@ def wrapped(*series): return lambda *a: (wrapped(*a), arrow_return_type

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-11-08 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r232084279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -73,68 +118,147 @@ case class

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 No worries. Thank you @HyukjinKwon and @ueshin --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-11-05 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 Hey @gatorsmile it has been quite a while with no review progress on this. @BryanCutler has some initial comments but I want to get more people's feedback before addressing those. Since now 2.4

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r227591746 --- Diff: python/pyspark/sql/tests.py --- @@ -6323,6 +6333,33 @@ def ordered_window(self): def unpartitioned_window(self): return

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r227591518 --- Diff: python/pyspark/sql/tests.py --- @@ -6481,12 +6516,116 @@ def test_invalid_args(self): foo_udf = pandas_udf(lambda x: x, 'v

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r227591428 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -63,7 +65,7 @@ private[spark] object PythonEvalType

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-10-23 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 @felixcheung I am waiting for some in-depth review. @ueshin do you have some time to review this in the near future? Thanks

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-11 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r224548624 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -63,7 +65,7 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r223762966 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala --- @@ -63,7 +65,7 @@ private[spark] object PythonEvalType

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r223761447 --- Diff: python/pyspark/worker.py --- @@ -154,6 +154,47 @@ def wrapped(*series): return lambda *a: (wrapped(*a), arrow_return_type

[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...

2018-10-09 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r223754106 --- Diff: python/pyspark/worker.py --- @@ -154,6 +154,47 @@ def wrapped(*series): return lambda *a: (wrapped(*a), arrow_return_type

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-10-08 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 @BryanCutler Yes that was a typo :) Thanks! I am also +1 to support numpy data structure in addition to Pandas. So happy to discuss here or separately

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-10-08 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 Hey folks, any thoughts on this PR?

[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

2018-10-04 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22620#discussion_r222698014 --- Diff: python/pyspark/sql/udf.py --- @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None): "Invalid retur

[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

2018-10-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22620#discussion_r222456993 --- Diff: python/pyspark/sql/udf.py --- @@ -310,9 +319,11 @@ def register(self, name, f, returnType=None): "Invalid retur

[GitHub] spark issue #22620: [SPARK-25601][PYTHON] Register Grouped aggregate UDF Vec...

2018-10-03 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22620 LGTM

[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

2018-10-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22620#discussion_r222421940 --- Diff: python/pyspark/sql/udf.py --- @@ -298,6 +298,15 @@ def register(self, name, f, returnType=None): >>> spark.sql("

[GitHub] spark pull request #22620: [SPARK-25601][PYTHON] Register Grouped aggregate ...

2018-10-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22620#discussion_r222411585 --- Diff: python/pyspark/sql/udf.py --- @@ -298,6 +298,15 @@ def register(self, name, f, returnType=None): >>> spark.sql("

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-09-27 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 Gentle ping @cloud-fan @gatorsmile @HyukjinKwon @ueshin

[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...

2018-09-20 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 cc @HyukjinKwon @ueshin @BryanCutler @felixcheung This PR is ready for review. I have updated the description so hopefully it is easier to review. Please let me know if you need any

[GitHub] spark pull request #22305: [WIP][SPARK-24561][SQL][Python] User-defined wind...

2018-09-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r218244042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed

[GitHub] spark pull request #22305: [WIP][SPARK-24561][SQL][Python] User-defined wind...

2018-09-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r218243887 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed

[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-09-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 retest this please

[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

2018-09-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22329 LGTM

[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

2018-09-05 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22329#discussion_r215267320 --- Diff: python/pyspark/sql/functions.py --- @@ -2804,6 +2804,22 @@ def pandas_udf(f=None, returnType=None, functionType=None): | 1|1.5

[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

2018-09-04 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22329#discussion_r214940744 --- Diff: python/pyspark/sql/functions.py --- @@ -2804,6 +2804,20 @@ def pandas_udf(f=None, returnType=None, functionType=None): | 1|1.5

[GitHub] spark issue #22305: [WIP][SPARK-24561][SQL][Python] User-defined window aggr...

2018-08-31 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22305 The current state is a minimum working version - I copied some code from `WindowExec` to make this work but will need to refactor those

[GitHub] spark pull request #22305: [WIP][SPARK-24561][SQL][Python] User-defined wind...

2018-08-31 Thread icexelloss
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/22305 [WIP][SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) ## What changes were proposed in this pull request? ### **This is currently

[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...

2018-08-29 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22208 @dongjoon-hyun SGTM. I misunderstood your suggestion about resolver. Keeping it simple was my preference too

[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...

2018-08-28 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22208 @dongjoon-hyun Could you please take another look? I changed it to use the resolver and to try resolving the column with backticks, and added unit tests as well

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-28 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 Thanks all for the review!

[GitHub] spark issue #22244: [WIP][SPARK-24721][SPARK-25213][SQL] extract python UDF ...

2018-08-27 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22244 @cloud-fan Thanks! I will take a look later today and incorporate this with my patch.

[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...

2018-08-24 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212716787 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -216,8 +216,16 @@ class Dataset[T] private[sql]( private[sql] def

[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...

2018-08-24 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212629188 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -216,8 +216,16 @@ class Dataset[T] private[sql]( private[sql] def

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r212460124 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,35 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-23 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 @HyukjinKwon I addressed the comments. Do you mind taking another look?

[GitHub] spark pull request #22208: Improve error message when a column containing do...

2018-08-23 Thread icexelloss
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/22208 Improve error message when a column containing dot cannot be resolved ## What changes were proposed in this pull request? The current error message is often confusing to a new Spark

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r212396812 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,33 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r212347966 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,18 @@ abstract class EvalPythonExec

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r212340459 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,18 @@ abstract class EvalPythonExec

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-23 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r212309541 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,18 @@ abstract class EvalPythonExec

[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...

2018-08-22 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r211964996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -183,34 +178,106 @@ private[sql] object

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-21 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r211733007 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,18 @@ abstract class EvalPythonExec

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210996331 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,33 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210955687 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,33 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-17 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210954447 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,33 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-16 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 Tests pass now. This comment https://github.com/apache/spark/pull/22104/files#r210414941 requires some attention. @cloud-fan Do you think this is the right way to handle GenericInternalRow

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210414941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,18 @@ abstract class EvalPythonExec

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210410738 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,16 @@ abstract class EvalPythonExec

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210391237 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,35 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210390770 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -133,6 +134,9 @@ object ExtractPythonUDFs

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-15 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210390399 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala --- @@ -117,15 +117,16 @@ abstract class EvalPythonExec

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-15 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 Thanks @HyukjinKwon and @cloud-fan ! I will take a look

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 I think another way to fix this is to move the logic to `ExtractPythonUDF` to ignore `FileScanExec`, `DataSourceScanExec` and `DataSourceV2ScanExec` instead of changing all three rules

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 @gatorsmile Can you advise how to create a df with data source? All my attempts end up triggering FileSourceStrategy not DataSourceStrategy

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 @gatorsmile Possibly, let me see if I can create a test case

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-14 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r210052093 --- Diff: python/pyspark/sql/tests.py --- @@ -3367,6 +3367,24 @@ def test_ignore_column_of_all_nulls(self): finally

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 retest please

[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...

2018-08-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/22104 cc @cloud-fan . Followed your suggestion here: https://issues.apache.org/jira/browse/SPARK-24721?focusedCommentId=16560537=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in...

2018-08-14 Thread icexelloss
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/22104 [SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy ## What changes were proposed in this pull request? The PR excludes Python UDFs filters in FileSourceStrategy so

[GitHub] spark issue #21928: [SPARK-24976][PYTHON] Allow None for Decimal type conver...

2018-07-31 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21928 I see. Yeah sounds good to me. On Tue, Jul 31, 2018 at 12:30 PM Hyukjin Kwon wrote: > I think we shouldn't change minimum PyArrow version in 2.4.0 and the > u

[GitHub] spark issue #21887: [SPARK-23633][SQL] Update Pandas UDFs section in sql-pro...

2018-07-31 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21887 Thanks! @HyukjinKwon

[GitHub] spark issue #21928: [SPARK-24976][PYTHON] Allow None for Decimal type conver...

2018-07-31 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21928 @HyukjinKwon arrow 0.10.0 release is around the corner. I think Spark 2.4 will very likely ship with 0.10.0 (where I believe this issue has been fixed, @BryanCutler can you confirm

[GitHub] spark issue #21887: [SPARK-23633][SQL] Update Pandas UDFs section in sql-pro...

2018-07-30 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21887 @HyukjinKwon I manually generated the doc and looks good to me.

[GitHub] spark pull request #21887: [SPARK-23633][SQL] Update Pandas UDFs section in ...

2018-07-28 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21887#discussion_r205943208 --- Diff: examples/src/main/python/sql/arrow.py --- @@ -113,6 +113,42 @@ def substract_mean(pdf): # $example off:grouped_map_pandas_udf

[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-28 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 Thanks @HyukjinKwon @BryanCutler for the review!

[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-27 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 retest please

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-27 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205872386 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,52 @@ object

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-27 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205866645 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,61 @@ object

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-27 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205859891 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,61 @@ object

[GitHub] spark issue #21887: [SPARK-23633][SQL] Update Pandas UDFs section in sql-pro...

2018-07-27 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21887 Thanks @HyukjinKwon! I addressed the comments.

[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-27 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @BryanCutler @HyukjinKwon I updated the PR based on Bryan's suggestion. Please take a look and let me know if you have further comments. Thanks

[GitHub] spark issue #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Python UDF...

2018-07-26 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21650 @HyukjinKwon I think Bryan's implementation looks promising. Please let me take a look.

[GitHub] spark pull request #21887: [SPARK-23633][SQL] Update Pandas UDFs section in ...

2018-07-26 Thread icexelloss
GitHub user icexelloss opened a pull request: https://github.com/apache/spark/pull/21887 [SPARK-23633][SQL] Update Pandas UDFs section in sql-programming-guide ## What changes were proposed in this pull request? Update Pandas UDFs section in sql-programming-guide. Add

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-26 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205448677 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,94 @@ object

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-26 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205445392 --- Diff: python/pyspark/sql/tests.py --- @@ -5060,6 +5049,147 @@ def test_type_annotation(self): df = self.spark.range(1).select(pandas_udf

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205268767 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,94 @@ object

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21650#discussion_r205262719 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -94,36 +95,94 @@ object
