[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16534 Recently we hit some problems while extending python udf, to support `asNondeterministic`, `asNonNullable`, etc. It's really confusing if the return type is just a python function. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16534 Is this still a problem? Now `UserDefinedFunction` defines `returnType` as a property.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 I agree, just in case someone does have an isinstance check (or similar) we should document the change in the release notes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534 Thanks @holdenk. I think it should be mentioned as a change of behavior in the release notes. We don't change the API, and `UserDefinedFunction` is hardly public (it is not even included in the docs); nevertheless, it is a change.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 Merged to master, thanks @zero323
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 Great! Thanks for doing this, will merge to master :)
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534

Don't worry, I get it :) The point is to make user experience better not worse, right? In practice:

- These changes are pretty far from data, so overall impact is negligible and constant.
- For UDF creation, the overhead is around 8 microseconds (this doesn't include any JVM communication).
- With a Py4J call (JUDF and Column creation), everything is bound by JVM communication, which has three orders of magnitude higher latency than our Python code.

Rough tests (build 8f33731e796750e6f60dc9e2fc33a94d29d198b4):

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Python version 3.5.2 (default, Jul 2 2016 17:53:06)
SparkSession available as 'spark'.

In [1]: from pyspark.sql.functions import udf

In [2]: from functools import wraps

In [3]: def wrapped(f):
   ...:     f_ = udf(f)
   ...:     @wraps(f)
   ...:     def wrapped_(*args):
   ...:         return f_(*args)
   ...:     return wrapped_
   ...:

In [4]: %timeit udf(lambda x: x)
The slowest run took 8.96 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 3.45 µs per loop

In [5]: %timeit wrapped(lambda x: x)
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 12.3 µs per loop

In [6]: %timeit udf(lambda x: x)("x")
The slowest run took 13.64 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 11.3 ms per loop

In [7]: %timeit wrapped(lambda x: x)("a")
100 loops, best of 3: 9.9 ms per loop

In [8]: %timeit -n10 spark.range(0, 1).toDF("id").select(udf(lambda x: x)("id")).rdd.foreach(lambda _: None)
10 loops, best of 3: 227 ms per loop

In [9]: %timeit -n10 spark.range(0, 1).toDF("id").select(wrapped(lambda x: x)("id")).rdd.foreach(lambda _: None)
10 loops, best of 3: 206 ms per loop
```
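For readers without a Spark build at hand, the creation-overhead part of this measurement can be approximated in plain Python. This is only a sketch: `FakeUDF` is a hypothetical stand-in class, not PySpark's `UserDefinedFunction`, it never talks to the JVM, and the absolute numbers will differ from the session above.

```python
import timeit
from functools import wraps


class FakeUDF(object):
    """Hypothetical stand-in for a UDF object; it never touches the JVM."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def wrapped(f):
    # Same shape as the wrapper in the session above, minus Spark.
    f_ = FakeUDF(f)

    @wraps(f)
    def wrapped_(*args):
        return f_(*args)

    return wrapped_


n = 20000
# Average cost of creating the object directly vs. creating it plus a wrapper.
plain = timeit.timeit(lambda: FakeUDF(lambda x: x), number=n) / n
with_wrapper = timeit.timeit(lambda: wrapped(lambda x: x), number=n) / n
print("plain: %.2f us, wrapped: %.2f us" % (plain * 1e6, with_wrapper * 1e6))
```

The difference between the two figures is the per-creation cost of the extra closure and the `wraps` bookkeeping, which is the quantity the first two `%timeit` lines compare.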
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 Yes, `pydoc.help` does depend on looking at the docstring on the type rather than the object :( Too bad the IPython magic isn't used in pydoc too. Sorry for all the back and forth, I'm just trying to see if we can improve the documentation without slowing down our already not-super-fast Python UDF performance. How would you feel about doing a small perf test with Python UDFs to make sure this doesn't cause a regression? If there is no regression it looks fine, but if there is, maybe we should explore the dynamic sub-classing option.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534 `update_wrapper` works the same way as `wraps`: it will be useful for IPython, which uses relatively complex inspection rules, but it will be useless anywhere one depends on `pydoc.help`.
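The distinction drawn here can be checked without IPython or Spark. The sketch below assumes a hypothetical callable class `F` with no class docstring; it shows that `update_wrapper` copies the metadata onto the instance only, which is why a tool that reads the docstring from the type sees nothing useful. How a given `pydoc` version renders such an instance is not asserted here.

```python
from functools import update_wrapper


def f(x, *args):
    """This is some function"""
    return x


class F(object):
    # Hypothetical stand-in for a callable wrapper class; no class docstring.
    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


g = update_wrapper(F(f), f)

# The instance now carries the wrapped function's metadata...
instance_doc = g.__doc__
# ...but the class, which is what a type-based lookup consults, does not.
class_doc = type(g).__doc__

print(repr(instance_doc))
print(repr(class_doc))
```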
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534

I'm not sure about `wraps`, but with `update_wrapper`, I tested it in a Jupyter kernel and it seems to give all of the docstring and signature information without adding another function dispatch inside of PySpark UDFs. In IPython:

    from functools import update_wrapper

    def foo(x):
        """Identity"""
        return x

    class F():
        def __init__(self, f):
            self.f = f
        def __call__(self, x):
            # Call the stored function rather than an undefined global `f`.
            return self.f(x)

    a = update_wrapper(F(foo), foo)

results in a help string (from `?a`) of:

> Call signature: a(x)
> Type: instance
> Base Class: __main__.F
> String form: <__main__.F instance at 0x7febb43d6ef0>
> Docstring: Identity

Which seems like everything the current implementation does without adding the indirection. Is this not the behavior you are seeing?
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534

To a very limited extent. It can bring some useful information in IPython / Jupyter (maybe some other tools as well) but won't work with built-in `help` / `pydoc.help`. You can compare:

```python
from functools import wraps

def f(x, *args):
    """This is some function"""
    return x

class F():
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        # Use the stored function instead of relying on the global `f`.
        return self.f(x)

# Callable instance carrying f's metadata
g = wraps(f)(F(f))

# Plain function wrapper carrying f's metadata
@wraps(f)
def h(x):
    return F(f)(x)

# `?name` is IPython syntax; `help` is the built-in.
?g
help(g)
?h
help(h)
```

As far as I am aware it is either this or dynamic inheritance.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 So it feels like we are adding an extra layer of indirection unnecessarily, could you use update_wrapper from functools directly on the udf object?
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 Sure, I'll take another closer look.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16534 Change looks good to me but I didn't look super carefully. @holdenk can you take a look at this?
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72966/ Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72966/testReport)** for PR 16534 at commit [`64bba41`](https://github.com/apache/spark/commit/64bba41fe062dc39ad8708fa4dd825e609254814). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72966/testReport)** for PR 16534 at commit [`64bba41`](https://github.com/apache/spark/commit/64bba41fe062dc39ad8708fa4dd825e609254814).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72949/ Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72949/testReport)** for PR 16534 at commit [`3b3a41b`](https://github.com/apache/spark/commit/3b3a41bd351bc55259d751ecafcef297bb04ccd6). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72951/ Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72951/testReport)** for PR 16534 at commit [`2a0ac46`](https://github.com/apache/spark/commit/2a0ac46c1b36626566968b8fde78b70502ddf5df). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72951/testReport)** for PR 16534 at commit [`2a0ac46`](https://github.com/apache/spark/commit/2a0ac46c1b36626566968b8fde78b70502ddf5df).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72949/testReport)** for PR 16534 at commit [`3b3a41b`](https://github.com/apache/spark/commit/3b3a41bd351bc55259d751ecafcef297bb04ccd6).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72242/testReport)** for PR 16534 at commit [`9168009`](https://github.com/apache/spark/commit/9168009c9df8988bccd88ff82bbd4e1605ba2cbf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #72242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72242/testReport)** for PR 16534 at commit [`9168009`](https://github.com/apache/spark/commit/9168009c9df8988bccd88ff82bbd4e1605ba2cbf).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534

@rxin I am not aware of any straightforward way of separating these two, but I focused on the docstrings anyway. The rationale is simple. I want to be able to:

- Create packages containing UDFs.
- [Get concise syntax with decorators](https://github.com/apache/spark/pull/16533) without need for intermediate functions, or nesting.
- [Import UDFs without side effects](https://github.com/apache/spark/pull/16536).
- Have docstrings and argument annotations which correspond to the function I wrap, not a generic `UserDefinedFunction` object. This is what I want to achieve here.

As illustrated in the JIRA ticket, what we get right now is completely useless:

```
In [5]: ?add_one
Type:        UserDefinedFunction
String form:
File:        ~/Spark/spark-2.0/python/pyspark/sql/functions.py
Signature:   add_one(*cols)
Docstring:
User defined function in Python

.. versionadded:: 1.3
```

```
help(add_one)

Help on UserDefinedFunction in module pyspark.sql.functions object:

class UserDefinedFunction(builtins.object)
 |  User defined function in Python
 |
 |  .. versionadded:: 1.3
 |
 |  Methods defined here:
 |
 |  __call__(self, *cols)
 |      Call self as a function.
 |
 |  __del__(self)
 |
 |  __init__(self, func, returnType, name=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
(END)
```

REPL is definitely the main use case. Handling docs with `wraps` is much trickier, but there are known workarounds.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16534 Is the goal to change the doc or the REPL string? It might be useful to change the REPL string but I'm not sure if it is worth changing the doc.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534

Thanks @holdenk! Let's wait for another opinion (maybe @rxin) and if it is not acceptable I'll just close this and ask for closing the ticket. Theoretically we could define a constructor with dynamic type:

```python
type(name, (UserDefinedFunction, ), {"__doc__": func.__doc__})
```

but this is way too hacky.
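To make the dynamic-type idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical stand-in `UserDefinedFunction` class and a hypothetical `make_udf` helper; neither is PySpark code, and the PR did not adopt this approach.

```python
class UserDefinedFunction(object):
    """Generic stand-in for a UDF wrapper class (not the PySpark class)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def make_udf(func):
    # Build a one-off subclass whose class-level __doc__ is the wrapped
    # function's docstring, so a type-based docstring lookup finds it.
    cls = type(func.__name__, (UserDefinedFunction,), {"__doc__": func.__doc__})
    return cls(func)


def add_one(x):
    """Add one to x."""
    return x + 1


add_one_udf = make_udf(add_one)
print(type(add_one_udf).__name__, "-", type(add_one_udf).__doc__)
```

The result is still an instance of the base class, so `isinstance` checks keep working, at the price of creating one new class per wrapped function.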
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 So I'm not super comfortable changing the return type (what if user code has `isinstance` checks with `UserDefinedFunction`?). That being said, if @davies or one of the other committers thinks this is an OK change as is, I'm fine with that.
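The compatibility concern can be illustrated with a small sketch. The `UserDefinedFunction` class and the two factory functions below are hypothetical stand-ins written for this example, not the PySpark implementation: one factory returns the object itself, the other returns a plain function wrapped with `wraps`.

```python
from functools import wraps


class UserDefinedFunction(object):
    """Hypothetical stand-in for the UDF class."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def udf_as_object(f):
    # Old-style behaviour: hand back the UDF object itself.
    return UserDefinedFunction(f)


def udf_as_function(f):
    # New-style behaviour: hand back a plain function that delegates.
    obj = UserDefinedFunction(f)

    @wraps(f)
    def wrapper(*args):
        return obj(*args)

    return wrapper


def identity(x):
    """Return x unchanged."""
    return x


old_style = udf_as_object(identity)
new_style = udf_as_function(identity)

print(isinstance(old_style, UserDefinedFunction))
print(isinstance(new_style, UserDefinedFunction))
```

Both results are callable and return the same values, but only the first passes an `isinstance` check against the class, which is the behaviour change the release notes would need to mention.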
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71723/ Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71723/testReport)** for PR 16534 at commit [`65411a1`](https://github.com/apache/spark/commit/65411a1d1e8f6e396197a0748c306c3f83f53f76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534 @holdenk I used function arguments to make sure that public API, though not types, is preserved. Please let me know what you think.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71723/testReport)** for PR 16534 at commit [`65411a1`](https://github.com/apache/spark/commit/65411a1d1e8f6e396197a0748c306c3f83f53f76).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71685/ Test PASSed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71685/testReport)** for PR 16534 at commit [`8dd9071`](https://github.com/apache/spark/commit/8dd9071c2f847af5a0a29ddf0b0ad4a3e48c9b3a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71685/testReport)** for PR 16534 at commit [`8dd9071`](https://github.com/apache/spark/commit/8dd9071c2f847af5a0a29ddf0b0ad4a3e48c9b3a).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Merged build finished. Test FAILed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71680/testReport)** for PR 16534 at commit [`3bac064`](https://github.com/apache/spark/commit/3bac064ef2031039813da5e13040675c0777436d). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16534 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71680/ Test FAILed.
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16534 **[Test build #71680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71680/testReport)** for PR 16534 at commit [`3bac064`](https://github.com/apache/spark/commit/3bac064ef2031039813da5e13040675c0777436d).
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534 @holdenk Indeed. Not the most fortunate moment for making a bunch of connected PRs :)
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/16534 @holdenk I don't think it should go into a point release at all (same as https://github.com/apache/spark/pull/16533 which, depending on the resolution, may introduce new functionality or breaking API changes). https://github.com/apache/spark/pull/16538 went to 2.2, so I think it is a reasonable target for all subtasks in [SPARK-19159](https://issues.apache.org/jira/browse/SPARK-19159). That being said, public vs. private is a bit fuzzy here. The `udf` docstring states that it "Creates a `Column` expression representing a user defined function (UDF)" and doesn't document the return type otherwise. This is obviously not true. It is also worth noting that we can use a function wrapper without any changes to the API. It is not the most common practice, but we can add the required attributes to the function to keep full backwards compatibility for the time being. One way or another, it would be nice to make it consistent with [SPARK-18777](https://issues.apache.org/jira/browse/SPARK-18777) though. If we go with a function wrapper here, it would make sense to use one there as well.
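The function-wrapper idea mentioned above can be illustrated with a minimal sketch in plain Python. It does not use PySpark: `FakeUDF`, `wrap_udf`, and `add_one` are hypothetical names invented for this illustration, and the real `UserDefinedFunction` builds a `Column` expression rather than calling the function directly. The sketch only shows how `functools.wraps` carries the docstring over and how extra attributes can be attached to the returned function.

```python
import functools


class FakeUDF:
    """Hypothetical stand-in for a UDF object; NOT the real PySpark class."""

    def __init__(self, func, return_type="string"):
        self.func = func
        self.returnType = return_type

    def __call__(self, *cols):
        # The real object would build a Column expression; this stand-in
        # simply calls the wrapped Python function.
        return self.func(*cols)


def wrap_udf(udf_obj):
    """Return a plain function that delegates to udf_obj.

    functools.wraps copies __name__ and __doc__ from the user's function,
    so help() on the result shows the original docstring.
    """

    @functools.wraps(udf_obj.func)
    def wrapper(*cols):
        return udf_obj(*cols)

    # Attach the attributes that existing callers may already rely on.
    wrapper.func = udf_obj.func
    wrapper.returnType = udf_obj.returnType
    return wrapper


def add_one(x):
    """Add one to x."""
    return x + 1


wrapped = wrap_udf(FakeUDF(add_one, return_type="integer"))
```

With this approach `wrapped.__doc__` and `wrapped.__name__` come from `add_one`, while `wrapped.func` and `wrapped.returnType` remain available. The trade-off raised in the thread still applies: the result is a plain function, so an `isinstance` check against the UDF class would no longer match.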
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 It's a bit hard to follow up with those during the JIRA maintenance window - I'll follow up after JIRA comes back online :)
[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16534 Improving UDF docstrings for Python seems like a good idea, but given the cost of breaking the public API in a point release, I think it might make sense for us to take the more-work approach unless there is a really strong argument for why this part of the API isn't really public. But those are just my thoughts; maybe @davies has a different opinion?