[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2018-01-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16534
  
Recently we hit some problems while extending the Python UDF to support 
`asNondeterministic`, `asNonNullable`, etc. It's really confusing if the return 
type is just a Python function.
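
A minimal sketch of that confusion, using a hypothetical stand-in class `FakeUDF` and made-up method names rather than PySpark's real `UserDefinedFunction`: once the public return value is a plain function, each extension method has to be re-attached by hand as a function attribute.

```python
from functools import wraps


class FakeUDF:
    """Hypothetical stand-in for a UDF class; not PySpark's implementation."""

    def __init__(self, func, deterministic=True):
        self.func = func
        self.deterministic = deterministic

    def __call__(self, *args):
        return self.func(*args)

    def as_nondeterministic(self):
        # Return a new object instead of mutating this one.
        return FakeUDF(self.func, deterministic=False)

    def wrapped(self):
        # Expose the object as a plain function that keeps func's docstring.
        @wraps(self.func)
        def wrapper(*args):
            return self(*args)

        # Extension points are not inherited; they are copied over manually.
        wrapper.deterministic = self.deterministic
        wrapper.as_nondeterministic = lambda: self.as_nondeterministic().wrapped()
        return wrapper


def add_one(x):
    """Add one to x."""
    return x + 1


plain = FakeUDF(add_one).wrapped()
nondet = plain.as_nondeterministic()
```

Every new method added to the class needs a matching line in `wrapped`, which is the kind of bookkeeping that makes a function return type awkward to extend.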


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2018-01-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16534
  
Is this still a problem? Now `UserDefinedFunction` defines `returnType` as 
a property.


---




[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
I agree. Just in case someone does have an `isinstance` check (or similar), we 
should document the change in the release notes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
Thanks @holdenk. I think it should be mentioned as a change of behavior in 
the release notes. We don't change the API, and `UserDefinedFunction` is hardly 
public (it is not even included in the docs); nevertheless, it is a change.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged to master, thanks @zero323 


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
Great! Thanks for doing this, will merge to master :)


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-24 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
Don't worry, I get it :) The point is to make the user experience better, not 
worse, right? In practice:

- These changes are pretty far from the data, so the overall impact is negligible 
and constant.
- For UDF creation the overhead is around 8 microseconds (this doesn't include 
any JVM communication).
- With a Py4J call (JUDF and Column creation) everything is bound by JVM 
communication, which has three orders of magnitude higher latency than our 
Python code.

Rough tests (build 8f33731e796750e6f60dc9e2fc33a94d29d198b4):

```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Python version 3.5.2 (default, Jul  2 2016 17:53:06)
SparkSession available as 'spark'.

In [1]: from pyspark.sql.functions import udf

In [2]: from functools import wraps

In [3]: def wrapped(f):
   ...:     f_ = udf(f)
   ...:     @wraps(f)
   ...:     def wrapped_(*args):
   ...:         return f_(*args)
   ...:     return wrapped_
   ...: 

In [4]: %timeit udf(lambda x: x)
The slowest run took 8.96 times longer than the fastest. This could mean 
that an intermediate result is being cached.
10 loops, best of 3: 3.45 µs per loop

In [5]: %timeit wrapped(lambda x: x)
The slowest run took 6.67 times longer than the fastest. This could mean 
that an intermediate result is being cached.
10 loops, best of 3: 12.3 µs per loop

In [6]: %timeit udf(lambda x: x)("x")
The slowest run took 13.64 times longer than the fastest. This could mean 
that an intermediate result is being cached.
100 loops, best of 3: 11.3 ms per loop

In [7]: %timeit wrapped(lambda x: x)("a")
100 loops, best of 3: 9.9 ms per loop

In [8]: %timeit -n10 spark.range(0, 1).toDF("id").select(udf(lambda x: x)("id")).rdd.foreach(lambda _: None)
10 loops, best of 3: 227 ms per loop

In [9]: %timeit -n10 spark.range(0, 1).toDF("id").select(wrapped(lambda x: x)("id")).rdd.foreach(lambda _: None)
10 loops, best of 3: 206 ms per loop
```
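
The creation-overhead part of the measurement above can be approximated without a Spark session. This is only a sketch: `Direct` is a hypothetical stand-in for the UDF object, there is no JVM communication, and the absolute numbers will differ from the transcript.

```python
import timeit
from functools import wraps


class Direct:
    """Hypothetical callable object, standing in for a UDF instance."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def wrapped(func):
    # Same shape as the `wrapped` helper in the transcript above,
    # but built on the stand-in class rather than pyspark's udf.
    obj = Direct(func)

    @wraps(func)
    def wrapped_(*args):
        return obj(*args)

    return wrapped_


def identity(x):
    return x


n = 20_000
# Average time to build one object / one wrapper, in seconds.
create_direct = timeit.timeit(lambda: Direct(identity), number=n) / n
create_wrapped = timeit.timeit(lambda: wrapped(identity), number=n) / n

print(f"direct creation:  {create_direct * 1e6:.2f} µs")
print(f"wrapped creation: {create_wrapped * 1e6:.2f} µs")
```

The difference between the two figures is the cost of building the extra closure and copying metadata with `wraps`, paid once per UDF definition rather than once per row.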



---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-23 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
Yes, `pydoc.help` does depend on looking at the docstring on the type rather 
than the object :( Too bad the IPython magic isn't used in pydoc too.

Sorry for all the back and forth. I'm just trying to see if we can improve 
the documentation without slowing down our already not-super-fast Python UDF 
performance. How would you feel about doing a small perf test with Python UDFs 
to make sure this doesn't cause a regression?

If there is no regression it looks fine, but if there is, maybe we should 
explore the dynamic sub-classing option.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-23 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
`update_wrapper` works the same way as `wraps`: it is useful for 
IPython, which uses relatively complex inspection rules, but useless 
anywhere one depends on `pydoc.help`.
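
A minimal sketch of where the copied metadata ends up, assuming a small illustrative class `F` (not the PySpark class): `update_wrapper` sets attributes on the instance, while the class keeps its own docstring, which is what a tool that inspects the type will read. How `pydoc` renders such an instance may vary between Python versions.

```python
from functools import update_wrapper


def foo(x):
    """Identity"""
    return x


class F:
    """Generic wrapper class."""

    def __init__(self, f):
        self.f = f

    def __call__(self, x):
        return self.f(x)


a = update_wrapper(F(foo), foo)

# Metadata copied from foo lives on the instance...
print(a.__doc__)
print(a.__name__)
# ...while the class-level docstring is untouched.
print(type(a).__doc__)
```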


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-23 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
I'm not sure about `wraps` but with `update_wrapper`, I tested it in a 
Jupyter kernel and it seems to give all of the docstring and signature 
information without adding another function dispatch inside of PySpark UDFs.

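In IPython:
`
from functools import update_wrapper

def foo(x):
    """Identity"""
    return x

class F():
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        # Call the stored function (a bare f is undefined in this scope).
        return self.f(x)

a = update_wrapper(F(foo), foo)
`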

results in a help string (from `?a`) of:

> Call signature: a(x)
> Type:           instance
> Base Class:     __main__.F
> String form:    <__main__.F instance at 0x7febb43d6ef0>
> Docstring:      Identity


Which seems like everything the current implementation does without adding 
the indirection. Is this not the behavior you are seeing?


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-22 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
To a very limited extent. It can bring some useful information in IPython / 
Jupyter (maybe some other tools as well) but won't work with the built-in `help` 
/ `pydoc.help`.

You can compare:

```python
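from functools import wraps

def f(x, *args):
    """This is
    some function"""
    return x

class F():
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        # Use the stored function rather than the global `f`.
        return self.f(x)

# Callable instance with f's metadata copied onto it.
g = wraps(f)(F(f))

# Plain function wrapper with f's metadata.
@wraps(f)
def h(x):
    return F(f)(x)

# In IPython / Jupyter, compare the output of each pair:
?g
help(g)

?h
help(h)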
```

As far as I am aware, it is either this or dynamic inheritance.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
So it feels like we are adding an extra layer of indirection unnecessarily. 
Could you use `update_wrapper` from `functools` directly on the UDF object?


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-16 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
Sure, I'll take another closer look.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-16 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16534
  
Change looks good to me but I didn't look super carefully.

@holdenk can you take a look at this?



---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72966/
Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72966 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72966/testReport)**
 for PR 16534 at commit 
[`64bba41`](https://github.com/apache/spark/commit/64bba41fe062dc39ad8708fa4dd825e609254814).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72966 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72966/testReport)**
 for PR 16534 at commit 
[`64bba41`](https://github.com/apache/spark/commit/64bba41fe062dc39ad8708fa4dd825e609254814).


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Build finished. Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72949/
Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72949/testReport)**
 for PR 16534 at commit 
[`3b3a41b`](https://github.com/apache/spark/commit/3b3a41bd351bc55259d751ecafcef297bb04ccd6).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72951/
Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72951 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72951/testReport)**
 for PR 16534 at commit 
[`2a0ac46`](https://github.com/apache/spark/commit/2a0ac46c1b36626566968b8fde78b70502ddf5df).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72951/testReport)**
 for PR 16534 at commit 
[`2a0ac46`](https://github.com/apache/spark/commit/2a0ac46c1b36626566968b8fde78b70502ddf5df).


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72949/testReport)**
 for PR 16534 at commit 
[`3b3a41b`](https://github.com/apache/spark/commit/3b3a41bd351bc55259d751ecafcef297bb04ccd6).


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72242 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72242/testReport)**
 for PR 16534 at commit 
[`9168009`](https://github.com/apache/spark/commit/9168009c9df8988bccd88ff82bbd4e1605ba2cbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #72242 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72242/testReport)**
 for PR 16534 at commit 
[`9168009`](https://github.com/apache/spark/commit/9168009c9df8988bccd88ff82bbd4e1605ba2cbf).


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
@rxin I am not aware of any straightforward way of separating these two, 
but I focused on the docstrings anyway. The rationale is simple. I want to be 
able to:

- Create packages containing UDFs.
- [Get concise syntax with 
decorators](https://github.com/apache/spark/pull/16533) without the need for 
intermediate functions or nesting.
- [Import UDFs without side 
effects](https://github.com/apache/spark/pull/16536).
- Have docstrings and argument annotations which correspond to the function 
I wrap, not a generic `UserDefinedFunction` object. This is what I want to 
achieve here. As illustrated in the JIRA ticket, what we get right now is 
completely useless:

  ```
  In [5]: ?add_one
  Type:UserDefinedFunction
  String form: 
  File:~/Spark/spark-2.0/python/pyspark/sql/functions.py
  Signature:   add_one(*cols)
  Docstring:
  User defined function in Python


  .. versionadded:: 1.3
  ```

  ```
   help(add_one)
  
  Help on UserDefinedFunction in module pyspark.sql.functions object:
  
  class UserDefinedFunction(builtins.object)
   |  User defined function in Python
   |  
   |  .. versionadded:: 1.3
   |  
   |  Methods defined here:
   |  
   |  __call__(self, *cols)
   |  Call self as a function.
   |  
   |  __del__(self)
   |  
   |  __init__(self, func, returnType, name=None)
   |  Initialize self.  See help(type(self)) for accurate signature.
   |  
   |  --
   |  Data descriptors defined here:
   |  
   |  __dict__
   |  dictionary for instance variables (if defined)
   |  
   |  __weakref__
   |  list of weak references to the object (if defined)
  (END)
   ```

  The REPL is definitely the main use case. Handling docs with `wraps` is much 
trickier, but there are known workarounds.




---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16534
  
Is the goal to change the doc or the REPL string? It might be useful to 
change the REPL string, but I'm not sure if it is worth changing the doc.



---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
Thanks @holdenk! Let's wait for another opinion (maybe @rxin), and if it is 
not acceptable I'll just close this and ask for the ticket to be closed. 
Theoretically we could define a constructor with a dynamic type:

```python
type(name, (UserDefinedFunction, ), {"__doc__":  func.__doc__})
```

but this is way too hacky.
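
For illustration only, a sketch of what the dynamic-type idea could look like, with a hypothetical `FakeUDF` class and `make_udf` helper standing in for the real `UserDefinedFunction` and its constructor:

```python
class FakeUDF:
    """Generic user defined function (hypothetical stand-in class)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def make_udf(func):
    # Build a one-off subclass whose class-level docstring is func's,
    # so tools that read the type's docstring see the wrapped function's.
    cls = type(func.__name__, (FakeUDF,), {"__doc__": func.__doc__})
    return cls(func)


def add_one(x):
    """Add one to x."""
    return x + 1


u = make_udf(add_one)
print(type(u).__name__, "->", type(u).__doc__)
print(isinstance(u, FakeUDF))
```

The result is still an instance of the base class, so `isinstance` checks keep working, at the price of creating a new class for every UDF.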


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
So I'm not super comfortable changing the return type (what if user 
code has `isinstance` checks with `UserDefinedFunction`?). That being said, if 
@davies or one of the other committers thinks this is an OK change as is, I'm 
fine with that.
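
A small sketch of that concern, using a hypothetical `FakeUDF` class and two made-up factory functions in place of the real `udf` before and after the change:

```python
from functools import wraps


class FakeUDF:
    """Hypothetical stand-in for `UserDefinedFunction`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self.func(*args)


def udf_as_object(func):
    # Old behavior in this sketch: hand back the object itself.
    return FakeUDF(func)


def udf_as_function(func):
    # New behavior in this sketch: hand back a plain function wrapper.
    obj = FakeUDF(func)

    @wraps(func)
    def wrapper(*args):
        return obj(*args)

    return wrapper


def double(x):
    """Double x."""
    return 2 * x


before = udf_as_object(double)
after = udf_as_function(double)

print(isinstance(before, FakeUDF))
print(isinstance(after, FakeUDF))
```

Both values behave the same when called, but only the first passes the `isinstance` check, so user code that branches on the type would silently take a different path.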


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71723/
Test PASSed.


---



[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71723 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71723/testReport)**
 for PR 16534 at commit 
[`65411a1`](https://github.com/apache/spark/commit/65411a1d1e8f6e396197a0748c306c3f83f53f76).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-20 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
@holdenk I used function arguments to make sure that the public API, though 
not the types, is preserved. Please let me know what you think.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71723/testReport)**
 for PR 16534 at commit 
[`65411a1`](https://github.com/apache/spark/commit/65411a1d1e8f6e396197a0748c306c3f83f53f76).





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71685/





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71685/testReport)**
 for PR 16534 at commit 
[`8dd9071`](https://github.com/apache/spark/commit/8dd9071c2f847af5a0a29ddf0b0ad4a3e48c9b3a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71685/testReport)**
 for PR 16534 at commit 
[`8dd9071`](https://github.com/apache/spark/commit/8dd9071c2f847af5a0a29ddf0b0ad4a3e48c9b3a).





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71680/testReport)**
 for PR 16534 at commit 
[`3bac064`](https://github.com/apache/spark/commit/3bac064ef2031039813da5e13040675c0777436d).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16534
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71680/





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16534
  
**[Test build #71680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71680/testReport)**
 for PR 16534 at commit 
[`3bac064`](https://github.com/apache/spark/commit/3bac064ef2031039813da5e13040675c0777436d).





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-12 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
@holdenk Indeed. Not the most fortunate moment for making a bunch of 
connected PRs :)





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-12 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/16534
  
@holdenk I don't think it should go to the point release at all (same as 
https://github.com/apache/spark/pull/16533 which, depending on the resolution, 
may introduce new functionality or breaking API changes). 
https://github.com/apache/spark/pull/16538 went to 2.2 so I think it is a 
reasonable target for all subtasks in 
[SPARK-19159](https://issues.apache.org/jira/browse/SPARK-19159).

That being said, public vs. private is a bit fuzzy here. The `udf` docstring 
states that it:

> Creates a `Column` expression representing a user defined function (UDF)

and doesn't otherwise document the return type. This is obviously not true: 
`udf` returns a `UserDefinedFunction`, and only calling that object yields a 
`Column`.

It is also worth noting that we can use a function wrapper without any 
changes to the API. It is not the most common practice but we can add required 
attributes to the function to keep full backwards compatibility for the time 
being.
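
A minimal sketch of the function-wrapper idea, not the PR's actual code: a stand-in class `FakeUDF` (hypothetical, used in place of `UserDefinedFunction` so the example runs without Spark) is wrapped in a plain function via `functools.wraps`, and the attributes `func` and `returnType` are copied onto the wrapper so that existing attribute access keeps working. The helper name `wrap_udf` is likewise an assumption.

```python
import functools


class FakeUDF(object):
    """Hypothetical stand-in for UserDefinedFunction; not Spark's class."""

    def __init__(self, func, returnType="string"):
        self.func = func
        self.returnType = returnType

    def __call__(self, *cols):
        # The real class would build a Column; here we just apply the function.
        return self.func(*cols)


def wrap_udf(udf_obj):
    """Return a plain function that carries the wrapped function's docstring."""

    @functools.wraps(udf_obj.func)
    def wrapper(*cols):
        return udf_obj(*cols)

    # Attach the attributes that callers may already rely on.
    wrapper.func = udf_obj.func
    wrapper.returnType = udf_obj.returnType
    return wrapper


def add_one(x):
    """Add one to the input."""
    return x + 1


wrapped = wrap_udf(FakeUDF(add_one, "integer"))
```

With this shape, `help(wrapped)` shows the docstring of `add_one`, while `wrapped.func` and `wrapped.returnType` remain available. The trade-off is that `wrapped` is no longer an instance of the wrapper class.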

One way or another it would be nice to make it consistent with 
[SPARK-18777](https://issues.apache.org/jira/browse/SPARK-18777) though. If we 
go with a function wrapper here, it would make sense to use one there as well.





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
It's a bit hard to follow up with those during the JIRA maintenance window - 
I'll follow up after JIRA comes back online :)





[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-12 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/16534
  
Improving UDF docstrings for Python seems like a good idea, but if the cost 
is breaking the public API in a point release, I think it might make sense for 
us to take the more-work approach unless there is a really strong argument for 
why this part of the API isn't really public. But that's just my thoughts; 
maybe @davies has a different opinion?

