Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
I switched the examples to numeric ones so that we could avoid using
`ignore_unicode_prefix`, as requested by @HyukjinKwon in [this
commit](https://github.com/apache/spark/pull/17865/commits
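For readers outside the thread: `ignore_unicode_prefix` is a pyspark helper that scrubs `u''` prefixes from docstrings so doctest output written against Python 2 reprs also passes under Python 3. A minimal sketch of the idea; the regex and function names below are simplified stand-ins, not pyspark's exact implementation:

```python
import re

def ignore_unicode_prefix(f):
    # Strip u'' / u"" prefixes from string literals in the docstring so
    # doctests written with Python 2 reprs also pass under Python 3
    # (simplified sketch of the helper discussed in the thread).
    if f.__doc__ is not None:
        f.__doc__ = re.sub(r"(\W|^)[uU](['\"])", r"\1\2", f.__doc__)
    return f

@ignore_unicode_prefix
def upper_name():
    """
    >>> upper_name()
    u'ALICE'
    """
    return "ALICE"
```

After decoration, the expected doctest output reads `'ALICE'`, which matches the Python 3 repr.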
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
Do I just have to `git rebase` on master?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
I think I addressed everything.
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r123912061
--- Diff: python/pyspark/sql/functions.py ---
@@ -969,8 +1005,8 @@ def months_between(date1, date2):
"""
Returns the n
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
@HyukjinKwon I think I addressed all of your comments. Thank you for your
detailed review!
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
@HyukjinKwon I could not get it to pass the tests without the unicode prefixes and
`ignore_unicode_prefix`
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
Ok, I removed the `ignore_unicode_prefix`, and there were no test
failures. Also removed the date column renames where the original name was ok.
Finally, removed the string function
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r121860425
--- Diff: python/pyspark/sql/functions.py ---
@@ -1254,23 +1294,41 @@ def hash(*cols):
# -- String/Binary functions
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r121860217
--- Diff: python/pyspark/sql/functions.py ---
@@ -962,9 +993,9 @@ def add_months(start, months):
"""
Returns the date
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r121857727
--- Diff: python/pyspark/sql/functions.py ---
@@ -92,14 +98,16 @@ def _():
_functions_1_4 = {
# unary math functions
'acos': 'Computes
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r121857687
--- Diff: python/pyspark/sql/functions.py ---
@@ -189,15 +210,15 @@ def _():
}
for _name, _doc in _functions.items():
-globals
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
@HyukjinKwon @gatorsmile bump
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
Could I get a review for this? I think the only remaining question is
whether (and how) to note the units for the trigonometry functions, like here:
https://github.com/map222/spark/blob
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r116550583
--- Diff: python/pyspark/sql/functions.py ---
@@ -153,7 +173,7 @@ def _():
# math functions that take two arguments as input
_binary_mathfunctions
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
The most recent commit reverts the "if" to "iff", changes all the backticks for
column names to single backticks, and tries the new `:param:` option for the angle
columns.
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r115889357
--- Diff: python/pyspark/sql/functions.py ---
@@ -153,7 +173,7 @@ def _():
# math functions that take two arguments as input
_binary_mathfunctions
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
@gatorsmile I checked four functions, `approx_count_distinct`, `coalesce`,
`covar_samp`, and `countDistinct`, comparing the Python and Scala
documentation. None of them are the same. My guess
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r115404673
--- Diff: python/pyspark/sql/functions.py ---
@@ -153,7 +173,7 @@ def _():
# math functions that take two arguments as input
_binary_mathfunctions
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r115385649
--- Diff: python/pyspark/sql/functions.py ---
@@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
@since(1.5)
def to_utc_timestamp
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17865#discussion_r115385050
--- Diff: python/pyspark/sql/functions.py ---
@@ -793,8 +824,8 @@ def date_format(date, format):
.. note:: Use when ever possible specialized
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17865
@HyukjinKwon I ended up not making examples for the aggregate functions, as
I couldn't come up with a good dataframe to demonstrate them. I could add more examples
for the string functions if you think
GitHub user map222 opened a pull request:
https://github.com/apache/spark/pull/17865
[SPARK-20456][Docs] Add examples for functions collection for pyspark
## What changes were proposed in this pull request?
This adds documentation to many functions
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17737#discussion_r113110805
--- Diff: python/pyspark/sql/column.py ---
@@ -527,7 +583,7 @@ def _test():
.appName("sql.column tests")\
.g
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17737#discussion_r113001853
--- Diff: python/pyspark/sql/column.py ---
@@ -288,8 +324,16 @@ def __iter__(self):
>>> df.filter(df.name.endswith('ice$')
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17737#discussion_r113001678
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -1008,7 +1009,7 @@ class Column(val expr: Expression) extends Logging {
def
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
Ok, I added `ignore_unicode_prefix` to the 6 functions, and it passed local
tests. I think it is ready for Jenkins again.
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
@zero323 Aaaah, I had even identified what I needed to do! So I just need
to decorate `_unary_op` and `_bin_op`, yes?
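Sketch of what that entails: because `Column` operators are generated by the `_unary_op`/`_bin_op` factories, the docstring-cleaning decorator has to wrap the function each factory returns, not the factory itself. The bodies below are simplified stand-ins for illustration, not pyspark's actual code:

```python
def ignore_unicode_prefix(f):
    # simplified stand-in: strip u'' prefixes from the docstring
    if f.__doc__:
        f.__doc__ = f.__doc__.replace("u'", "'")
    return f

def _bin_op(name, doc="binary operator"):
    # factory for Column operators; the real JVM call is faked here
    def _(self, other):
        return (name, self, other)
    _.__doc__ = doc
    return _

class Column(object):
    # decorate the *returned* function, so the cleaned doc travels with it
    contains = ignore_unicode_prefix(_bin_op(
        "contains",
        "Contains the other element.\n\n"
        ">>> df.filter(df.name.contains('o')).collect()\n"
        "[Row(name=u'Bob')]"))
```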
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
The documentation unit test is now failing due to differences in Python 2
and Python 3 strings. This is one of the error messages using the Python 3.4 kernel:
```
File
"/home/jenkins/work
```
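The underlying issue: the repr of a string carries a `u` prefix on Python 2 but not on Python 3, so a doctest's hard-coded expected output can only match one of the two interpreters. A small self-contained illustration, written for Python 3:

```python
# On Python 2:  repr(u"Alice") == "u'Alice'"
# On Python 3:  repr("Alice")  == "'Alice'"
# A doctest that hard-codes either form fails on the other interpreter,
# which is why the thread keeps circling back to ignore_unicode_prefix.
import sys

assert sys.version_info[0] >= 3  # this snippet assumes Python 3
assert repr("Alice") == "'Alice'"
assert repr(4) == "4"  # numeric reprs are identical on 2 and 3
```

This is also why switching the doc examples to numeric output sidesteps the problem entirely.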
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
@holdenk @srowen Could I get a Jenkins test for this?
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
I'd be happy to add other documentation, but I would also like to get my
first commit in! If you make a separate ticket to track that, it would help.
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
I can't get the test failure to replicate on my local machine. Running
`./python/run-tests.py --modules=pyspark-sql` locally doesn't give any errors.
The previous failure was due to not having
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
@HyukjinKwon I have updated the documentation to pass the tests (I didn't
realize the tests actually executed the documented code!). I was able to build
Spark locally, and run `./python/run-tests.py
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
@HyukjinKwon The Jenkins test failed. I'm having trouble running the tests
locally (I can't build Spark yet), and I can't decipher the Jenkins error
messages. Does something jump out to you
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
@HyukjinKwon Do I need to do something to start the Jenkins test?
Github user map222 commented on the issue:
https://github.com/apache/spark/pull/17469
I think the latest commit addresses the formatting issues from above:
removed spaces inside `(..)`, removed the `\n` newlines, and made the
blockquotes more consistent with the rest of the code
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109989329
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
raise TypeError("Column is not ite
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109479266
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
raise TypeError("Column is not ite
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109477286
--- Diff: python/pyspark/sql/column.py ---
@@ -303,8 +333,25 @@ def isin(self, *cols):
desc = _unary_op("desc", "Returns a sort e
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109225499
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
raise TypeError("Column is not ite
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109225302
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
raise TypeError("Column is not ite
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109217816
--- Diff: python/pyspark/sql/column.py ---
@@ -250,11 +250,39 @@ def __iter__(self):
raise TypeError("Column is not ite
Github user map222 commented on a diff in the pull request:
https://github.com/apache/spark/pull/17469#discussion_r109036171
--- Diff: python/pyspark/sql/column.py ---
@@ -124,6 +124,35 @@ def _(self, other):
return _
+like_doc = """ Return a
GitHub user map222 opened a pull request:
https://github.com/apache/spark/pull/17469
[SPARK-20132][Docs]
## What changes were proposed in this pull request?
Add docstrings to column.py for the Column functions `rlike`, `like`,
`startswith`, and `endswith`. Pass these docstrings
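For readers skimming the archive: these four `Column` methods filter rows by pattern. Running them requires a SparkSession, so here is a pure-Python stand-in sketching their semantics: `like` uses SQL LIKE wildcards (`%`, `_`), `rlike` matches a regex anywhere in the value, and `startswith`/`endswith` match literal affixes. The `sql_like` helper below is illustrative, not pyspark code:

```python
import re

def sql_like(value, pattern):
    # Translate SQL LIKE wildcards to a regex (sketch only; no
    # escape-character handling, unlike full LIKE semantics).
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value) is not None

names = ["Alice", "Bob"]
assert [n for n in names if sql_like(n, "Al%")] == ["Alice"]   # df.name.like('Al%')
assert [n for n in names if re.search("b$", n)] == ["Bob"]     # df.name.rlike('b$')
assert [n for n in names if n.startswith("A")] == ["Alice"]    # df.name.startswith('A')
assert [n for n in names if n.endswith("ice")] == ["Alice"]    # df.name.endswith('ice')
```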