Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
@ueshin @felixcheung @viirya hey, could you guys give this one a quick
look? I can work more on it, just in case
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r235183021
--- Diff: python/pyspark/sql/tests/test_dataframe.py ---
@@ -375,6 +375,19 @@ def test_generic_hints(self):
plan = df1.join(df2.hint
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
> **[Test build #98824 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98824/testReport)**
for PR 20788 at commit
[`de3257f`](https://github.com/apache/sp
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
@HyukjinKwon I added a test but I checked only the logical plan - is it ok?
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
@HyukjinKwon Sure! I'll be working on this, then. Do you have any specific
scenarios in mind, so that I could start with them
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r202688999
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -128,6 +128,172 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r202683445
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -128,6 +128,172 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r194530372
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -128,6 +128,170 @@ case class
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Is the CI broken? oO
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r193759057
--- Diff: python/pyspark/sql/functions.py ---
@@ -2394,6 +2394,23 @@ def array_repeat(col, count):
return Column(sc._jvm.functions.array_repeat
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r193053405
--- Diff: python/pyspark/sql/functions.py ---
@@ -350,7 +350,7 @@ def corr(col1, col2):
>>> a = range(20)
>>>
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@ueshin Ok, looks like setting ev.isNull was causing the weird ev.value
behaviour - thank you! Updated with a working version
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r192862329
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -128,6 +128,175 @@ case class
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Ok, looks like the ternary if was the problem. Fixed it!
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@ueshin I took really long to update this because I was getting a cache error
and couldn't figure out why. Just in case, I pushed my last changes with
your recommendation, and although looks
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r192729662
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,165 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r191412802
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,165 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r191208028
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,176 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r191202434
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,165 @@ case class
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
During the last days I looked for other ways to use `getValue` properly
without such repetition, but didn't have any success :(
Suggestions are (very) welcome
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190301638
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,176 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190270795
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,176 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190095025
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190094810
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -288,6 +289,88 @@ class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190094430
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -288,6 +289,88 @@ class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190036385
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190034892
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r190014678
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189996412
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189925378
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -288,6 +289,88 @@ class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189900854
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189887317
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -127,6 +127,148 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189567647
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -265,6 +266,69 @@ class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189421719
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -126,6 +126,149 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189413681
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -126,6 +126,149 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189316555
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -126,6 +126,134 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189316316
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -126,6 +126,134 @@ case class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r189023172
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
---
@@ -199,6 +200,20 @@ class
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188947896
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,132 @@ case class MapKeys
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Great! Also I finally discovered the unit-tests log file that logs the
generated java code :D
Since now it works I'll remove the WIP tag and focus on the other
suggestions (such as new
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@mn-mikke I thought that `CodeGenerator.getValue` was used directly to
retrieve values from 1d arrays (such as an ArrayData) - but I don't get how to
use it for 2d arrays (such as `Seq
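The question above is about indexing into an array of arrays during codegen. In plain terms, the access pattern being discussed is just two chained lookups: the outer index selects an array, the inner index selects an element within it. A toy pure-Python model of that null-safe two-level lookup (illustrative only, not Catalyst's `CodeGenerator.getValue`):

```python
def get_value_2d(arrays, outer, inner):
    """Toy model of a two-level lookup: pick the outer array first,
    then the element inside it. None stands in for SQL NULL, both for
    a null inner array and for an out-of-range inner index."""
    arr = arrays[outer]
    if arr is None or inner >= len(arr):
        return None
    return arr[inner]

data = [[1, 2], None, [3]]
print(get_value_2d(data, 0, 1))  # element 1 of the first array -> 2
print(get_value_2d(data, 1, 0))  # whole inner array is null -> None
```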
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188694557
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,117 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188694041
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,117 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188693715
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,117 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188693270
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,117 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188420006
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,88 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188419498
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,88 @@ case class MapKeys
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@mn-mikke thank you! Any idea on how to access elements of individual
arrays? In the old version I wrote a 'getValue' that uses
`CodeGenerator.getValue`, but since now it is a 2d data
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188283661
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,110 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r188283550
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,110 @@ case class MapKeys
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Thank you so much for the suggestions! I tried to use IntelliJ a few times,
but in the end I always return to sbt/terminal/vim after some frustration
(mainly due to not being able to configure
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@mgaido91 Thank you for the suggestions and for being so patient. I updated
the code with `zip` name, more tests in CollectionExpression (I'll add more
after adding support to any number
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r187775631
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -90,6 +90,112 @@ case class MapKeys
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Sorry for taking so long to update this - I had a bad time looking for the
best way to wrap the zip result. In the end I decided to use
InternalRow/Struct, but I'm not sure if it is the best
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
@mgaido91 thank you, the suggestions were VERY enlightening! You are
correct, I tried to return the expected output in `doGenCode`, following
other implementations I thought
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/21045
Ok, so it works fine in spark-shell but in pyspark I got this error:
```shell
File "/home/dguedes/Workspace/spark/python/pyspark/sql/functions.py", line
2155, in pyspark.sql.fun
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r181371296
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -87,6 +87,62 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r181196604
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -87,6 +87,62 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r181189694
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -87,6 +87,62 @@ case class MapKeys
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/21045#discussion_r181189546
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
---
@@ -87,6 +87,62 @@ case class MapKeys
GitHub user DylanGuedes opened a pull request:
https://github.com/apache/spark/pull/21045
[WIP][SPARK-23931][SQL] Adds zip function to sparksql
Signed-off-by: DylanGuedes <djmggue...@gmail.com>
## What changes were proposed in this pull request?
(Pleas
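The PR above proposes a SQL zip function that pairs up the i-th elements of several arrays into structs, padding shorter arrays with null. A minimal pure-Python sketch of those intended semantics (the name `zip_arrays` is illustrative; it is not Spark's API):

```python
def zip_arrays(*arrays):
    """Pair up the i-th elements of each input array, padding the
    shorter arrays with None, mirroring SQL-style zip semantics
    (unlike Python's zip, which truncates to the shortest input)."""
    longest = max(len(a) for a in arrays)
    return [
        tuple(a[i] if i < len(a) else None for a in arrays)
        for i in range(longest)
    ]

print(zip_arrays([1, 2, 3], ["a", "b"]))  # -> [(1, 'a'), (2, 'b'), (3, None)]
```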
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
Hi,
any new feedback about this?
thank you!
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r175181523
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,12 @@ def hint(self, name, *parameters):
if not isinstance(name, str
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r174580902
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
if not isinstance(name, str
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r174208501
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
if not isinstance(name, str
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20788
Hi, I added two subtasks (one for Python and one for R)
[here](https://issues.apache.org/jira/browse/SPARK-21030).
Should I close this PR and open a new one, or is it ok to just rename
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r173623998
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
if not isinstance(name, str
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20788#discussion_r173618711
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
if not isinstance(name, str
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20775
Sure. Done.
GitHub user DylanGuedes opened a pull request:
https://github.com/apache/spark/pull/20788
[WIP][SPARK-21030][PYTHON][SQL] Adds more types for hint in pyspark
Signed-off-by: DylanGuedes <djmggue...@gmail.com>
## What changes were proposed in this pull r
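This PR broadens the parameter types accepted by `DataFrame.hint` in pyspark, which starts from the `isinstance(name, str)` check visible in the diffs above. A hedged sketch of that validation pattern in plain Python (the function name and the allowed-types tuple here are illustrative, not pyspark's exact list):

```python
def check_hint_parameters(name, *parameters):
    """Validate a hint name and its parameter types, raising TypeError
    on anything unsupported -- the pattern DataFrame.hint follows."""
    if not isinstance(name, str):
        raise TypeError("name should be a string, got %r" % type(name))
    allowed = (str, int, float, list)  # illustrative set of allowed types
    for p in parameters:
        if not isinstance(p, allowed):
            raise TypeError("unsupported hint parameter type: %r" % type(p))
    return (name, parameters)

print(check_hint_parameters("broadcast", 10))  # -> ('broadcast', (10,))
```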
Github user DylanGuedes commented on a diff in the pull request:
https://github.com/apache/spark/pull/20775#discussion_r173446874
--- Diff: examples/src/main/python/ml/dataframe_example.py ---
@@ -35,18 +35,18 @@
print("Usage: dataframe_example.py ", file=
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20775
@HyukjinKwon I checked every other built-in function and this was the only
one being used. What do you think
Github user DylanGuedes commented on the issue:
https://github.com/apache/spark/pull/20775
Sure, and thanks for the review! I used the "dataset" name because I
checked a few examples that also use it, but I'll rename it to path then. I
already checked all other examples
GitHub user DylanGuedes opened a pull request:
https://github.com/apache/spark/pull/20775
Changes input variable to not conflict with built-in function
Signed-off-by: DylanGuedes <djmggue...@gmail.com>
## What changes were proposed in this pull request?
C