[ 
https://issues.apache.org/jira/browse/DATAFU-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938576#comment-17938576
 ] 

Ben Rahamim edited comment on DATAFU-180 at 3/26/25 12:09 PM:
--------------------------------------------------------------

Sounds good to me! In the future we could add a few more options, such as 'min' or 'max'.

Can you open a PR, please?


> Expose missing methods in Python
> --------------------------------
>
>                 Key: DATAFU-180
>                 URL: https://issues.apache.org/jira/browse/DATAFU-180
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.7.0, 2.0.0, 1.8.0
>            Reporter: Eyal Allweil
>            Priority: Minor
>              Labels: good-first-issue, newbie, python, scala, up-for-grabs
>             Fix For: 2.1.0
>
>
> The [dedupRandomN|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L610] and [dedupByAllExcept|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L258] methods are not fully exposed in Python.
>  # They need to be added to [df_utils.py|https://github.com/apache/datafu/blob/6fd6fc4cb5e8156291600ee5f6ef3591dd74541e/datafu-spark/src/main/resources/pyspark_utils/df_utils.py].
>  # _dedupByAllExcept_ needs to be added to [SparkDFUtilsBridge|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L35].
>  # The tests in [df_utils_tests.py|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/resources/python_tests/df_utils_tests.py] should include both.
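
For anyone picking this up, here is a rough plain-Python sketch of the behaviour a Python-side _dedupByAllExcept_ might expose, including the 'min' / 'max' option floated in the comment. It is not the DataFu implementation and does not use Spark: the function name `dedup_by_all_except`, the `agg` parameter, and the list-of-dicts input are all assumptions made for illustration. It also assumes the method groups rows on every column except the ignored one and keeps a single aggregated value for that column.

```python
def dedup_by_all_except(rows, ignored_column, agg="max"):
    """Hypothetical stand-in for a Python dedupByAllExcept wrapper.

    rows is a list of dicts (one dict per row). Rows that agree on every
    column except ignored_column collapse into one row, and the ignored
    column keeps the 'min' or 'max' of the collapsed values.
    """
    aggregators = {"min": min, "max": max}
    if agg not in aggregators:
        raise ValueError("agg must be 'min' or 'max', got %r" % (agg,))

    # Group the ignored column's values by the remaining columns.
    groups = {}
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != ignored_column))
        groups.setdefault(key, []).append(row[ignored_column])

    # Rebuild one row per group, in first-seen order.
    result = []
    for key, values in groups.items():
        merged = dict(key)
        merged[ignored_column] = aggregators[agg](values)
        result.append(merged)
    return result


rows = [
    {"id": 1, "name": "a", "ts": 10},
    {"id": 1, "name": "a", "ts": 30},
    {"id": 2, "name": "b", "ts": 20},
]
latest = dedup_by_all_except(rows, "ts")             # keeps ts=30 for id 1
earliest = dedup_by_all_except(rows, "ts", agg="min")  # keeps ts=10 for id 1
```

A string option like this is only one possible shape for the Python API; the real wrapper would presumably delegate to the Scala method through SparkDFUtilsBridge rather than reimplementing the logic.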



