Eyal Allweil created DATAFU-180:
-----------------------------------

             Summary: Expose missing methods in Python
                 Key: DATAFU-180
                 URL: https://issues.apache.org/jira/browse/DATAFU-180
             Project: DataFu
          Issue Type: Improvement
    Affects Versions: 1.8.0, 2.0.0, 1.7.0
            Reporter: Eyal Allweil
             Fix For: 2.1.0


The [dedupRandomN|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L610] and [dedupByAllExcept|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L258] methods are not fully exposed in Python.
 # They need to be added to [df_utils.py|https://github.com/apache/datafu/blob/6fd6fc4cb5e8156291600ee5f6ef3591dd74541e/datafu-spark/src/main/resources/pyspark_utils/df_utils.py].
 # _dedupByAllExcept_ needs to be added to [SparkDFUtilsBridge|https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L35].
 # The tests in [df_utils_tests.py|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/resources/python_tests/df_utils_tests.py] should include both.
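
As a rough illustration of the steps above, the Python side could be a thin wrapper that forwards each call to the bridge. This is only a sketch under stated assumptions: RecordingBridge, dedup_random_n, dedup_by_all_except and all argument lists are hypothetical placeholders, not the contents of df_utils.py or the Scala signatures. The stand-in bridge records calls instead of running Spark, so the snippet runs without a cluster.

```python
# Hypothetical sketch: names and argument lists are illustrative assumptions,
# not copied from df_utils.py or SparkDFUtils.scala.


class RecordingBridge:
    """Stand-in for the JVM-side bridge; records calls instead of running Spark."""

    def __init__(self):
        self.calls = []

    def dedupRandomN(self, df, group_col, max_size):
        # Placeholder argument list (assumed, not the real signature).
        self.calls.append(("dedupRandomN", df, group_col, max_size))
        return df

    def dedupByAllExcept(self, df, ignored_cols):
        # Placeholder argument list (assumed, not the real signature).
        self.calls.append(("dedupByAllExcept", df, tuple(ignored_cols)))
        return df


def dedup_random_n(bridge, df, group_col, max_size):
    """Python-side wrapper that forwards to the bridge's dedupRandomN."""
    return bridge.dedupRandomN(df, group_col, max_size)


def dedup_by_all_except(bridge, df, ignored_cols):
    """Python-side wrapper that forwards to the bridge's dedupByAllExcept."""
    return bridge.dedupByAllExcept(df, list(ignored_cols))


# A test in the spirit of step 3: both wrappers should reach the bridge.
bridge = RecordingBridge()
dedup_random_n(bridge, "df", "user_id", 3)
dedup_by_all_except(bridge, "df", ["timestamp"])
print(bridge.calls)
```

Passing the bridge in explicitly is a choice made only for this sketch, so that a test can swap in a recording stand-in; the real module may obtain its bridge differently.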



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
