[ 
https://issues.apache.org/jira/browse/DATAFU-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eyal Allweil updated DATAFU-164:
--------------------------------
    Description: 
We can get better code coverage and cover edge cases that are currently missing 
in our main test file, 
[TestSparkDFUtils|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/scala/datafu/spark/TestSparkDFUtils.scala].

 

For example, another test for 
[joinWithRange|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L419]
 that covers the case where a record falls into 
{{decreased_range_single}}, but {{range_start}} and {{range_end}} do not 
contain {{single}}.

Another test case for the 
[flatten|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L256]
 API would also be useful.

Or for 
[dedupRandomN|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L567].

Or adding tests that verify that our skewed join methods 
(_broadcastJoinSkewed_ and _joinSkewed_) give the same results as a 
regular join.

Or for anything else, for that matter.

 

It's perfectly alright to only do one of them - either as a patch or GitHub PR.
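
To make the joinWithRange case above more concrete, here is a rough plain-Scala 
sketch (no Spark) of the kind of input such a test would need. It assumes that 
values are bucketed by integer division with a decrease factor and then filtered 
by the exact range. The object, the Interval class, and the function bodies are 
hypothetical illustrations, not the DataFu implementation.

```scala
// Plain-Scala sketch of a bucket-then-filter range join.
// Everything here is hypothetical and only mirrors the idea, not DataFu's code.
object RangeJoinSketch {
  final case class Interval(start: Int, end: Int)

  // Bucket ids an interval touches when values are divided by `factor`.
  def buckets(r: Interval, factor: Int): Seq[Int] =
    (r.start / factor) to (r.end / factor)

  // Match singles to intervals by shared bucket, then keep only true containment.
  def joinWithRange(singles: Seq[Int], ranges: Seq[Interval], factor: Int): Seq[(Int, Interval)] =
    for {
      s <- singles
      r <- ranges
      if buckets(r, factor).contains(s / factor) // coarse match on the bucket
      if r.start <= s && s <= r.end              // exact check, drops near misses
    } yield (s, r)

  def main(args: Array[String]): Unit = {
    val ranges = Seq(Interval(12, 17))
    // With factor 10, the values 11 and 18 share bucket 1 with the interval
    // but lie outside [12, 17], so they must not appear in the result.
    val result = joinWithRange(Seq(11, 12, 17, 18), ranges, factor = 10)
    assert(result == Seq((12, Interval(12, 17)), (17, Interval(12, 17))))
    println(result)
  }
}
```

A real test would build the same kind of rows as DataFrames and call the actual 
joinWithRange, checking that the near-miss rows are absent from the output.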

  was:
We can get better code coverage and cover edge cases that are currently missing 
in our main tests file, 
[TestSparkDFUtils|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/scala/datafu/spark/TestSparkDFUtils.scala].

 

For example, another test for 
[joinWithRange|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L419]
 that includes the case that a record falls into 
{{decreased_range_single}}, but {{range_start}} and {{range_end}} do not 
contain {{single}}.

Another case for the 
[flatten|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L256]
 API could also be good.

Or for 
[dedupRandomN|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L567].
 Or for anything else, for that matter.

 

It's perfectly alright to only do one of them - either as a patch or GitHub PR.


> Improve test cases
> ------------------
>
>                 Key: DATAFU-164
>                 URL: https://issues.apache.org/jira/browse/DATAFU-164
>             Project: DataFu
>          Issue Type: Test
>            Reporter: Eyal Allweil
>            Priority: Major
>              Labels: good-first-issue, newbie, up-for-grabs
>
> We can get better code coverage and cover edge cases that are currently 
> missing in our main test file, 
> [TestSparkDFUtils|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/scala/datafu/spark/TestSparkDFUtils.scala].
>  
> For example, another test for 
> [joinWithRange|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L419]
>  that covers the case where a record falls into 
> {{decreased_range_single}}, but {{range_start}} and {{range_end}} do not 
> contain {{single}}.
> Another test case for the 
> [flatten|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L256]
>  API would also be useful.
> Or for 
> [dedupRandomN|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L567].
> Or adding tests that verify that our skewed join methods 
> (_broadcastJoinSkewed_ and _joinSkewed_) give the same results as a 
> regular join.
> Or for anything else, for that matter.
>  
> It's perfectly alright to only do one of them - either as a patch or GitHub 
> PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
