[ https://issues.apache.org/jira/browse/DATAFU-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eyal Allweil updated DATAFU-164:
--------------------------------
    Description: 
We can get better code coverage and cover edge cases that are currently missing in our main test file, [TestSparkDFUtils|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/scala/datafu/spark/TestSparkDFUtils.scala].

For example, another test for [joinWithRange|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L419] that covers the case where a record falls into {{decreased_range_single}}, but {{range_start}} and {{range_end}} do not contain {{single}}.
Another test case for the [flatten|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L256] API would also be good.
Or for [dedupRandomN|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L567].
Or tests that verify that our skewed join methods (_broadcastJoinSkewed_ and _joinSkewed_) give the same results as a regular join.
Or for anything else, for that matter.

It's perfectly fine to do just one of these, either as a patch or as a GitHub PR.

  was:
We can get better code coverage and cover edge cases that are currently missing in our main test file, [TestSparkDFUtils|https://github.com/apache/datafu/blob/master/datafu-spark/src/test/scala/datafu/spark/TestSparkDFUtils.scala].

For example, another test for [joinWithRange|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L419] that covers the case where a record falls into {{decreased_range_single}}, but {{range_start}} and {{range_end}} do not contain {{single}}.
Another test case for the [flatten|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L256] API would also be good.
Or for [dedupRandomN|https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala#L567].
Or for anything else, for that matter.

It's perfectly fine to do just one of these, either as a patch or as a GitHub PR.

> Improve test cases
> ------------------
>
>                 Key: DATAFU-164
>                 URL: https://issues.apache.org/jira/browse/DATAFU-164
>             Project: DataFu
>          Issue Type: Test
>            Reporter: Eyal Allweil
>            Priority: Major
>              Labels: good-first-issue, newbie, up-for-grabs
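
For the suggestion about the skewed join methods, one possible shape for such a test is sketched below. It is only a sketch under assumptions: it needs Spark 2.4 or later on the classpath (for {{exceptAll}} and {{isEmpty}}), the object name, the {{sameRows}} helper and the sample data are all hypothetical, and the {{skewedJoin}} value is a placeholder that performs a plain join. A real test would put the call to _joinSkewed_ or _broadcastJoinSkewed_ there, with whatever arguments those methods actually take in SparkDFUtils.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch, not part of DataFu.
object SkewedJoinEquivalenceSketch {

  /** True when both DataFrames hold the same rows with the same multiplicities. */
  def sameRows(actual: DataFrame, expected: DataFrame): Boolean = {
    val cols = expected.columns.toSeq
    if (actual.columns.toSet != cols.toSet) {
      false
    } else {
      // exceptAll matches columns by position, so align the column order first.
      val aligned = actual.select(cols.map(actual.col): _*)
      aligned.exceptAll(expected).isEmpty && expected.exceptAll(aligned).isEmpty
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("skewed-join-equivalence")
      .getOrCreate()
    import spark.implicits._

    // Key "a" is deliberately over-represented to mimic a skewed input.
    val left = (Seq.fill(50)(("a", 1)) ++ Seq(("b", 2), ("c", 3))).toDF("key", "value")
    // Distinct column names on the right side avoid ambiguous references after the join.
    val right = Seq(("a", "x"), ("b", "y"), ("d", "z")).toDF("rkey", "label")

    // The regular join is the reference result.
    val expected = left.join(right, left("key") === right("rkey"), "inner")

    // ASSUMPTION: placeholder for the method under test. Here it is just a
    // regular join, so the check passes trivially; replace the body with the
    // actual skewed join call when writing the real test.
    val skewedJoin: (DataFrame, DataFrame) => DataFrame =
      (l, r) => l.join(r, l("key") === r("rkey"), "inner")

    val actual = skewedJoin(left, right)
    assert(sameRows(actual, expected), "skewed join result differs from regular join")

    spark.stop()
  }
}
```

Comparing with {{exceptAll}} in both directions treats the results as multisets, so the check does not depend on row order but still catches rows that a skewed join might duplicate or drop.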