[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674543#comment-15674543
 ] 

Saikat Kanjilal commented on SPARK-9487:
----------------------------------------

Sean,
I took a look at the code and here it is:

List<List<String>> inputData = Arrays.asList(
      Arrays.asList("hello", "world"),
      Arrays.asList("hello", "moon"),
      Arrays.asList("hello"));

    List<List<Tuple2<String, Long>>> expected = Arrays.asList(
        Arrays.asList(
            new Tuple2<>("hello", 1L),
            new Tuple2<>("world", 1L)),
        Arrays.asList(
            new Tuple2<>("hello", 1L),
            new Tuple2<>("moon", 1L)),
        Arrays.asList(
            new Tuple2<>("hello", 1L)));

    JavaDStream<String> stream = JavaTestUtils.attachTestInputStream(ssc, 
inputData, 1);
    JavaPairDStream<String, Long> counted = stream.countByValue();
    JavaTestUtils.attachTestOutputStream(counted);
    List<List<Tuple2<String, Long>>> result = JavaTestUtils.runStreams(ssc, 3, 
3);

    Assert.assertEquals(expected, result);


As you can see the expected is assuming that the contents of the stream get 
counted accurately for every word, the output that gets generated through the 
flakiness just has hello,1 moon,1 reversed which I dont think matters, unless 
the goal of the test ist o identify words in order of how they enter the stream 
the expected and the actual answer are correct.  Therefore net net the test is 
flaky, should I refactor the test to actually look at the word count and not 
the order, thoughts on next steps?


> Use the same num. worker threads in Scala/Python unit tests
> -----------------------------------------------------------
>
>                 Key: SPARK-9487
>                 URL: https://issues.apache.org/jira/browse/SPARK-9487
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core, SQL, Tests
>    Affects Versions: 1.5.0
>            Reporter: Xiangrui Meng
>              Labels: starter
>         Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults
>
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use 
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other 
> components. If the operation depends on partition IDs, e.g., random number 
> generator, this will lead to different result in Python and Scala/Java. It 
> would be nice to use the same number in all unit tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to