[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674543#comment-15674543 ]
Saikat Kanjilal commented on SPARK-9487:
----------------------------------------

Sean, I took a look at the code; here it is:

    List<List<String>> inputData = Arrays.asList(
        Arrays.asList("hello", "world"),
        Arrays.asList("hello", "moon"),
        Arrays.asList("hello"));

    List<List<Tuple2<String, Long>>> expected = Arrays.asList(
        Arrays.asList(
            new Tuple2<>("hello", 1L),
            new Tuple2<>("world", 1L)),
        Arrays.asList(
            new Tuple2<>("hello", 1L),
            new Tuple2<>("moon", 1L)),
        Arrays.asList(
            new Tuple2<>("hello", 1L)));

    JavaDStream<String> stream = JavaTestUtils.attachTestInputStream(ssc, inputData, 1);
    JavaPairDStream<String, Long> counted = stream.countByValue();
    JavaTestUtils.attachTestOutputStream(counted);
    List<List<Tuple2<String, Long>>> result = JavaTestUtils.runStreams(ssc, 3, 3);

    Assert.assertEquals(expected, result);

As you can see, the expected value assumes that the contents of the stream are counted accurately for every word. The output produced in the flaky runs just has (hello,1) and (moon,1) reversed, which I don't think matters: unless the goal of the test is to identify words in the order they enter the stream, both the expected and the actual answers are correct. So, net, the test is flaky because it compares order. Should I refactor the test to check the word counts and not the order? Thoughts on next steps?

> Use the same num. worker threads in Scala/Python unit tests
> -----------------------------------------------------------
>
>                 Key: SPARK-9487
>                 URL: https://issues.apache.org/jira/browse/SPARK-9487
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, Spark Core, SQL, Tests
>    Affects Versions: 1.5.0
>            Reporter: Xiangrui Meng
>              Labels: starter
>         Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults
>
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other
> components.
> If the operation depends on partition IDs, e.g., random number
> generator, this will lead to different result in Python and Scala/Java. It
> would be nice to use the same number in all unit tests.
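
The refactoring proposed in the comment (check the word counts per batch, not the order within a batch) could look roughly like the sketch below. It is only an illustration under stated assumptions: `BatchComparison`, `toMultiset`, `sameBatchesIgnoringOrder`, and `pair` are hypothetical names, not part of Spark or its test utilities, and `Map.Entry` stands in for `scala.Tuple2` so the example runs without Spark on the classpath.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names come from Spark.
class BatchComparison {

    // Count how many times each element occurs in one batch (a multiset view),
    // so that duplicates still matter but position does not.
    static <T> Map<T, Integer> toMultiset(List<T> batch) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : batch) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // True when both results have the same number of batches and each pair of
    // batches holds the same elements, regardless of order within the batch.
    // The order of the batches themselves is still significant.
    static <T> boolean sameBatchesIgnoringOrder(List<List<T>> expected,
                                                List<List<T>> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!toMultiset(expected.get(i)).equals(toMultiset(actual.get(i)))) {
                return false;
            }
        }
        return true;
    }

    // Stand-in for new Tuple2<>(word, count); an assumption made for this sketch.
    static Map.Entry<String, Long> pair(String word, long count) {
        return new SimpleEntry<>(word, count);
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Long>>> expected = Arrays.asList(
            Arrays.asList(pair("hello", 1L), pair("world", 1L)),
            Arrays.asList(pair("hello", 1L), pair("moon", 1L)),
            Arrays.asList(pair("hello", 1L)));

        // Same counts, but the second batch arrives in the reversed order
        // described in the comment.
        List<List<Map.Entry<String, Long>>> actual = Arrays.asList(
            Arrays.asList(pair("hello", 1L), pair("world", 1L)),
            Arrays.asList(pair("moon", 1L), pair("hello", 1L)),
            Arrays.asList(pair("hello", 1L)));

        System.out.println(expected.equals(actual));                    // prints false
        System.out.println(sameBatchesIgnoringOrder(expected, actual)); // prints true
    }
}
```

In the real test, the final assertion would then check something like `sameBatchesIgnoringOrder(expected, result)` in place of the order-sensitive `Assert.assertEquals(expected, result)`. Whether the test is meant to ignore order is the open question raised in the comment.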