Github user mxm commented on a diff in the pull request:
https://github.com/apache/flink/pull/2593#discussion_r81946646
--- Diff: flink-fs-tests/src/test/java/org/apache/flink/hdfstests/ContinuousFileMonitoringTest.java ---
@@ -106,6 +107,155 @@ public static void destroyHDFS() {
// TESTS
@Test
+	public void testFileReadingOperatorWithIngestionTime() throws Exception {
+		Set<org.apache.hadoop.fs.Path> filesCreated = new HashSet<>();
+		Map<Integer, String> expectedFileContents = new HashMap<>();
+
+		for(int i = 0; i < NO_OF_FILES; i++) {
+			Tuple2<org.apache.hadoop.fs.Path, String> file = fillWithData(hdfsURI, "file", i, "This is test line.");
+			filesCreated.add(file.f0);
+			expectedFileContents.put(i, file.f1);
+		}
+
+		TextInputFormat format = new TextInputFormat(new Path(hdfsURI));
+		TypeInformation<String> typeInfo = TypeExtractor.getInputFormatTypes(format);
+
+		final long watermarkInterval = 10;
+		ExecutionConfig executionConfig = new ExecutionConfig();
+		executionConfig.setAutoWatermarkInterval(watermarkInterval);
+
+		ContinuousFileReaderOperator<String, ?> reader = new ContinuousFileReaderOperator<>(format);
+		reader.setOutputType(typeInfo, executionConfig);
+
+		final TestTimeServiceProvider timeServiceProvider = new TestTimeServiceProvider();
+		final OneInputStreamOperatorTestHarness<FileInputSplit, String> tester =
+			new OneInputStreamOperatorTestHarness<>(reader, executionConfig, timeServiceProvider);
+		tester.setTimeCharacteristic(TimeCharacteristic.IngestionTime);
+		tester.open();
+
+		Assert.assertEquals(TimeCharacteristic.IngestionTime, tester.getTimeCharacteristic());
+
+		// test that watermarks are correctly emitted
+
+		timeServiceProvider.setCurrentTime(201);
+		timeServiceProvider.setCurrentTime(301);
+		timeServiceProvider.setCurrentTime(401);
+		timeServiceProvider.setCurrentTime(501);
+
+		int i = 0;
+		for(Object line: tester.getOutput()) {
+			if (!(line instanceof Watermark)) {
+				Assert.fail("Only watermarks are expected here ");
+			}
+			Watermark w = (Watermark) line;
+			Assert.assertEquals(200 + (i * 100), w.getTimestamp());
+			i++;
+		}
+
+		// clear the output to get the elements only and the final watermark
+		tester.getOutput().clear();
+		Assert.assertEquals(0, tester.getOutput().size());
+
+		// create the necessary splits for the test
+		FileInputSplit[] splits = format.createInputSplits(
+			reader.getRuntimeContext().getNumberOfParallelSubtasks());
+
+		// and feed them to the operator
+		Map<Integer, List<String>> actualFileContents = new HashMap<>();
+
+		long lastSeenWatermark = Long.MIN_VALUE;
+		int lineCounter = 0;	// counter for the lines read from the splits
+		int watermarkCounter = 0;
+
+		for(FileInputSplit split: splits) {
+
+			// set the next "current processing time".
+			long nextTimestamp = timeServiceProvider.getCurrentProcessingTime() + watermarkInterval;
+			timeServiceProvider.setCurrentTime(nextTimestamp);
+
+			// send the next split to be read and wait until it is fully read.
+			tester.processElement(new StreamRecord<>(split));
+			synchronized (tester.getCheckpointLock()) {
+				while (tester.getOutput().isEmpty() || tester.getOutput().size() != (LINES_PER_FILE + 1)) {
+					tester.getCheckpointLock().wait(10);
--- End diff ---
Actually, sleeping wouldn't be necessary if you disabled the threaded split processing of the `SplitReader` for this test. You could use a synchronous reader for the test (that would require a small change to the operator/reader to enable it).
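
To make that concrete, here is a rough, self-contained sketch of the idea in plain Java. It is not Flink's actual `SplitReader`/`ContinuousFileReaderOperator` code; the class name and the `synchronous` switch are made up for illustration. The point is that when the split is read on the caller's thread, the records are in the output as soon as `process()` returns, so the test can assert immediately instead of polling under the checkpoint lock.

```java
// Hypothetical sketch, NOT the real SplitReader: contrasts threaded split reading
// (the test has to wait for the records) with a synchronous test-only mode
// (the records are available as soon as process() returns).
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SplitReaderSketch {

	private final boolean synchronous;  // hypothetical test-only switch
	private final ConcurrentLinkedQueue<String> output = new ConcurrentLinkedQueue<>();

	SplitReaderSketch(boolean synchronous) {
		this.synchronous = synchronous;
	}

	/** A "split" is modeled as a plain list of lines for the purpose of this sketch. */
	void process(List<String> split) {
		if (synchronous) {
			output.addAll(split);                            // emitted before process() returns
		} else {
			new Thread(() -> output.addAll(split)).start();  // caller must wait for the records
		}
	}

	ConcurrentLinkedQueue<String> getOutput() {
		return output;
	}

	public static void main(String[] args) {
		SplitReaderSketch reader = new SplitReaderSketch(true);
		reader.process(Arrays.asList("line 1", "line 2"));
		// No checkpoint-lock polling needed in synchronous mode: the records are already there.
		if (reader.getOutput().size() != 2) {
			throw new AssertionError("expected 2 records, got " + reader.getOutput().size());
		}
	}
}
```

In production the threaded reading is of course what we want; the switch would only exist so that tests like this one can run deterministically without waiting.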
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---