Confuse created FLUME-3350:
------------------------------

             Summary: Spooldir source may collect empty files and write them to HDFS
                 Key: FLUME-3350
                 URL: https://issues.apache.org/jira/browse/FLUME-3350
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: 1.9.0
            Reporter: Confuse
         Attachments: image-2020-01-09-21-33-55-306.png

When collecting data from a spooldir source to HDFS, I found that if an empty
file is created in spoolDir, an empty file with the same name appears on HDFS.
This seems unreasonable. After reading the source code, I found that the
following condition in the SpoolDirectorySource class will never be true.

 public void run() {
      int backoffInterval = 250;
      boolean readingEvents = false;
      try {
        while (!Thread.interrupted()) {
          readingEvents = true;
          List<Event> events = reader.readEvents(batchSize);
          readingEvents = false;
          
          // this condition will never be true
          if (events.isEmpty()) {
            break;
          }
       .
       .
       .
}

Please confirm whether this behavior is a bug. In my opinion, collecting empty
files is meaningless. For HDFS in particular, storing many small files is
discouraged. Even if a user unintentionally puts a lot of empty files into the
spool directory, Flume should handle them itself instead of writing them to
HDFS.
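
To illustrate the kind of handling I mean, here is a minimal sketch of skipping
zero-length files when listing a spool directory. It is not Flume code: the
class NonEmptySpoolFiles and the method listNonEmptyFiles are hypothetical names
made up for this example, and it uses only the standard java.nio.file API. Where
such a check would belong inside Flume is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flume: lists only the regular files
// in a spool directory that contain at least one byte.
final class NonEmptySpoolFiles {

    static List<Path> listNonEmptyFiles(Path spoolDir) throws IOException {
        List<Path> result = new ArrayList<>();
        // Files.list returns a lazy stream that must be closed after use.
        try (Stream<Path> entries = Files.list(spoolDir)) {
            for (Path candidate : (Iterable<Path>) entries::iterator) {
                // Skip subdirectories and zero-length files.
                if (Files.isRegularFile(candidate) && Files.size(candidate) > 0) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
```

One caveat with this approach: a file that is still being written may be empty
at the moment it is listed, so a real implementation would probably need to
decide whether to skip such a file for now or to treat it as finished.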



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
