Justin Leet created METRON-2005:
-----------------------------------

             Summary: Batch Writer writes 0-byte files to HDFS on rotation
                 Key: METRON-2005
                 URL: https://issues.apache.org/jira/browse/METRON-2005
             Project: Metron
          Issue Type: Bug
            Reporter: Justin Leet
            Assignee: Justin Leet


This results from https://github.com/apache/metron/pull/505

That PR breaks the standard convention of choosing a file name once and 
rotating that file repeatedly, because any message can now be routed to a 
different file based on a Stellar statement.  The break was noted in the PR and 
accepted, because we didn't care about the rotation number anyway.

This works fine for the 0th rotation (a new file is opened, data is written, 
and the file is closed).  On the first rotation, however, we signal to the 
HdfsWriter that the file has been closed, in order to limit the maximum number 
of open files, but still create a new file with rotation 1.  This file never 
receives any data (because we no longer maintain an open file reference to it), 
and the SourceHandler for it stays open, with its Timer still attempting 
further (pointless) rotations.  Note that no data is lost: any data that would 
have gone into this file instead goes into a new 0 rotation file.
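
To make the sequence above concrete, here is a toy model of the behaviour.  It 
is only a sketch: BuggyWriterSketch, Handler, fireTimers and the in-memory 
"files" map are hypothetical stand-ins, not the real HdfsWriter or 
SourceHandler code, and the sequence number in the file name is an assumption 
made only to keep names unique.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the bug. All names are illustrative, not Metron's classes. */
class BuggyWriterSketch {
    /** Stand-in for HDFS: file name -> bytes written. */
    final Map<String, Integer> files = new LinkedHashMap<>();
    /** Handlers the writer still holds a reference to, keyed by routed name. */
    final Map<String, Handler> open = new LinkedHashMap<>();
    /** Every handler ever created; their "timers" keep firing even when orphaned. */
    final List<Handler> timers = new ArrayList<>();
    private int seq = 0;

    class Handler {
        final String base;
        int rotation = 0;
        String current;

        Handler(String base) {
            this.base = base;
            timers.add(this);
            openFile();
        }

        /** Creating a file immediately yields a 0-byte entry. */
        void openFile() {
            current = base + "-" + rotation + "-" + (seq++);
            files.put(current, 0);
        }

        void write(int bytes) {
            files.merge(current, bytes, Integer::sum);
        }

        /** Buggy rotation: the writer forgets this handler, yet a new file is opened. */
        void rotate() {
            open.remove(base, this); // writer drops its reference (if it still had one)
            rotation++;
            openFile();              // orphaned file: nothing will ever write to it
        }
    }

    /** Routes data to the open handler for this name, creating a 0 rotation one if needed. */
    void write(String base, int bytes) {
        open.computeIfAbsent(base, b -> new Handler(b)).write(bytes);
    }

    /** Simulates every handler's timer firing once, including orphaned handlers. */
    void fireTimers() {
        for (Handler h : new ArrayList<>(timers)) {
            h.rotate();
        }
    }

    long emptyFiles() {
        return files.values().stream().filter(b -> b == 0).count();
    }
}
```

In this model, writing to "a", firing the timers, writing to "a" again and 
firing the timers once more leaves two files holding all of the data and three 
0-byte files, and every further timer tick adds more.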

This becomes more obvious the longer the cluster is running or the shorter the 
timeout on a file is.  Each SourceHandler that stays open keeps attempting 
rotations, so large numbers of 0-byte files eventually accumulate.

An easy fix for this is to remove the creation of new files during rotations 
(but still perform RotationActions). This means that every file will have a 0 
rotation (which we don't actually use for anything anyway).  More complicated 
approaches are possible (e.g. evicting the oldest file from a cache), but they 
seem heavy-handed for maintaining a rotation count we don't care about anyway.  
Additionally, the Timer should be cancelled when the reference is removed from 
HdfsWriter.
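
A minimal sketch of what that fix could look like, in a toy in-memory model.  
FixedWriterSketch, Handler, the one-hour rotation delay and the 
"rotationActions" list are hypothetical and only illustrate the idea; this is 
not the actual patch.  java.util.Timer is used as an assumption about the kind 
of timer involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;

/** Toy model of the proposed fix. All names are illustrative, not Metron's classes. */
class FixedWriterSketch {
    final Map<String, Integer> files = new LinkedHashMap<>(); // stand-in for HDFS
    final Map<String, Handler> open = new LinkedHashMap<>();  // writer's references
    final List<String> rotationActions = new ArrayList<>();   // files that had actions run
    private int seq = 0;

    class Handler {
        final String base;
        final String current;                // always a 0 rotation file
        final Timer timer = new Timer(true); // daemon thread, never blocks JVM exit
        boolean timerCancelled = false;

        Handler(String base, long rotateAfterMillis) {
            this.base = base;
            this.current = base + "-0-" + (seq++);
            files.put(current, 0);
            timer.schedule(new TimerTask() {
                @Override public void run() { rotate(); }
            }, rotateAfterMillis);
        }

        /** Fixed rotation: run the actions, drop the reference, stop the timer. */
        void rotate() {
            synchronized (FixedWriterSketch.this) {
                if (timerCancelled) {
                    return;                   // already rotated once, nothing left to do
                }
                rotationActions.add(current); // stand-in for performing RotationActions
                open.remove(base, this);      // writer forgets this handler
                timer.cancel();               // no further rotations are attempted
                timerCancelled = true;
                // Deliberately no new file here: later data opens a fresh 0 rotation file.
            }
        }
    }

    synchronized void write(String base, int bytes) {
        Handler h = open.computeIfAbsent(base, b -> new Handler(b, 3_600_000L));
        files.merge(h.current, bytes, Integer::sum);
    }

    synchronized long emptyFiles() {
        return files.values().stream().filter(b -> b == 0).count();
    }
}
```

With this shape, a rotation never leaves an empty file behind, and a handler 
that the writer has dropped has no Timer left to fire.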



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)