jason.lee created FLUME-3097:
--------------------------------

             Summary: CLONE - WAL data grows forever even though data is delivered in E2E
                 Key: FLUME-3097
                 URL: https://issues.apache.org/jira/browse/FLUME-3097
             Project: Flume
          Issue Type: Bug
          Components: Master, Node, Sinks+Sources
            Reporter: jason.lee
            Priority: Blocker


With a heavy enough write load, the E2E agent WAL appears to get into a state 
where data is constantly shuffled between the various directories / states 
(e.g. writing, logged, sending, sent). When this happens, the WAL directories 
grow indefinitely until the disk is exhausted, regardless of how much data 
triggered the problem.

To reproduce:
* Use the supplied config (or something similar).
* Write to the agent source at a rate of more than 1 MB/s for a short burst 
(using something like the generator provided below).
* Note that data is delivered to the collectorSink, but the agent's WAL 
directories keep growing.
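
To confirm the growth described in the last step, something like the following sketch can sample the size of each WAL state directory over time. It is only an illustration: the WAL root path has to be supplied by hand, the subdirectory names are assumed to match the state names mentioned above (writing, logged, sending, sent), and `dir_size`, `snapshot` and `watch` are hypothetical helper names, not part of Flume.

```python
import os
import sys
import time

# State names are taken from the issue text; the real on-disk directory
# names may differ (assumption).
STATES = ("writing", "logged", "sending", "sent")


def dir_size(path):
    """Total size in bytes of regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files move between state directories; one may vanish mid-scan.
                pass
    return total


def snapshot(wal_root, states=STATES):
    """Map each state name to the byte size of its directory under wal_root."""
    return {s: dir_size(os.path.join(wal_root, s)) for s in states}


def watch(wal_root, interval=5.0, rounds=12):
    """Print per-state sizes and the total every `interval` seconds."""
    for _ in range(rounds):
        sizes = snapshot(wal_root)
        line = "  ".join(f"{s}={n}" for s, n in sizes.items())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp}  {line}  total={sum(sizes.values())}")
        time.sleep(interval)


if __name__ == "__main__":
    # Usage: python wal_watch.py <wal-root-directory>
    watch(sys.argv[1])
```

If the total keeps climbing after the generator has stopped and the collector has caught up, that would match the behaviour reported here.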

The config:
{code}
n1 : execStream("tail -F datafile") | agentE2ESink("host", 12345);
n2 : collectorSource(12345) | collectorSink("file://...", "n2-");
{code}

Generator:
{code}
perl -e 'while (1) { print $i++, "\n"; }' >> datafile
{code}
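
The one-liner above runs until it is interrupted. For a bounded "short burst", a sketch along these lines may be more convenient. It appends numbered lines for a fixed number of seconds and reports the rate it achieved, so it is easy to check whether the 1 MB/s threshold was actually exceeded. The function name `burst`, the batch size and the default duration are illustrative assumptions; the script does not throttle, it only measures.

```python
import sys
import time


def burst(path, seconds=10.0, batch=10_000):
    """Append numbered lines to `path` for about `seconds` seconds.

    Returns (lines_written, bytes_written, bytes_per_second).
    """
    lines = 0
    written = 0
    start = time.monotonic()
    # newline="\n" keeps one byte per line ending on every platform.
    with open(path, "a", newline="\n") as f:
        while time.monotonic() - start < seconds:
            # Write in batches so the clock is not consulted for every line.
            chunk = "".join(f"{n}\n" for n in range(lines, lines + batch))
            f.write(chunk)
            f.flush()  # push data to the OS so `tail -F` sees it promptly
            written += len(chunk)  # ASCII only, so characters == bytes
            lines += batch
    elapsed = max(time.monotonic() - start, 1e-9)
    return lines, written, written / elapsed


if __name__ == "__main__":
    # Usage: python burst.py datafile [seconds]
    duration = float(sys.argv[2]) if len(sys.argv) > 2 else 10.0
    n, size, rate = burst(sys.argv[1], duration)
    print(f"{n} lines, {size} bytes, {rate / 1e6:.2f} MB/s")
```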

This looks and smells just like FLUME-430. I haven't yet examined the WAL or 
destination data for duplicates / missing events.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
