[ https://issues.apache.org/jira/browse/FLUME-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jason.lee updated FLUME-3097:
-----------------------------
    Remaining Estimate: 12h
     Original Estimate: 12h

> CLONE - WAL data grows forever even though data is delivered in E2E
> -------------------------------------------------------------------
>
>                 Key: FLUME-3097
>                 URL: https://issues.apache.org/jira/browse/FLUME-3097
>             Project: Flume
>          Issue Type: Bug
>          Components: Master, Node, Sinks+Sources
>            Reporter: jason.lee
>            Priority: Blocker
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> With a heavy enough write load, it appears that the E2E agent WAL will get
> into a state where data just gets constantly shuffled around between the
> various directories / states (e.g. writing, logged, sending, sent). When this
> happens, the WAL directories grow indefinitely until the disk is exhausted,
> no matter how much data caused the problem.
>
> To reproduce:
> * Use the supplied config (or something similar).
> * Write to the agent source at a rate of > 1MB/s for a short burst (using
>   something like the provided generator below).
> * Note that data is delivered to the collectorSink but the agent WAL manager
>   constantly grows the data.
>
> The config:
> {code}
> n1 : execStream("tail -F datafile") | agentE2ESink("host", 12345);
> n2 : collectorSource(12345) | collectorSink("file://...", "n2-");
> {code}
>
> Generator:
> {code}
> perl -e 'while (1) { print $i++, "\n"; }' >> datafile
> {code}
>
> This looks and smells just like FLUME-430. I haven't yet examined the WAL or
> destination data for duplicates / missing events.
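
The reproduction steps ask for a short burst above 1MB/s, while the Perl one-liner writes until it is interrupted. As a rough sketch only (not part of the original report), a bounded generator could look like the following. The function name `write_burst` and its parameters (`rate_bytes_per_s`, `duration_s`, `chunk_lines`) are hypothetical, and the 2 MB/s default is simply an assumed value above the 1MB/s threshold mentioned in the report.

```python
import time


def write_burst(path, rate_bytes_per_s=2_000_000, duration_s=5.0, chunk_lines=1000):
    """Append numbered lines to `path` at roughly the given rate, then stop.

    Hypothetical helper: mirrors the Perl generator (one increasing integer
    per line) but limits both the write rate and the total run time.
    Returns (lines_written, bytes_written).
    """
    start = time.monotonic()
    lines = 0
    written = 0
    # newline="\n" disables newline translation so len(chunk) equals bytes on disk.
    with open(path, "a", newline="\n") as f:
        while True:
            elapsed = time.monotonic() - start
            if elapsed >= duration_s:
                break
            # Pause while we are at or ahead of the byte budget for the elapsed time.
            if written >= rate_bytes_per_s * elapsed:
                time.sleep(0.001)
                continue
            chunk = "".join(f"{i}\n" for i in range(lines, lines + chunk_lines))
            f.write(chunk)
            f.flush()  # make the data visible to a `tail -F` reader promptly
            lines += chunk_lines
            written += len(chunk)
    return lines, written
```

Calling `write_burst("datafile")` from the directory the agent tails would append for about five seconds and then return, which makes it easier to tell whether the WAL directories keep growing after the input has stopped.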