Selman Kayrancioglu created FLUME-3319:
------------------------------------------
Summary: Is there a chance that the TAILDIR Source is skipping files?
Key: FLUME-3319
URL: https://issues.apache.org/jira/browse/FLUME-3319
Project: Flume
Issue Type: Question
Components: Sinks+Sources
Affects Versions: 1.9.0
Environment: {{flume-env.sh}}
{code:bash}
export JAVA_OPTS="-Xms100m -Xmx1000m -Dcom.sun.management.jmxremote
-Dflume.root.logger=INFO,console
-javaagent:/opt/flume/flume/jmx_prometheus_javaagent-0.11.0.jar=5000:/opt/flume/flume/jmx_exporter.yml"
{code}
{{java -version}}
{code}
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
{code}
{{OS}}:
{code}
Linux 4.9.127-32.el7.x86_64 #1 SMP Mon Sep 17 13:40:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
{code}
Reporter: Selman Kayrancioglu
We are using TAILDIR Source + Kafka Sink with following configuration:
{noformat}
tuzla2kafka.sources = tuzla
tuzla2kafka.channels = c1
tuzla2kafka.sinks = kafka
tuzla2kafka.sources.tuzla.type = TAILDIR
tuzla2kafka.sources.tuzla.channels = c1
tuzla2kafka.sources.tuzla.positionFile = /data/flume/positions/tuzla2kafka-taildir_position.json
tuzla2kafka.sources.tuzla.filegroups = tuzla_fluentd
tuzla2kafka.sources.tuzla.filegroups.tuzla_fluentd = /data/tuzla/fluentd/event_log_production.*.log
tuzla2kafka.channels.c1.type = file
tuzla2kafka.channels.c1.checkpointDir = /data/flume/file_channels/c1/checkpoint
tuzla2kafka.channels.c1.dataDirs = /data/flume/file_channels/c1/data
tuzla2kafka.channels.c1.capacity = 1000000
tuzla2kafka.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
tuzla2kafka.sinks.kafka.channel = c1
tuzla2kafka.sinks.kafka.kafka.topic = mini-pipeline
tuzla2kafka.sinks.kafka.kafka.bootstrap.servers = kafka1:9092,kafka2:9092
tuzla2kafka.sinks.kafka.kafka.batchSize = 10000
tuzla2kafka.sinks.kafka.kafka.allowTopicOverride = false
{noformat}
The log files matched by {{tuzla2kafka.sources.tuzla.filegroups.tuzla_fluentd}} are rotated
hourly, and each one is ~1.5 GB. We have been testing this configuration for 3 days
and noticed that Flume skipped 3 files in that time. The Flume logs contain no
'Opening file' / 'Closed file' messages for these 3 files. Is this a known
bug? We're trying to switch from {{fluentd}} to {{flume}}, and this behaviour
would rule out {{flume}} as an alternative.
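One way to narrow this down might be to compare the TAILDIR position file against the files actually on disk. The following is a minimal Python sketch, not part of Flume. It assumes the position file is a JSON array of objects with {{inode}}, {{pos}} and {{file}} keys, and that the filegroup pattern is matched as a regular expression against the whole file name. The function name {{find_untracked_files}} is hypothetical.

```python
import json
import os
import re


def find_untracked_files(position_file, log_dir, name_regex):
    """Return files in log_dir that match name_regex but whose inode
    does not appear in the TAILDIR position file."""
    with open(position_file) as fh:
        # Assumed entry shape: {"inode": ..., "pos": ..., "file": ...}
        tracked_inodes = {entry["inode"] for entry in json.load(fh)}

    # Assumption: the pattern must match the complete file name.
    pattern = re.compile(name_regex)
    missing = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path) or not pattern.fullmatch(name):
            continue
        if os.stat(path).st_ino not in tracked_inodes:
            missing.append(path)
    return missing
```

With the configuration above, it could be called as {{find_untracked_files("/data/flume/positions/tuzla2kafka-taildir_position.json", "/data/tuzla/fluentd", r"event_log_production.*.log")}}. A file that is on disk but missing from the result of the position file was probably never picked up by the source, although this only holds while the rotated files have not yet been deleted.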
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)