Hi everyone

FWIW I'll share our experience with 0.10.0b1 on centos6 boxes.
Hopefully this will be helpful for the developers, or some other users
who are experiencing similar issues.

We've configured heka to slurp many application and system logs. All
these logs have their Logger names appropriately set so that they can
be injected to a pipeline using a single TCPOutput configured like so
(jinja2 template):

[tcp-output]
type = "TcpOutput"
address = "{{ aggregator.host }}:{{ aggregator.port }}"
message_matcher = "Type !~ /^heka/"
keep_alive = true
use_buffering = true

[tcp-output.buffering]
max_file_size = 67108864  # 64MiB
max_buffer_size = 268435456  # 256MiB
full_action = "shutdown"
cursor_update_count = 50

[tcp-output.retries]
max_retries = 5
max_delay = "180s"
delay = "30s"

If the queue gets full, heka will honor the full_action directive and
shutdown. Heka is supervised so it will eventually get restarted.
Now, the buffered output behavior that surprises me is that after heka
is restarted (with a full queue on disk), it'll immediately shutdown
again, as the queue is still full.
FWIW here's a heka report [1] with all log inputs disabled, and
TCPOutput enabled (the idea was to let heka process the queue, but
that did not happen).

The queue_dir also contains a surprisingly large number of queue files
[2], all of them empty but 2.log which is 268435316 bytes.
checkpoint.txt points at it (2 268435316) so I have no idea where all
the disk queues come from or why queues are not rotated every ~64MB.

In the absence of connectivity or other type of issues (the
heka-aggregator is working fine for other hosts) I would expect
TCPOutput to try to establish a connection and start processing the
backlog before processing other Inputs, but I can confirm (with
tcpdump) that heka is not establishing any connection whatsoever so it
shutdowns again.

If this is not a race condition between TCPOutput and the many
LogstreamerInputs, would it be recommended to set full_action='block'
so heka has some room to try to process the backlog first?

The other buffered output issue we've experienced is related to
ESOutput, which is quite critical for us.
The ESOutput runs on dedicated heka machines that consume messages
from two RabbitMQ brokers. These have poolsize = 150 and
plugin_chansize = 50 since they have plenty of memory and only run
heka.
We started seeing many "Diagnostics: X packs have been idle for more
than Y seconds".
The inputs, decoders, outputs, encoders we use are all bundled with
heka itself, so it is not a matter of poorly written code that does
not free packs for recycling.
We disabled buffering, and those messages disappeared right away.

Worth noting that we still see some "HTTP response error status: 400
Bad Request" messages with ESOutputs with buffering disabled, but
that's a separate issue I believe.
I've managed to find another heka report [3] in our logs, this one
corresponds to the time when the few ESOutputs were reporting idle
packs.

On the bright side, I for one am super happy with heka and will keep
vouching for and supporting it.
Some more work on the buffered output needs doing but we'll get there :-)

I'm happy to refine these notes above and turn them into issues in GH.
I'm here to help, so if there's anything I can do, ask away!
I'll also try to update the docs when possible, as we build knowledge
while deploying heka to production.

Cheers

[1]
Sep 11 08:58:56 consume2 heka: 2015/09/11 08:58:56 Queue report initiated.
Sep 11 08:58:56 consume2 heka: 2015/09/11 08:58:56
========[heka.all-report]========
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Globals====
Sep 11 08:58:56 consume2 heka: inputRecycleChan:
Sep 11 08:58:56 consume2 heka:     InChanCapacity: 100
Sep 11 08:58:56 consume2 heka:     InChanLength: 100
Sep 11 08:58:56 consume2 heka: injectRecycleChan:
Sep 11 08:58:56 consume2 heka:     InChanCapacity: 100
Sep 11 08:58:56 consume2 heka:     InChanLength: 100
Sep 11 08:58:56 consume2 heka: Router:
Sep 11 08:58:56 consume2 heka:     InChanCapacity: 30
Sep 11 08:58:56 consume2 heka:     InChanLength: 0
Sep 11 08:58:56 consume2 heka:     ProcessMessageCount: 0
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Inputs====
Sep 11 08:58:56 consume2 heka: NONE
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Splitters====
Sep 11 08:58:56 consume2 heka: NONE
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Decoders====
Sep 11 08:58:56 consume2 heka: NONE
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Filters====
Sep 11 08:58:56 consume2 heka: NONE
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Outputs====
Sep 11 08:58:56 consume2 heka: tcp-output:
Sep 11 08:58:56 consume2 heka:     InChanCapacity: 2
Sep 11 08:58:56 consume2 heka:     InChanLength: 0
Sep 11 08:58:56 consume2 heka:     MatchChanCapacity: 30
Sep 11 08:58:56 consume2 heka:     MatchChanLength: 0
Sep 11 08:58:56 consume2 heka:     MatchAvgDuration: 0
Sep 11 08:58:56 consume2 heka:     ProcessMessageCount: 0
Sep 11 08:58:56 consume2 heka:
Sep 11 08:58:56 consume2 heka: ====Encoders====
Sep 11 08:58:56 consume2 heka: tcp-output-ProtobufEncoder:
Sep 11 08:58:56 consume2 heka:     ProcessMessageCount: 0
Sep 11 08:58:56 consume2 heka:     ProcessMessageAvgDuration: 0
Sep 11 08:58:56 consume2 heka: ========
Sep 11 08:58:56 consume2 heka:

[2]
[root@consume2 ~]# ls /var/cache/heka/output_queue/tcp_output/*.log|wc -l
547

[3]
Sep  9 10:02:58 heka-consumer2 heka: 2015/09/09 10:02:58 Queue report initiated.
Sep  9 10:02:58 heka-consumer2 heka: 2015/09/09 10:02:58
========[heka.all-report]========
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Globals====
Sep  9 10:02:58 heka-consumer2 heka: inputRecycleChan:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 200
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka: injectRecycleChan:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 200
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 175
Sep  9 10:02:58 heka-consumer2 heka: Router:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 75
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 424
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Inputs====
Sep  9 10:02:58 heka-consumer2 heka: system-yum:
Sep  9 10:02:58 heka-consumer2 heka: system-syslog:
Sep  9 10:02:58 heka-consumer2 heka: project1-httpd-error-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: system-yum-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: es-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster1-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: system-rsyslog-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: system-cron:
Sep  9 10:02:58 heka-consumer2 heka: system-ansible-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: es-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: system-rsyslog-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster3-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: project1-httpd-access-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: system-yum-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster2-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster2-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon:
Sep  9 10:02:58 heka-consumer2 heka: system-auth:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster3-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: project1-httpd-error-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster1-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka: system-ansible-mq1-input:
Sep  9 10:02:58 heka-consumer2 heka: project1-httpd-access-mq2-input:
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Splitters====
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon-NullSplitter-stdout:
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon-NullSplitter-stderr:
Sep  9 10:02:58 heka-consumer2 heka: system-auth-TokenSplitter-1:
Sep  9 10:02:58 heka-consumer2 heka: system-yum-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster2-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster2-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
project1-httpd-error-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: system-yum-TokenSplitter-1:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster1-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: system-syslog-TokenSplitter-1:
Sep  9 10:02:58 heka-consumer2 heka:
io-control-beacon-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: system-cron-TokenSplitter-1:
Sep  9 10:02:58 heka-consumer2 heka:
project1-httpd-access-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
system-ansible-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: es-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
project1-httpd-error-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
project1-httpd-access-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: system-yum-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster3-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
system-ansible-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster1-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: es-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
io-control-beacon-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
system-rsyslog-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
system-rsyslog-mq1-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka: hdp-cluster3-mq2-input-HekaFramingSplitter:
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Decoders====
Sep  9 10:02:58 heka-consumer2 heka:
hdp-cluster1-mq1-input-hdp-cluster1-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 10
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 22
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 714676
Sep  9 10:02:58 heka-consumer2 heka:
io-control-beacon-mq2-input-io-control-beacon-decoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 5
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 94
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 32421
Sep  9 10:02:58 heka-consumer2 heka:
hdp-cluster1-mq2-input-hdp-cluster1-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 9
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 28
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 883540
Sep  9 10:02:58 heka-consumer2 heka:
hdp-cluster3-mq1-input-hdp-cluster3-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 3
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 17
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 636942
Sep  9 10:02:58 heka-consumer2 heka:
io-control-beacon-mq1-input-io-control-beacon-decoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 7
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 94
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 40874
Sep  9 10:02:58 heka-consumer2 heka:
system-rsyslog-mq2-input-system-rsyslog-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 1
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 247177
Sep  9 10:02:58 heka-consumer2 heka:
system-rsyslog-mq1-input-system-rsyslog-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 1
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 235225
Sep  9 10:02:58 heka-consumer2 heka:
hdp-cluster3-mq2-input-hdp-cluster3-multidecoder:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 6
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageCount: 1
Sep  9 10:02:58 heka-consumer2 heka:     ProcessMessageAvgDuration: 167061
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Filters====
Sep  9 10:02:58 heka-consumer2 heka: NONE
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Outputs====
Sep  9 10:02:58 heka-consumer2 heka: system-es-output:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 2
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchAvgDuration: 237
Sep  9 10:02:58 heka-consumer2 heka: io-control-beacon-output:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 2
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchAvgDuration: 121
Sep  9 10:02:58 heka-consumer2 heka: hdp-es-output:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 2
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanLength: 75
Sep  9 10:02:58 heka-consumer2 heka:     MatchAvgDuration: 299
Sep  9 10:02:58 heka-consumer2 heka: dashboard-output:
Sep  9 10:02:58 heka-consumer2 heka:     InChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     InChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanCapacity: 75
Sep  9 10:02:58 heka-consumer2 heka:     MatchChanLength: 0
Sep  9 10:02:58 heka-consumer2 heka:     MatchAvgDuration: 265
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:02:58 heka-consumer2 heka: ====Encoders====
Sep  9 10:02:58 heka-consumer2 heka: system-es-output-system-es-encoder:
Sep  9 10:02:58 heka-consumer2 heka:
io-control-beacon-output-io-control-beacon-encoder:
Sep  9 10:02:58 heka-consumer2 heka: hdp-es-output-hdp-es-encoder:
Sep  9 10:02:58 heka-consumer2 heka: ========
Sep  9 10:02:58 heka-consumer2 heka:
Sep  9 10:03:21 heka-consumer2 heka: 2015/09/09 10:03:21 Diagnostics:
55 packs have been idle more than 120 seconds.
Sep  9 10:03:21 heka-consumer2 heka: 2015/09/09 10:03:21 Diagnostics:
(input) Plugin names and quantities found on idle packs:
Sep  9 10:03:21 heka-consumer2 heka: 2015/09/09 10:03:21 Diagnostics:
#011io-control-beacon-output: 54
Sep  9 10:03:21 heka-consumer2 heka: 2015/09/09 10:03:21 Diagnostics:
#011hdp-es-output: 1

-- 
"If you want to go fast, go alone. If you want to go far, go together."
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

Reply via email to