Johan, this sounds very strange indeed. Can you provide us with some more details?

- What kind of messages are you pouring into Graylog via UDP? (GELF, raw, syslog?)
- Do you have any extractors or grok filters running on the messages coming in via UDP?
- Any other differences between the TCP and UDP messages?
- Can you show us your input configuration? (See the sketch after this list.)
- Are you using reverse DNS lookups?
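If it helps, something like this should dump the configured inputs from the
REST API (a sketch: the host, the credentials, and the 1.0 default REST port
12900 are placeholders), and the kernel UDP counters will tell you whether
datagrams are being dropped before Graylog even sees them:

  # list all configured inputs and their settings
  curl -u admin:password http://graylog-node:12900/system/inputs

  # kernel-level UDP statistics; watch for "packet receive errors"
  # and "receive buffer errors"
  netstat -su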
Thank you!

Regards,
Bernd

On 24 February 2015 at 16:45, <sun...@sunner.com> wrote:
> Well, that could be a suspect if it weren't for the fact that the old nodes
> running on old hardware handle it just fine, along with the fact that the
> traffic seems to reach the nodes just fine (i.e. it actually fills the
> journal up, and the input buffer never breaks a sweat). And it's really not
> that much traffic: even spread across four nodes, those ~1000 messages per
> second will cause this, whereas the old nodes are just two and handle it
> fine.
>
> About disk tuning, I haven't done much of that, and I realize I forgot to
> mention that the Elasticsearch cluster is on separate physical hardware, so
> there's a minuscule amount of disk I/O happening on the Graylog nodes.
>
> It's really very strange, since it seems like UDP itself isn't to blame;
> after all, the messages get into Graylog just fine and fill up the journal
> rapidly. The screenshot I linked was taken after I had stopped sending
> logs, i.e. there was no longer any ingress traffic, so the Graylog process
> had nothing to do except empty its journal. It should all have been
> internal processing and egress traffic to Elasticsearch, and as can be seen
> in the screenshot, it seems to be doing that in small bursts.
>
> In the exact same scenario (i.e. when I just streamed a large file into the
> system as fast as it could receive it), but with the logs having come in
> over TCP, it will still store up a sizable number of messages in the
> journal, but the processing of the journaled messages is both more even and
> vastly faster.
>
> So in short, it doesn't appear to be the communication itself, but
> something happening "inside" the Graylog process, and that only happens
> when the messages have been delivered over UDP.
>
> Regards,
> Johan
>
> On Tuesday, February 24, 2015 at 3:07:47 PM UTC+1, Henrik Johansen wrote:
>>
>> Could this simply be because TCP avoids (or tries to avoid) congestion
>> while UDP does not?
>>
>> /HJ
>>
>> On 24 Feb 2015, at 13:50, sun...@sunner.com wrote:
>>
>> Hello,
>>
>> With the release of 1.0 we've started moving towards a new cluster of
>> Graylog hosts. These are working very well, with one exception.
>> For some reason, any reasonably significant UDP traffic will choke the
>> message processor, fill up the process buffers on all four hosts, and
>> effectively choke all other message processing as well.
>> Normally we do around 2k messages per second, split roughly 50/50 between
>> TCP and UDP. Sending the entire TCP load to one host doesn't present a
>> problem; it doesn't break a sweat.
>>
>> I've also experimented a little with sending a large text file using
>> rsyslog's imfile module. Sending it via TCP will bottleneck us on the
>> Elasticsearch side of things and cause the disk journal to fill up fairly
>> rapidly, but it's still working at ~9k messages per second, so that's
>> fine. Sending it via UDP just causes Graylog to choke again: it fills the
>> journal up to a certain point and then slowly processes it in little
>> bursts of a few thousand messages followed by several seconds of apparent
>> sleeping (i.e. pretty much no CPU usage).
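>> The rsyslog side of that test is essentially this kind of setup (a
>> sketch with a placeholder file path and target; a single @ forwards via
>> UDP, a double @@ via TCP):
>>
>>   $ModLoad imfile
>>   $InputFileName /var/log/testfile.log
>>   $InputFileTag imfile-test:
>>   $InputFileStateFile stat-imfile-test
>>   $InputRunFileMonitor
>>   # UDP case; swap @ for @@ to get the TCP case
>>   *.* @graylog-node:5140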
>>
>> During all of this the input buffer never fills up more than single-digit
>> percentages at most; using TCP the output buffer sometimes moves up to
>> 20-30%, with UDP it never moves at all. It's all in the process buffer.
>> Sending a large burst of messages and then stopping doesn't seem to
>> affect this behavior either: even after the inbound messages stop, it
>> still takes a long time to process the messages that are already in the
>> journal and process buffer.
>> I'm using VisualVM to look at the CPU and memory usage; this is a
>> screenshot of a UDP session:
>> http://i59.tinypic.com/x23xfl.png
>>
>> I've tried mucking around with various knobs (processbuffer_processors,
>> JVM settings, etc.) with no results whatsoever, good or bad.
>> There's nothing to suggest a problem in either the Graylog or system
>> logs.
>>
>> Pertinent specs and settings:
>> ring_size = 16384 (CPUs have 20 MB L3)
>> processbuffer_processors = 5
>>
>> Java 8u31, using G1GC with string deduplication; I've tried without the
>> latter, and with plain CMS as well, no difference.
>> 4 GB Xmx/Xms.
>> Linux 3.16.0
>> net.core.rmem_max = 8388608
>>
>> These are virtual machines: VMware, 8 GB RAM / 8 vCPUs, on Xeon E5-2690s.
>>
>> Software-wise, the old nodes are running more or less the same setup,
>> except for kernel 3.2.0: same JVM, G1GC, etc. Hardware-wise, they're
>> physical boxes, old Dell 2950s with dual quad-core E5440s. That's Core2
>> era, so quite a bit slower.
>>
>> Any ideas?
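>> P.S. For completeness, the pertinent part of graylog.conf looks roughly
>> like this (a sketch: apart from the ring_size and
>> processbuffer_processors quoted above, the values are only illustrative,
>> not measurements from these nodes):
>>
>>   ring_size = 16384
>>   processbuffer_processors = 5
>>   outputbuffer_processors = 3
>>   processor_wait_strategy = blocking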