Bernd, Correct - that issue started after 0.92.x.
We are still seeing elevated CPU utilisation, but we attribute that to the fact that 0.92 was losing messages in our setup.

> On 25 Feb 2015, at 17:37, Bernd Ahlers <be...@graylog.com> wrote:
>
> Henrik,
>
> Uh, okay. I suppose it worked for you in 0.92 as well?
>
> I will create an issue on GitHub for that.
>
> Bernd
>
> On 25 February 2015 at 17:14, Henrik Johansen <h...@myunix.dk> wrote:
>> Bernd,
>>
>> We saw the exact same issue - here is a graph of the CPU idle
>> percentage across a few of the cluster nodes during the upgrade:
>>
>> http://5.9.37.177/graylog_cluster_cpu_idle.png
>>
>> We went from ~20% CPU utilisation to ~100% CPU utilisation across
>> ~200 cores, and things only settled down after disabling force_rdns.
>>
>> On 25 Feb 2015, at 11:55, Bernd Ahlers <be...@graylog.com> wrote:
>>
>> Johan,
>>
>> The only thing that changed from 0.92 to 1.0 is that the DNS lookup is
>> now done when the messages are read from the journal rather than in the
>> input path where the messages are received. Otherwise, nothing has
>> changed in that regard.
>>
>> We do not do any manual caching of the DNS lookups, but the JVM caches
>> them by default. See
>> http://docs.oracle.com/javase/7/docs/technotes/guides/net/properties.html
>> for networkaddress.cache.ttl and networkaddress.cache.negative.ttl.
>>
>> Regards,
>> Bernd
>>
>> On 25 February 2015 at 08:56, <sun...@sunner.com> wrote:
>>
>> This is strange. I went through all of the settings for my reply, and we
>> are indeed using rDNS, and it seems to be the culprit. The strange part
>> is that it works fine on the old servers even though they're on the same
>> networks and using the same DNS servers and resolver settings.
>> Did something regarding reverse DNS change between 0.92 and 1.0?
>> I'm thinking perhaps the server is trying to do one lookup per message
>> instead of caching reverse lookups, seeing as the latter would result in
>> very little DNS traffic, since most of the logs will be coming from a
>> small number of hosts.
>>
>> Regards,
>> Johan
>>
>> On Tuesday, February 24, 2015 at 5:08:54 PM UTC+1, Bernd Ahlers wrote:
>>
>> Johan,
>>
>> This sounds very strange indeed. Can you provide us with some more
>> details?
>>
>> - What kind of messages are you pouring into Graylog via UDP? (GELF,
>>   raw, syslog?)
>> - Do you have any extractors or grok filters running for the messages
>>   coming in via UDP?
>> - Any other differences between the TCP and UDP messages?
>> - Can you show us your input configuration?
>> - Are you using reverse DNS lookups?
>>
>> Thank you!
>>
>> Regards,
>> Bernd
>>
>> On 24 February 2015 at 16:45, <sun...@sunner.com> wrote:
>>
>> Well, that could be a suspect if it wasn't for the fact that the old
>> nodes running on old hardware handle it just fine, along with the fact
>> that the traffic seems to reach the nodes just fine (i.e. it actually
>> fills the journal up, and the input buffer never breaks a sweat). And
>> it's really not that much traffic: even spread across four nodes, those
>> ~1000 messages per second will cause this, whereas the old nodes are
>> just two and handle it just fine.
>>
>> About disk tuning, I haven't done much of that, and I realize I forgot
>> to mention that the Elasticsearch cluster is on separate physical
>> hardware, so there's a minuscule amount of disk I/O happening on the
>> Graylog nodes.
>>
>> It's really very strange, since it seems like UDP itself isn't to blame;
>> after all, the messages get into Graylog just fine and fill up the
>> journal rapidly.
>> The screenshot I linked was from after I had stopped sending logs, i.e.
>> there was no longer any ingress traffic, so the Graylog process had
>> nothing to do except empty its journal; it should all be internal
>> processing and egress traffic to Elasticsearch. And as can be seen in
>> the screenshot, it seems to be doing that in small bursts.
>>
>> In the exact same scenario (i.e. when I just streamed a large file into
>> the system as fast as it could receive it) but with the logs having come
>> over TCP, it'll still store up a sizable number of messages in the
>> journal, but the processing of the journaled messages is both more even
>> and vastly faster.
>>
>> So in short, it doesn't appear to be the communication itself but
>> something happening "inside" the Graylog process, and only when the
>> messages have been delivered over UDP.
>>
>> Regards,
>> Johan
>>
>> On Tuesday, February 24, 2015 at 3:07:47 PM UTC+1, Henrik Johansen
>> wrote:
>>
>> Could this simply be because TCP avoids (or tries to avoid) congestion
>> while UDP does not?
>>
>> /HJ
>>
>> On 24 Feb 2015, at 13:50, sun...@sunner.com wrote:
>>
>> Hello,
>>
>> With the release of 1.0 we've started moving towards a new cluster of
>> GL hosts. These are working very well, with one exception.
>> For some reason, any reasonably significant UDP traffic will choke the
>> message processor, fill up the process buffers on all four hosts, and
>> effectively choke up all other message processing as well.
>> Normally we do around 2k messages per second, split roughly 50/50
>> between TCP and UDP. Sending the entire TCP load to one host doesn't
>> present a problem; it doesn't break a sweat.
>> I've also experimented a little with sending a large text file using
>> rsyslog's imfile module. Sending it via TCP will bottleneck us at the ES
>> side of things and cause the disk journal to fill up fairly rapidly, but
>> it's still working at ~9k messages per second, so that's fine. Sending
>> it via UDP just causes GL to choke again, fill up the journal to a
>> certain point, and then slowly process the journal in little bursts of a
>> few thousand messages followed by several seconds of apparent sleeping
>> (i.e. pretty much no CPU usage).
>>
>> During all of this, the input buffer never fills beyond single-digit
>> percentages at most; using TCP, the output buffer sometimes moves up to
>> 20-30%, while with UDP it never moves at all. It's all in the process
>> buffer. Sending a large burst of messages and then stopping doesn't seem
>> to affect this behavior either; even after the inbound messages stop, it
>> still takes a long time to process the messages that are already in the
>> journal and process buffer.
>> I'm using VisualVM to look at the CPU and memory usage; this is a
>> screenshot of a UDP session:
>>
>> http://i59.tinypic.com/x23xfl.png
>>
>> I've tried mucking around with various knobs (processbuffer_processors,
>> JVM settings, etc.) with no results whatsoever, good or bad.
>> There's nothing to suggest a problem in either the Graylog or system
>> logs.
>>
>> Pertinent specs and settings:
>>
>> ring_size = 16384 (CPUs have 20 MB L3)
>> processbuffer_processors = 5
>>
>> Java 8u31
>> Using G1GC with string deduplication; I've tried without the latter and
>> just using CMS as well, no difference.
>> 4 GB Xmx/Xms.
>> Linux 3.16.0
>> net.core.rmem_max = 8388608
>>
>> These are virtual machines (VMware), 8 GB / 8 vCPUs, on Xeon E5-2690s.
>>
>> Software-wise, the old nodes are running the same setup more or less,
>> except kernel 3.2.0; same JVM, G1GC, etc.
>> Hardware-wise, they're physical boxes: old Dell 2950s with dual
>> quad-core E5440s. That's Core 2 era, so quite a bit slower.
>>
>> Any ideas?
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "graylog2" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>> an email to graylog2+u...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> Developer
>>
>> Tel.: +49 (0)40 609 452 077
>> Fax.: +49 (0)40 609 452 078
>>
>> TORCH GmbH - A Graylog company
>> Steckelhörn 11
>> 20457 Hamburg
>> Germany
>>
>> Commercial Reg. (Registergericht): Amtsgericht Hamburg, HRB 125175
>> Geschäftsführer: Lennart Koopmann (CEO)
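[Editor's note] A footnote on the JVM DNS caching Bernd points to above: networkaddress.cache.ttl and networkaddress.cache.negative.ttl are java.security *security* properties, not plain -D system properties, so they are normally set in the JVM's java.security file or programmatically via Security.setProperty() before the first name lookup. A minimal sketch; the TTL values 60 and 5 are illustrative examples, not recommendations:

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // Security properties, not -D system properties; they must be set
        // before the JVM performs its first name lookup to take effect.
        Security.setProperty("networkaddress.cache.ttl", "60");          // successful lookups cached 60s
        Security.setProperty("networkaddress.cache.negative.ttl", "5");  // failed lookups cached 5s

        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
        System.out.println(Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

The legacy system properties sun.net.inetaddr.ttl and sun.net.inetaddr.negative.ttl can be passed as -D flags instead, but the security properties above take precedence when both are set.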