Johan,

This sounds very strange indeed. Can you provide us with some more details?

- What kind of messages are you pouring into Graylog via UDP? (GELF,
raw, syslog?)
- Do you have any extractors or grok filters running for the messages
coming in via UDP?
- Any other differences between the TCP and UDP messages?
- Can you show us your input configuration? (e.g. dumped via the REST API as
sketched below)
- Are you using reverse DNS lookups?
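
For the input configuration, something along these lines should do the trick,
assuming the default REST listener on port 12900 and an admin user (host,
port and credentials below are placeholders for your setup):

  curl -u admin:yourpassword http://your-graylog-node:12900/system/inputs

That should list every input together with its attributes (port,
recv_buffer_size, and on syslog inputs things like force_rdns), which is
exactly the kind of per-input setting that can make UDP-received messages
much more expensive to process than the TCP ones.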

Thank you!

Regards,
Bernd

On 24 February 2015 at 16:45,  <sun...@sunner.com> wrote:
> Well, that could be a suspect if it weren't for the fact that the old nodes
> running on old hardware handle it just fine, and that the traffic does seem
> to reach the new nodes (i.e. it actually fills the journal up, and the input
> buffer never breaks a sweat). It's also really not that much traffic: even
> spread across four nodes, those ~1000 messages per second will cause this,
> whereas the old nodes are only two and handle it without problems.
>
> About disk tuning, I haven't done much of that, and I realize I forgot to
> mention that the Elasticsearch cluster is on separate physical hardware, so
> there's only a minuscule amount of disk I/O happening on the Graylog nodes.
>
> It's really very strange, since it seems like UDP itself isn't to blame;
> after all, the messages get into Graylog just fine and fill up the journal
> rapidly. The screenshot I linked was taken after I had stopped sending logs,
> i.e. there was no longer any ingress traffic, so the Graylog process had
> nothing to do except empty its journal; it should all be internal processing
> and egress traffic to Elasticsearch. And as can be seen in the screenshot,
> it seems to be doing that in small bursts.
>
> In the exact same scenario (i.e. when I just streamed a large file into the
> system as fast as it could receive it) but with the logs having come in over
> TCP, it will still store up a sizable number of messages in the journal, but
> the processing of the journaled messages is both more even and vastly
> faster.
>
> So in short it doesn't appear to be the communication itself, but something
> happening "inside" the Graylog process, and that only happens when the
> messages have been delivered over UDP.
>
> Regards
> Johan
>
>
> On Tuesday, February 24, 2015 at 3:07:47 PM UTC+1, Henrik Johansen wrote:
>>
>> Could this simply be because TCP avoids (or tries to avoid) congestion
>> while UDP does not?
>>
>> /HJ
>>
>> On 24 Feb 2015, at 13:50, sun...@sunner.com wrote:
>>
>> Hello,
>>
>> With the release of 1.0 we've started moving towards a new cluster of GL
>> hosts. These are working very well, with one exception.
>> For some reason any reasonably significant UDP traffic will choke the
>> message processor, fill up the process buffers on all four hosts, and
>> effectively choke all other message processing as well.
>> Normally we do around 2k messages per second, split roughly 50/50 between
>> TCP and UDP. Sending the entire TCP load to one host doesn't present a
>> problem; it doesn't break a sweat.
>>
>> I've also experimented a little with sending a large text file using
>> rsyslog's imfile module. Sending it via TCP will bottleneck us at the ES
>> side of things and cause the disk journal to fill up fairly rapidly, but
>> it still works at ~9k messages per second, so that's fine. Sending it via
>> UDP just causes GL to choke again: it fills up the journal to a certain
>> point and then slowly processes it in little bursts of a few thousand
>> messages followed by several seconds of apparent sleeping (i.e. pretty much
>> no CPU usage).
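>>
>> (Roughly, the imfile side of that test looks like the sketch below, with
>> the file path, tag, hostname and port as placeholders, in rsyslog 7+
>> syntax. "@@" forwards via TCP and a single "@" via UDP; only one of the
>> two forwarding lines is active per run, pointing at the same Graylog
>> syslog input.)
>>
>>   module(load="imfile")
>>   input(type="imfile" File="/tmp/bigfile.log" Tag="loadtest:"
>>         StateFile="bigfile-state" Severity="info" Facility="local0")
>>   local0.* @@graylog-node:5140
>>   # local0.* @graylog-node:5140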
>>
>> During all of this the input buffer never fills up beyond single-digit
>> percentages. Using TCP, the output buffer sometimes moves up to 20-30%;
>> with UDP it never moves at all. It's all in the process buffer.
>> Sending a large burst of messages and then stopping doesn't seem to affect
>> this behavior either: even after the inbound messages stop, it still takes
>> a long time to process the messages that are already in the journal and
>> process buffer.
>> I'm using VisualVM to look at the CPU and memory usage; this is a
>> screenshot of a UDP session:
>> http://i59.tinypic.com/x23xfl.png
>>
>> I've tried mucking around with various knobs (processbuffer_processors,
>> JVM settings, etc.) with no results whatsoever, good or bad.
>> There's nothing to suggest a problem in either the Graylog or system
>> logs.
>>
>> Pertinent specs and settings:
>> ring_size = 16384 (CPUs have 20 MB L3)
>> processbuffer_processors = 5
>>
>> Java 8u31
>> Using G1GC with StringDeduplication; I've tried without the latter and
>> with plain CMS as well, no difference.
>> 4 GB Xmx/Xms.
>> Linux 3.16.0
>> net.core.rmem_max = 8388608
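>>
>> (I'm treating kernel-side UDP drops as ruled out here; something like
>>
>>   netstat -su   # "packet receive errors" / "receive buffer errors" under Udp:
>>   sysctl net.core.rmem_max net.core.rmem_default
>>
>> would show the receive buffer error counter climbing if the socket buffer
>> were overflowing.)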
>>
>> These are virtual machines, VMware, 8 GB / 8 vCPUs, Xeon E5-2690s.
>>
>> Software-wise the old nodes are running more or less the same setup,
>> except kernel 3.2.0; same JVM, G1GC, etc. Hardware-wise they're physical
>> boxes, old Dell 2950s with dual quad-core E5440s. That's Core 2 era, so
>> quite a bit slower.
>>
>> Any ideas?
>>



-- 
Developer

Tel.: +49 (0)40 609 452 077
Fax.: +49 (0)40 609 452 078

TORCH GmbH - A Graylog company
Steckelhörn 11
20457 Hamburg
Germany

Commercial Reg. (Registergericht): Amtsgericht Hamburg, HRB 125175
Managing Director (Geschäftsführer): Lennart Koopmann (CEO)
