I think I have resolved my issue, and I don't think it was related to message 
spooling on disk specifically. I disabled the message_cache_spool_dir 
setting and went back to just using RAM. 

Theory: 
The Elasticsearch cluster wasn't taking messages fast enough, so the heap 
continued to grow until the GL server crashed or the message spool filled 
up. 

After increasing the number of Elasticsearch servers from 2 to 4, the GL 
server now runs longer. 
Next, output_batch_size was changed from 100 to 6000; the heap now grows 
more slowly. 

Better. 
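
For reference, the relevant graylog2.conf lines now look roughly like this 
(reconstructed from memory, so treat the exact values as approximate): 

# disk spool disabled again; back to the in-memory cache
#message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
output_batch_size = 6000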

Dustin


On Tuesday, October 7, 2014 5:29:34 PM UTC-4, Dustin Tennill wrote:
>
> OK. 
>
> A separate filesystem doesn't resolve the issue; graylog2 just runs until 
> the message_cache_spool_dir fills up and then crashes. 
>
> I am writing a couple of scripts to catch when the disk is nearly full, 
> stop services, delete all the files in the message_cache_spool_dir and 
> start services back up. 
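>
> Roughly along these lines (a rough sketch - the init service name and the 
> threshold are assumptions and will likely need adjusting): 
>
> #!/bin/bash
> # Purge the message spool and bounce graylog2 when the spool disk is nearly full.
> SPOOL_DIR=/var/lib/graylog2-server/message-cache-spool
> THRESHOLD=90   # percent used before we intervene
> USED=$(df -P "$SPOOL_DIR" | awk 'NR==2 {print $5}' | tr -d '%')
> if [ "$USED" -ge "$THRESHOLD" ]; then
>     service graylog2-server stop   # service name assumed; use whatever the local init script is called
>     rm -f "$SPOOL_DIR"/*
>     service graylog2-server start
> fi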
>
> I am going to try a fresh install of graylog on another host and see if 
> the issue occurs. 
>
> On Monday, October 6, 2014 10:09:57 AM UTC-4, Dustin Tennill wrote:
>>
>> It crashed once the disk filled up. 
>>
>> I am going to create a partition just for the message_cache_spool_dir to see 
>> if perhaps it is aware of a full disk and will resolve the issue itself. 
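>>
>> Something along these lines, assuming a spare disk (/dev/sdb1 below is just 
>> a placeholder for whatever device is actually free): 
>>
>> service graylog2-server stop
>> mkfs.ext4 /dev/sdb1
>> mount /dev/sdb1 /var/lib/graylog2-server/message-cache-spool
>> echo '/dev/sdb1 /var/lib/graylog2-server/message-cache-spool ext4 defaults 0 2' >> /etc/fstab
>> service graylog2-server start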
>>
>> Anyone have any specific information on this setting? The documentation 
>> doesn't mention it yet, and I can't see any way to handle it other than 
>> stop/delete files/start. 
>>
>>
>>
>> On Sunday, October 5, 2014 3:22:10 PM UTC-4, Dustin Tennill wrote:
>>>
>>> The spool directory is growing at a steady rate - around 500M every five 
>>> minutes.
>>>
>>> root@myhost:/var/lib/graylog2-server/message-cache-spool# sleep 300; du -sh *; date; sleep 300; du -sh *; date; sleep 300; du -sh *; date;
>>> 40K    input-cache
>>> 664K    input-cache.p
>>> 900K    input-cache.t
>>> 61M    output-cache
>>> 2.9G    output-cache.p
>>> 904K    output-cache.t
>>> Sun Oct  5 14:45:50 EDT 2014
>>> 40K    input-cache
>>> 664K    input-cache.p
>>> 900K    input-cache.t
>>> 61M    output-cache
>>> 3.3G    output-cache.p
>>> 904K    output-cache.t
>>> Sun Oct  5 14:50:50 EDT 2014
>>> 40K    input-cache
>>> 664K    input-cache.p
>>> 900K    input-cache.t
>>> 61M    output-cache
>>> 3.7G    output-cache.p
>>> 1.7M    output-cache.t
>>> Sun Oct  5 14:55:50 EDT 2014
>>>
>>> Based on past experience, this will grow until graylog2 crashes. 
>>>
>>>
>>> On Sunday, October 5, 2014 2:18:39 PM UTC-4, Dustin Tennill wrote:
>>>>
>>>> Apologies to the group - I didn't realize my posts were being moderated 
>>>> until I had attempted to post the same comment several times. 
>>>>
>>>> I enabled the message_cache_off_heap setting and it seems to have 
>>>> resolved the slow-GC crash issue: 
>>>>
>>>> message_cache_off_heap = true 
>>>> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
>>>>
>>>> With these settings, my 20G heap stays between 5G and 10G utilized. 
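>>>>
>>>> To watch heap usage over time, something like this is handy (replace 
>>>> <pid> with the graylog2-server JVM's pid; 5000 is the sample interval in 
>>>> milliseconds): 
>>>>
>>>> jstat -gcutil <pid> 5000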
>>>>
>>>> However, as far as I can tell, the message_cache_spool_dir grows until 
>>>> the disk fills up. 
>>>>
>>>> Has anyone experienced this? Is there a cleanup operation I should be 
>>>> performing? 
>>>>
>>>> Dustin
>>>>
>>>>
>>>> On Wednesday, October 1, 2014 12:16:19 PM UTC-4, Dustin Tennill wrote:
>>>>>
>>>>> All,
>>>>>
>>>>> I recently upgraded to rc.1/ElasticSearch 1.3.2 and am having some 
>>>>> issues. We are not in production yet, and I understand that I should 
>>>>> expect 
>>>>> problems with the release candidate code. 
>>>>>
>>>>> *Our Graylog Environment:*
>>>>> A single Graylog Radio Server (0.91.0-rc.1)
>>>>> A single Graylog Server (0.91.0-rc.1)
>>>>> Java Settings:  -Xmx20480M -Xms20480M -verbose:gc 
>>>>> -Xloggc:/var/log/grayloggc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>>>> A single Graylog-Web Server (0.91.0-rc.1)
>>>>> Two ElasticSearch Nodes (1.3.2)
>>>>> Statistics: 6000-7000 msgs per second when things are working correctly
>>>>>
>>>>> *1. "Cluster information currently unavailable" message shown when I 
>>>>> browse to the system page. *
>>>>> Since upgrading to the current release, I note that the ElasticSearch 
>>>>> health indication page nearly always shows "Cluster information currently 
>>>>> unavailable". 
>>>>> My ElasticSearch cluster appears healthy to me. I am using the head 
>>>>> plugin, and can confirm all is "green" and both nodes are caught up. 
>>>>> At least once this has worked correctly - not sure why. 
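>>>>>
>>>>> The same check from the command line against one of the ES nodes (the 
>>>>> hostname below is just a placeholder) uses the cluster health API: 
>>>>>
>>>>> curl -s 'http://es-node-1:9200/_cluster/health?pretty'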
>>>>>
>>>>> This doesn't appear to mean anything; data is still coming in and 
>>>>> being processed correctly. 
>>>>>
>>>>> *2. Graylog2-server - crashes eventually due to slow garbage 
>>>>> collection. *
>>>>> I don't know for sure that this is WHY I seem to have a crash, but the 
>>>>> trend seems to be that if GC takes longer than a few seconds, I start 
>>>>> seeing these message patterns. 
>>>>>
>>>>> 2014-10-01 11:59:09,598 WARN : org.elasticsearch.monitor.jvm - 
>>>>> [graylog2-server] [gc][old][2150][139] duration [1.1m], collections 
>>>>> [1]/[1.1m], total [1.1m]/[1.9h], memory [17.8gb]->[17.9gb]/[19.1gb], 
>>>>> all_pools {[young] [4.5gb]->[4.5gb]/[4.7gb]}{[survivor] 
>>>>> [0b]->[0b]/[911mb]}{[old] [13.3gb]->[13.3gb]/[13.3gb]}
>>>>> 2014-10-01 11:59:09,601 ERROR: 
>>>>> org.graylog2.jersey.container.netty.NettyContainer - Uncaught exception 
>>>>> during jersey resource handling
>>>>> java.io.IOException: Broken pipe
>>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.SocketSendBufferPool$UnpooledSendBuffer.transferTo(SocketSendBufferPool.java:203)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:201)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:146)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
>>>>>     at org.jboss.netty.channel.Channels.write(Channels.java:725)
>>>>>     at 
>>>>> org.jboss.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
>>>>>     at 
>>>>> org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:784)
>>>>>     at 
>>>>> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:280)
>>>>>     at 
>>>>> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
>>>>>     at org.jboss.netty.channel.Channels.write(Channels.java:704)
>>>>>     at org.jboss.netty.channel.Channels.write(Channels.java:671)
>>>>>     at 
>>>>> org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
>>>>>     at 
>>>>> org.graylog2.jersey.container.netty.NettyContainer$NettyResponseWriter$1.write(NettyContainer.java:142)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:229)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:299)
>>>>>     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
>>>>>     at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
>>>>>     at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
>>>>>     at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
>>>>>     at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
>>>>>     at java.io.BufferedWriter.flush(BufferedWriter.java:254)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.ReaderWriter.writeToAsString(ReaderWriter.java:192)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:129)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:99)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.StringMessageProvider.writeTo(StringMessageProvider.java:59)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:265)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:250)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
>>>>>     at 
>>>>> org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:106)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
>>>>>     at 
>>>>> org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:85)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:162)
>>>>>     at 
>>>>> org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1154)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:621)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:377)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:367)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:274)
>>>>>     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
>>>>>     at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
>>>>>     at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
>>>>>     at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
>>>>>     at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
>>>>>     at 
>>>>> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
>>>>>     at 
>>>>> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
>>>>>     at 
>>>>> org.graylog2.jersey.container.netty.NettyContainer.messageReceived(NettyContainer.java:356)
>>>>>     at 
>>>>> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>     at 
>>>>> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>>>>     at 
>>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
>>>>>     at 
>>>>> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
>>>>>     at 
>>>>> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
>>>>>     at 
>>>>> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>>>>>     at 
>>>>> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>>>>     at 
>>>>> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>>>>>     at 
>>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
>>>>>     at 
>>>>> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>>>>>     at 
>>>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>>>     at 
>>>>> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>>>>>     at 
>>>>> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>>>>>     at 
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>     at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>
>>>>> I routinely get the java.io.IOException: Broken pipe message if I just 
>>>>> browse to the "System" page directly. 
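>>>>>
>>>>> (The web interface talks to the graylog2-server REST API behind the 
>>>>> scenes, so the same thing can be poked directly - the hostname and 
>>>>> credentials below are placeholders, and 12900 is the default REST 
>>>>> listen port:) 
>>>>>
>>>>> curl -s -u admin:password 'http://graylog-server:12900/system'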
>>>>>
>>>>> Thoughts? 
>>>>>
>>>>> Any information I didn't provide? 
>>>>>
>>>>> Thanks !!
>>>>>
>>>>> Dustin Tennill
>>>>>
