Ping…

 

From: 张帅 [mailto:satan.stud...@gmail.com] On Behalf Of zhangshuai.u...@gmail.com
Sent: Wednesday, April 12, 2017 5:29 PM
To: user@ignite.apache.org
Subject: OOM when using Ignite as HDFS Cache

 

Hi there,

 

I’d like to use Ignite as an HDFS cache in my cluster, but it fails with an OOM error. Could you help review my configuration so I can avoid it?

 

I’m using DUAL_ASYNC mode. The Ignite nodes can find each other and form the cluster. There are very few changes in default-config.xml, but it is attached for your review. The JVM heap size is limited to 1GB. Ignite hits an OOM exception when I run the Hadoop benchmark TestDFSIO writing 4*4GB files. I think writing a 4GB file to HDFS is a streaming workload, so Ignite should be able to handle it. It is acceptable to slow down write performance while Ignite flushes cached data to HDFS, but it is not acceptable for it to crash or lose data.
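
For reference, below is a minimal sketch of the kind of IGFS setup I mean (standard Ignite 2.0 Spring XML with DUAL_ASYNC; the IGFS name and HDFS URI here are placeholders only — my actual settings are in the attached default-config.xml):

<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="fileSystemConfiguration">
        <list>
            <bean class="org.apache.ignite.configuration.FileSystemConfiguration">
                <!-- IGFS name (placeholder). -->
                <property name="name" value="igfs"/>
                <!-- DUAL_ASYNC: data is cached in IGFS and written through to HDFS asynchronously. -->
                <property name="defaultMode" value="DUAL_ASYNC"/>
                <!-- Secondary file system pointing at HDFS. -->
                <property name="secondaryFileSystem">
                    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
                        <property name="fileSystemFactory">
                            <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
                                <!-- Placeholder NameNode URI. -->
                                <property name="uri" value="hdfs://namenode-host:9000/"/>
                            </bean>
                        </property>
                    </bean>
                </property>
            </bean>
        </list>
    </property>
</bean>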

 

The Ignite log is attached as ignite_log.zip; here are some key messages:

 

17/04/12 00:49:17 INFO [grid-timeout-worker-#19%null%] internal.IgniteKernal:
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9b5dcc35, name=null, uptime=00:26:00:254]
    ^-- H/N/C [hosts=173, nodes=173, CPUs=2276]
    ^-- CPU [cur=0.13%, avg=0.82%, GC=0%]
    ^-- Heap [used=555MB, free=43.3%, comm=979MB]
    ^-- Non heap [used=61MB, free=95.95%, comm=62MB]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]
    ^-- Outbound messages queue [size=0]

17/04/12 00:50:06 INFO [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Added new node to topology: TcpDiscoveryNode [id=553b5c1a-da0b-43cb-b691-b842352b3105, addrs=[0:0:0:0:0:0:0:1, 10.152.133.46, 10.55.68.223, 127.0.0.1, 192.168.1.1], sockAddrs=[BN1APS0A98852E/10.152.133.46:47500, bn1sch010095221.phx.gbl/10.55.68.223:47500, /0:0:0:0:0:0:0:1:47500, /192.168.1.1:47500, /127.0.0.1:47500], discPort=47500, order=176, intOrder=175, lastExchangeTime=1491983403106, loc=false, ver=2.0.0#20170405-sha1:2c830b0d, isClient=false]

[00:50:06] Topology snapshot [ver=176, servers=174, clients=0, CPUs=2288, heap=180.0GB]

...

Exception in thread "igfs-client-worker-2-#585%null%" 
java.lang.OutOfMemoryError: GC overhead limit exceeded

  at java.util.Arrays.copyOf(Arrays.java:3332)

  at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)

  at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)

  at java.lang.StringBuffer.append(StringBuffer.java:270)

  at java.io.StringWriter.write(StringWriter.java:112)

  at java.io.PrintWriter.write(PrintWriter.java:456)

  at java.io.PrintWriter.write(PrintWriter.java:473)

  at java.io.PrintWriter.print(PrintWriter.java:603)

  at java.io.PrintWriter.println(PrintWriter.java:756)

  at java.lang.Throwable$WrappedPrintWriter.println(Throwable.java:764)

  at java.lang.Throwable.printStackTrace(Throwable.java:658)

  at java.lang.Throwable.printStackTrace(Throwable.java:721)

  at 
org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)

  at 
org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)

  at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)

  at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:162)

  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)

  at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

  at org.apache.log4j.Category.callAppenders(Category.java:206)

  at org.apache.log4j.Category.forcedLog(Category.java:391)

  at org.apache.log4j.Category.error(Category.java:322)

  at org.apache.ignite.logger.log4j.Log4JLogger.error(Log4JLogger.java:495)

  at org.apache.ignite.internal.GridLoggerProxy.error(GridLoggerProxy.java:148)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4281)

  at org.apache.ignite.internal.util.IgniteUtils.error(IgniteUtils.java:4306)

  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:126)

  at java.lang.Thread.run(Thread.java:745)

Exception in thread "LeaseRenewer:had...@namenode-vip.yarn3-dev-bn2.bn2.ap.gbl" 
java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread 
"igfs-delete-worker%igfs%9b5dcc35-3a4c-4a90-ac9e-89fdd65302a7%" 
java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception in thread "exchange-worker-#39%null%" java.lang.OutOfMemoryError: GC 
overhead limit exceeded

…

17/04/12 01:40:10 WARN [disco-event-worker-#35%null%] discovery.GridDiscoveryManager: Stopping local node according to configured segmentation policy.

 

Looking forward to your help.

 

 

Regards,

Shuai Zhang
