[
https://issues.apache.org/jira/browse/GIRAPH-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655203#comment-14655203
]
Vitaly Tsvetkov commented on GIRAPH-1026:
-----------------------------------------
Dear Hassan,
thank you very much for your recommendations!
We applied the patch from GIRAPH-1025 and then tried to run the job for 100-120M
vertices this way:
{noformat}
hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
-Dgiraph.yarn.task.heap.mb=20480 \
-Dgiraph.isStaticGraph=true \
-Dgiraph.useOutOfCoreGraph=true \
-Dgiraph.maxPartitionsInMemory=0 \
-Dgiraph.enableFlowControlInput=true \
-Dgiraph.lowFreeMemoryFraction=0.2 \
-Dgiraph.midFreeMemoryFraction=0.3 \
-Dgiraph.fairFreeMemoryFraction=0.4 \
-Dgiraph.weightedPageRank.superstepCount=5 \
ru.isys.WeightedPageRankComputation \
-vif ru.isys.CrawlerInputFormat -vip /tmp/bigdata \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /tmp/giraph \
-w 4 \
-yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
{noformat}
In the container logs we see repeated CheckMemoryCallable messages like this:
{noformat}
INFO [check-memory] ooc.CheckMemoryCallable
(CheckMemoryCallable.java:call(223)) - call: Memory is very limited now.
Calling GC manually. freeMemory = 2836.78MB
INFO [check-memory] ooc.CheckMemoryCallable
(CheckMemoryCallable.java:call(231)) - call: GC is done. GC time = 7.69 sec,
and freeMemory = 10430.13MB
{noformat}
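As a rough cross-check of these log values against the settings in the command above, the thresholds can be worked out by hand. The sketch below assumes the three fractions are applied to the maximum heap size (20480 MB here); that interpretation is an assumption and has not been checked against the Giraph source, and the helper name is hypothetical.

```python
# Heap size taken from -Dgiraph.yarn.task.heap.mb=20480 in the command above.
HEAP_MB = 20480

# Fractions taken from the -Dgiraph.*FreeMemoryFraction options above.
FRACTIONS = {"low": 0.2, "mid": 0.3, "fair": 0.4}


def thresholds_mb(heap_mb, fractions):
    """Hypothetical helper: free-memory thresholds in MB.

    Assumes each fraction is relative to the maximum heap size.
    """
    return {name: heap_mb * frac for name, frac in fractions.items()}


thresholds = thresholds_mb(HEAP_MB, FRACTIONS)

# Values copied from the two CheckMemoryCallable log lines.
free_before_gc_mb = 2836.78
free_after_gc_mb = 10430.13

for name, mb in thresholds.items():
    print(f"{name:>4} threshold: {mb:8.2f} MB")
print(f"before GC below low threshold: {free_before_gc_mb < thresholds['low']}")
print(f"after GC above fair threshold: {free_after_gc_mb > thresholds['fair']}")
```

Under that assumption, the free memory before the manual GC is below the low threshold and the free memory after it is above the fair threshold, which would be consistent with the "Memory is very limited now" message.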
The job reads the input successfully, but during superstep 0 it eventually fails
with this kind of out-of-memory error (YARN kills the container):
{noformat}
INFO [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:onContainersCompleted(571)) - Got response from
RM for container ask, completedCnt=1
INFO [AMRM Callback Handler Thread] yarn.GiraphApplicationMaster
(GiraphApplicationMaster.java:onContainersCompleted(574)) - Got container
status for containerID=container_1438068521412_0225_01_000002, state=COMPLETE,
exitStatus=-104, diagnostics=Container
[pid=23929,containerID=container_1438068521412_0225_01_000002] is running
beyond physical memory limits. Current usage: 20.2 GB of 20 GB physical memory
used; 22.4 GB of 42 GB virtual memory used. Killing container.
Dump of the process-tree for container_1438068521412_0225_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 23929 23924 23929 23929 (bash) 0 1 14376960 435 /bin/bash -c java
-Xmx20480M -Xms20480M -cp
.:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/263-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
org.apache.giraph.yarn.GiraphYarnTask 1438068521412 225 2 1
1>/var/log/hadoop-yarn/container/application_1438068521412_0225/container_1438068521412_0225_01_000002/task-2-stdout.log
2>/var/log/hadoop-yarn/container/application_1438068521412_0225/container_1438068521412_0225_01_000002/task-2-stderr.log
|- 23937 23929 23929 23929 (java) 46205 9377 24024481792 5283157 java
-Xmx20480M -Xms20480M -cp
.:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/263-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/263-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/263-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-
yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
org.apache.giraph.yarn.GiraphYarnTask 1438068521412 225 2 1
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
{noformat}
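The figures in this dump can be checked with a little arithmetic. The sketch below converts the RSSMEM_USAGE(PAGES) values to GiB and compares them with the heap size and the container limit. The 4096-byte page size is an assumption (it is the usual value on Linux x86-64 but is not stated in the log), and the function name is hypothetical.

```python
# Assumption: 4 KiB memory pages; the log does not state the page size.
PAGE_SIZE_BYTES = 4096
GIB = 1024 ** 3


def rss_gib(rss_pages, page_size=PAGE_SIZE_BYTES):
    """Hypothetical helper: convert a resident set size in pages to GiB."""
    return rss_pages * page_size / GIB


# RSSMEM_USAGE(PAGES) values copied from the process-tree dump.
java_rss_gib = rss_gib(5283157)  # PID 23937 (java)
bash_rss_gib = rss_gib(435)      # PID 23929 (bash)

heap_gib = 20480 / 1024          # -Xmx20480M
container_limit_gib = 20.0       # "20 GB physical memory" in the diagnostics

total_gib = java_rss_gib + bash_rss_gib
overshoot_mib = (total_gib - container_limit_gib) * 1024

print(f"java RSS:   {java_rss_gib:.2f} GiB")
print(f"total RSS:  {total_gib:.2f} GiB")
print(f"heap (Xmx): {heap_gib:.2f} GiB, container limit: {container_limit_gib:.2f} GiB")
print(f"overshoot:  {overshoot_mib:.0f} MiB")
```

With these numbers the maximum heap equals the container limit, so the heap alone can fill the container and any memory the JVM uses outside the heap would push the process over it. This is one reading of the log, not a confirmed diagnosis.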
We tried raising lowFreeMemoryFraction and the other fractions by 0.1 each, and
even tried the properties for out-of-core messages (by the way, should we still
use them now that there is the new adaptive out-of-core mechanism?):
{noformat}
-Dgiraph.messageStoreFactoryClass=org.apache.giraph.comm.messages.out_of_core.DiskBackedMessageStoreFactory
-Dgiraph.maxMessagesInMemory=5000000/10000000/5000000/10000000
-Dgiraph.useBigDataIOForMessages=true/false
{noformat}
But none of these helped. The job still failed during computation.
> New Out-of-core mechanism does not work
> ---------------------------------------
>
> Key: GIRAPH-1026
> URL: https://issues.apache.org/jira/browse/GIRAPH-1026
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 1.2.0-SNAPSHOT
> Reporter: Max Garmash
>
> After the new OOC mechanism was released, we tried to test it on our data and
> it failed.
> Our environment:
> 4x (CPU 6 cores / 12 threads, RAM 64GB)
> We can successfully process about 75 million vertices.
> With 100-120M vertices it fails like this:
> {noformat}
> 2015-08-04 12:35:21,000 INFO [AMRM Callback Handler Thread]
> yarn.GiraphApplicationMaster
> (GiraphApplicationMaster.java:onContainersCompleted(574)) - Got container
> status for containerID=container_1438068521412_0193_01_000005,
> state=COMPLETE, exitStatus=-104, diagnostics=Container
> [pid=6700,containerID=container_1438068521412_0193_01_000005] is running
> beyond physical memory limits. Current usage: 20.3 GB of 20 GB physical
> memory used; 22.4 GB of 42 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1438068521412_0193_01_000005 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 6704 6700 6700 6700 (java) 78760 20733 24033841152 5317812 java
> -Xmx20480M -Xms20480M -cp
> .:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*::./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoo
p-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
> org.apache.giraph.yarn.GiraphYarnTask 1438068521412 193 5 1
> |- 6700 6698 6700 6700 (bash) 0 0 14376960 433 /bin/bash -c java
> -Xmx20480M -Xms20480M -cp
> .:${CLASSPATH}:./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:./*:/etc/hadoop/conf.cloudera.yarn:/run/cloudera-scm-agent/process/264-yarn-NODEMANAGER:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hadoop-mapreduce/lib/*:
> org.apache.giraph.yarn.GiraphYarnTask 1438068521412 193 5 1
> 1>/var/log/hadoop-yarn/container/application_1438068521412_0193/container_1438068521412_0193_01_000005/task-5-stdout.log
>
> 2>/var/log/hadoop-yarn/container/application_1438068521412_0193/container_1438068521412_0193_01_000005/task-5-stderr.log
>
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {noformat}
> Logs from container
> {noformat}
> 2015-08-04 12:34:51,258 INFO [netty-server-worker-4] handler.RequestDecoder
> (RequestDecoder.java:channelRead(74)) - decode: Server window metrics
> MBytes/sec received = 12.5315, MBytesReceived = 380.217, ave received req
> MBytes = 0.007, secs waited = 30.34
> 2015-08-04 12:35:16,258 INFO [check-memory] ooc.CheckMemoryCallable
> (CheckMemoryCallable.java:call(221)) - call: Memory is very limited now.
> Calling GC manually. freeMemory = 924.27MB
> {noformat}
> We are running our job like this:
> {noformat}
> hadoop jar giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar \
> org.apache.giraph.GiraphRunner \
> -Dgiraph.yarn.task.heap.mb=20480 \
> -Dgiraph.isStaticGraph=true \
> -Dgiraph.useOutOfCoreGraph=true \
> -Dgiraph.logLevel=info \
> -Dgiraph.weightedPageRank.superstepCount=5 \
> ru.isys.WeightedPageRankComputation \
> -vif ru.isys.CrawlerInputFormat \
> -vip /tmp/bigdata/input \
> -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
> -op /tmp/giraph \
> -w 6 \
> -yj giraph-examples-1.2.0-SNAPSHOT-for-hadoop-2.6.0-cdh5.4.4-jar-with-dependencies.jar
> {noformat}