Eric,

Thanks for your ideas on this. I've actually traced this issue to a single saturated link in our datacenter. But you've given me some ideas on how I can optimize this system some more. Thanks.
Logan

On Sun, Nov 11, 2012 at 12:00 AM, Eric Yang <[email protected]> wrote:

> Hi Logan,
>
> It looks like the datanode is saturated when a large mapreduce job is in
> process. The Chukwa agent will drop data on the floor if there is more data
> than the agent can buffer in memory. Are the collectors running on the
> datanode? Do you have multiple disks for the datanode? It may be good to
> map the number of disks to (task slot - 1) and let the chukwa collector write
> to a disk that is not used concurrently by mapreduce tasks, to provide good
> performance for both data injection and data processing.
>
> regards,
> Eric
>
> On Sat, Nov 10, 2012 at 2:17 PM, Logan Hardy <[email protected]> wrote:
>
>> We are running CentOS 5.4, Chukwa 0.3.0, java version "1.6.0_17", and are
>> feeding a steady stream of data into our CDH3u3 Hadoop cluster. We have 6
>> Chukwa agent machines feeding 3 Chukwa collectors. Any time the cluster
>> gets busy with a big job or the task of decommissioning a node, the Chukwa
>> agent and collector start to back up and I start seeing "WaitingQueue -
>> MemLimitQueue is full" messages in the agent.log as shown below. As soon as
>> hadoop cluster activity dies down the MemLimitQueue messages go away and
>> everything goes back to normal.
>>
>> [root@COLL5 chukwa]# ps auxf | grep chukwa
>> root 11258 0.0 0.0 61172 732 pts/0 S+ 15:15 0:00 \_ grep chukwa
>> root 29248 1.2 2.1 415572 86928 ? Sl 04:03 8:04 /usr/java/default/bin/java -Xms32M -Xmx64M -DAPP=agent
>> -Dlog4j.configuration=chukwa-log4j.properties
>> -DCHUKWA_HOME=/usr/local/chukwa/bin/..
>> -DCHUKWA_CONF_DIR=/usr/local/chukwa/bin/../conf
>> -DCHUKWA_LOG_DIR=/usr/local/chukwa/logs -classpath
>> /usr/local/chukwa/bin/../conf::/usr/local/chukwa/bin/../chukwa-agent-0.3.0.jar:/usr/local/chukwa/bin/../chukwa-core-0.3.0.jar:/usr/local/chukwa/bin/../hadoopjars/hadoop-0.20.0-core.jar:/usr/local/chukwa/bin/../lib/NagiosAppender-1.5.0.jar:/usr/local/chukwa/bin/../lib/ant-1.7.1.jar:/usr/local/chukwa/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/chukwa/bin/../lib/asm-3.1.jar:/usr/local/chukwa/bin/../lib/commons-beanutils-1.8.0.jar:/usr/local/chukwa/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/usr/local/chukwa/bin/../lib/commons-codec-1.3.jar:/usr/local/chukwa/bin/../lib/commons-collections-3.1.jar:/usr/local/chukwa/bin/../lib/commons-fileupload-1.2.jar:/usr/local/chukwa/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/chukwa/bin/../lib/commons-io-1.4.jar:/usr/local/chukwa/bin/../lib/commons-lang-2.4.jar:/usr/local/chukwa/bin/../lib/commons-logging-1.1.1.jar:/usr/local/chukwa/bin/../lib/commons-logging-api-1.0.4.jar:/usr/local/chukwa/bin/../lib/commons-net-1.4.1.jar:/usr/local/chukwa/bin/../lib/core-3.1.1.jar:/usr/local/chukwa/bin/../lib/ezmorph-1.0.6.jar:/usr/local/chukwa/bin/../lib/jchronic-0.2.3.jar:/usr/local/chukwa/bin/../lib/jersey-bundle-1.1.0-ea.jar:/usr/local/chukwa/bin/../lib/jetty-6.1.11.jar:/usr/local/chukwa/bin/../lib/jetty-util-6.1.11.jar:/usr/local/chukwa/bin/../lib/json-lib-2.2.3-jdk15.jar:/usr/local/chukwa/bin/../lib/json.jar:/usr/local/chukwa/bin/../lib/jsp-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsp-api-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsr311-api-1.0.jar:/usr/local/chukwa/bin/../lib/junit-3.8.1.jar:/usr/local/chukwa/bin/../lib/log4j-1.2.13.jar:/usr/local/chukwa/bin/../lib/mysql-connector-java-5.1.6.jar:/usr/local/chukwa/bin/../lib/prefuse.jar:/usr/local/chukwa/bin/../lib/servlet-api-2.5-6.1.11.jar
>> org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
>>
>>
>> agent.log
>> ........
>> 2012-11-10 14:56:14,470 INFO Timer-0 ChukwaAgent - writing checkpoint 7257
>> 2012-11-10 14:56:18,655 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 547
>> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
>> 2012-11-10 14:56:20,163 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
>> 2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
>> *2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]*
>> 2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662
>> 2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
>> 2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
>> 2012-11-10 14:56:27,294 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
>> 2012-11-10 14:56:27,294 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
>> *2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8091188]*
>> 2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2214008
>> 2012-11-10 14:56:29,476 INFO Timer-0 ChukwaAgent - writing checkpoint 7259
>>
>>
>> Any ideas?
>>
>> --
>> Logan Hardy | Operations Engineer
>> 33Across <http://www.33across.com/> | Follow us: Twitter <http://www.twitter.com/33across> | Facebook <http://www.facebook.com/33across>
>>
>> o 801.231.4573
>>
>> Learn about our Q1 Brand Graph Category Insights Report <http://www.33across.com/BrandGraph/33Across_BrandGraph_AQ1_2012.pdf>
>>
>> 33Across and Tynt in the News
>> AdWeek • AllThingsD • Bloomberg • Forbes • TechCrunch • VentureBeat • WSJ <http://33across.com/news.php#axzz1uqxl0v16>
>>
>
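
Note for later readers of this thread: the agent.log excerpt already hints at a slow path between agent and collector. One post of 2286662 bytes is logged at 14:56:20,166 and its success line appears at 14:56:27,293, about 7 seconds later. The Python sketch below is not part of Chukwa; it is a rough, hypothetical helper (the names `summarize`, `POST`, `DONE` and `FULL` are made up here) that pairs each "HTTP post to" line with the next "HTTP Got success back" line and reports the effective rate. It assumes a single HTTP post thread, so posts and acknowledgements alternate, and that wrapped log lines have been rejoined. The measured interval includes collector processing time, so it is only an upper bound on time spent on the wire.

```python
import re
from datetime import datetime

# Sample lines copied from the agent.log excerpt in this thread
# (email line wrapping undone by hand).
LOG = """\
2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]
2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662
2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
"""

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
POST = re.compile(TS + r" .*HTTP post to \S+ length = (\d+)")
DONE = re.compile(TS + r" .*HTTP Got success back")
FULL = re.compile(r"MemLimitQueue is full \[(\d+)\]")


def parse_ts(stamp):
    """Parse a log4j-style timestamp such as '2012-11-10 14:56:20,166'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def summarize(text):
    """Return ([(post_bytes, seconds)], [queue_full_values]) for a log excerpt."""
    posts, queue_full = [], []
    pending = None  # (start_time, post_bytes) of the post awaiting its ack
    for line in text.splitlines():
        full = FULL.search(line)
        if full:
            queue_full.append(int(full.group(1)))
        post = POST.match(line)
        if post:
            pending = (parse_ts(post.group(1)), int(post.group(2)))
            continue
        done = DONE.match(line)
        if done and pending:
            start, size = pending
            elapsed = (parse_ts(done.group(1)) - start).total_seconds()
            posts.append((size, elapsed))
            pending = None
    return posts, queue_full


if __name__ == "__main__":
    posts, queue_full = summarize(LOG)
    for size, elapsed in posts:
        print(f"{size} bytes in {elapsed:.3f} s = {size / elapsed / 1024:.0f} KiB/s")
    print("MemLimitQueue full values:", queue_full)
```

Running the same function over a full agent.log during a quiet period and during a busy one would show whether the effective post rate drops when the queue-full messages appear, which is the pattern a saturated link would produce.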
