Too many CLOSE_WAIT connections degrade performance
Version: HBase 0.94.3, HDFS 0.20.*. There are too many CLOSE_WAIT connections from the RS to the DN, and I find the number is over 3. I changed the log level of 'org.apache.hadoop.ipc.HBaseServer.trace' to DEBUG and checked the performance:

Call #2649932; Served: HRegionInterface#get queueTime=0 processingTime=284 contents=1 Get, 86 bytes

So my conclusion is that when the DataNode server port is occupied by normal or irregular connections, it drags read/write performance down. By the TCP/IP protocol, CLOSE_WAIT means the RS has not closed a file descriptor it opened. After I restarted the RS gracefully, the problem was resolved. OK, my question is: under which conditions will the RS fail to close the open file handle? Any ideas will be nice. Thanks!

-- Bing Jiang Tel: (86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: www.binospace.com BLOG: http://blog.sina.com.cn/jiangbinglover Focus on distributed computing, HDFS/HBase
Re: whitelist feature of YARN
YARN-521, which brings whitelisting to the AMRMClient APIs, is now included in 2.1.0-beta. Check out the doc for the relaxLocality parameter in ContainerRequest in AMRMClient: https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java and I can help clarify here if anything's confusing. -Sandy

On Tue, Jul 9, 2013 at 2:54 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Sandy, Yes, I have been using the AMRMClient APIs. I am planning to shift to whichever way this whitelist feature is supported. But I am not sure what is meant by submitting ResourceRequests directly to the RM. Can you please elaborate on this, or give me a pointer to some example code on how to do it? Thanks for the reply, -Kishore

On Mon, Jul 8, 2013 at 10:53 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Krishna, From your previous email, it looks like you are using the AMRMClient APIs. Whitelisting is not yet supported through them. I am working on this in YARN-521, which should be included in the next release after 2.1.0-beta. If you are submitting ResourceRequests directly to the RM, you can whitelist a node by
* setting the relaxLocality flag on the node-level ResourceRequest to true
* setting the relaxLocality flag on the corresponding rack-level ResourceRequest to false
* setting the relaxLocality flag on the corresponding any-level ResourceRequest to false
-Sandy

On Mon, Jul 8, 2013 at 6:48 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Can someone please point me to some example code of how to use the whitelist feature of YARN? I have recently got RC1 for hadoop-2.1.0-beta and want to use this feature. It would be great if you could point me to some description of what this whitelisting feature is; I have gone through some JIRA logs related to it, but a more concrete explanation would be helpful. Thanks, Kishore
Re: whitelist feature of YARN
Hi Sandy, Thanks for the reply, and it is good to know YARN-521 is done! Please answer the following questions:
1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I take it from trunk, or is there a recent release candidate available?
2) I have recently changed my application to use the new asynchronous interfaces. I am hoping it works with those too; correct me if I am wrong.
3) Change in interface: the old ContainerRequest constructor used to be:
public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, int containerCount);
whereas now it is changed to:
a) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority)
b) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality)
That means the old containerCount argument is gone! How would I be able to specify how many containers I need?
-Kishore

On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote: ...
Namenode is failing with exception to join
My configuration looks fine to me, but whenever I start the namenode it fails with the exception below. No clue where to fix this?

2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 1
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 115 loaded in 0 seconds.
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /data/1/dfs/nn/current/fsimage_000
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@5f18223d expecting start txid #1
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream '/data/1/dfs/nn/current/edits_0515247-0515255' to transaction ID 1
2013-08-07 02:56:22,753 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 515247.
Re: whitelist feature of YARN
Responses inline:

On Tue, Aug 6, 2013 at 11:55 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote:

1) When is 2.1.0-beta going to be released? Is it soon, or do you suggest I take it from trunk, or is there a recent release candidate available?

We're very close, and my guess would be no later than the end of the month (don't hold me to this).

2) I have recently changed my application to use the new asynchronous interfaces. I am hoping it works with those too; correct me if I am wrong.

ContainerRequest is shared by the async interfaces as well, so it should work here.

3) Change in interface: the old ContainerRequest constructor used to be:
public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, int containerCount);
whereas now it is changed to:
a) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority)
b) public ContainerRequest(Resource capability, String[] nodes, String[] racks, Priority priority, boolean relaxLocality)
That means the old containerCount argument is gone! How would I be able to specify how many containers I need?

We now expect that you submit a ContainerRequest for each container you want.
-Kishore

On Wed, Aug 7, 2013 at 11:37 AM, Sandy Ryza sandy.r...@cloudera.com wrote: ...
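[Editor's note] To make the recipe above concrete, here is a minimal sketch of whitelisting a node with the 2.1.0-beta AMRMClient API, assuming an already-started AMRMClient named amrmClient; the host name, memory, vcores, and priority values are illustrative assumptions, not from the thread:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// One ContainerRequest per desired container; the old containerCount
// argument is gone, so loop if you need several.
Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore (assumed)
Priority priority = Priority.newInstance(0);         // assumed priority level

ContainerRequest request = new ContainerRequest(
    capability,
    new String[] { "node1" }, // whitelist: only consider this node (assumed name)
    null,                     // no rack-level requests
    priority,
    false);                   // relaxLocality=false: do not fall back to rack/any

amrmClient.addContainerRequest(request); // amrmClient is an assumed, initialized client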
Re: Namenode is failing with exception to join
Manish, you stopped HDFS and then started HDFS on the standby name node, right? Please look at https://issues.apache.org/jira/browse/HDFS-5058. There are two solutions:
1) start HDFS on the active name node, not the SBN
2) copy {namenode.name.dir}/* to the SBN
I advise #1.

On Wed, Aug 7, 2013 at 3:00 PM, Manish Bhoge manishbh...@rocketmail.com wrote: ...
Re: Namenode is failing with exception to join
I am not using HA here. All I am trying to do is make a 2-node cluster, but before that I wanted to make sure I am setting everything up right and get HDFS up in pseudo-distributed mode. However, I suspect a mistake in my /etc/hosts file, as I have renamed the localhost to myhost-1. Please suggest.

From: Azuryy Yu azury...@gmail.com To: user@hadoop.apache.org; Manish Bhoge manishbh...@rocketmail.com Sent: Wednesday, 7 August 2013 1:08 PM Subject: Re: Namenode is failing with exception to join ...
Re: Namenode is failing with exception to join
Hi, Did you configure your Name Node to store multiple copies of its metadata? You can recover your name node in that situation:

hadoop namenode -recover

It will ask you whether you want to continue or not; please follow the instructions. Thanks

On Wed, Aug 7, 2013 at 1:44 PM, Manish Bhoge manishbh...@rocketmail.com wrote: ...
Re: whitelist feature of YARN
Sandy, Thanks again. I found RC1 for 2.1.0-beta available at http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc1/. Would this have the fix for YARN-521, and can I use it? -Kishore

On Wed, Aug 7, 2013 at 12:35 PM, Sandy Ryza sandy.r...@cloudera.com wrote: ...
MutableCounterLong metrics display in ganglia
I use hadoop-2.0.5 and configured the hadoop-metrics2.properties file with the content below:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
namenode.sink.ganglia.servers=10.232.98.74:8649
datanode.sink.ganglia.servers=10.232.98.74:8649

I wrote a program that calls FSDataOutputStream.hsync() once per second. There is a '@Metric MutableCounterLong fsyncCount' metric in DataNodeMetrics; when FSDataOutputStream.hsync() is called, the value of fsyncCount is incremented, and the DataNode sends the value of fsyncCount to ganglia every ten seconds. So I think the value of fsyncCount in ganglia should be 10, 20, 30, 40 and so on, but ganglia displays 1, 1, 1, 1, 1, ... So it looks as if the value of fsyncCount is set to zero every ten seconds and what is shown is "fsyncCount.value/10". Is the value of the MutableCounterLong set to zero every ten seconds, and is MutableCounterLong.value divided by 10? Thanks, LiuLei
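[Editor's note] For reference, here is a minimal, hedged sketch of a metrics2 source maintaining a MutableCounterLong the way DataNodeMetrics maintains fsyncCount; the class and metric names ("ExampleMetrics", "exampleOps") are illustrative assumptions, not Hadoop's own:

import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "Example metrics source", context = "example")
public class ExampleMetrics {
  @Metric("Number of example operations") MutableCounterLong exampleOps;

  public static ExampleMetrics create() {
    // Register the source so configured sinks (e.g. GangliaSink31) poll it
    // every *.sink.ganglia.period seconds.
    return DefaultMetricsSystem.instance().register(
        "ExampleMetrics", null, new ExampleMetrics());
  }

  public void opDone() {
    exampleOps.incr(); // the counter value itself only ever increases
  }
}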
Re: Large-scale collection of logs from multiple Hadoop nodes
We have the same scenario as you described. The following is our solution, just FYI: We installed a local scribe agent on every node of our cluster, and we have several central scribe servers. We extended log4j to support writing logs to the local scribe agent; the local scribe agents forward the logs to the central scribe servers, and finally the central scribe servers write these logs to a dedicated hdfs cluster used for offline processing. Then we use hive/impala to analyse the collected logs.

From: Public Network Services publicnetworkservi...@gmail.com Reply-To: user@hadoop.apache.org Date: Tuesday, August 6, 2013 1:58 AM To: user@hadoop.apache.org Subject: Large-scale collection of logs from multiple Hadoop nodes

Hi... I am facing a large-scale usage scenario of log collection from a Hadoop cluster and am examining ways it could be implemented. More specifically, imagine a cluster that has hundreds of nodes, each of which constantly produces Syslog events that need to be gathered and analyzed at another point. The total amount of logs could be tens of gigabytes per day, if not more, and the reception rate on the order of thousands of events per second, if not more. One solution is to send those events over the network (e.g., using Flume) and collect them in one or more (less than 5) nodes in the cluster, or in another location, where the logs will be processed either by a constantly running MapReduce job or by non-Hadoop servers running some log-processing application. Another approach could be to deposit all these events into a queuing system like ActiveMQ or RabbitMQ, or whatever. In all cases, the main objective is to be able to do real-time log analysis. What would be the best way of implementing the above scenario? Thanks! PNS
Re: Large-scale collection of logs from multiple Hadoop nodes
Hi, the approach with Flume is the most reliable workflow for this, since Flume has a built-in Syslog source as well as a load-balancing channel. On top you can define multiple channels for different sources. Best, Alex

sent via my mobile device mapredit.blogspot.com @mapredit

On Aug 7, 2013, at 1:44 PM, 武泽胜 wuzesh...@xiaomi.com wrote: ...
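[Editor's note] For the Flume route Alex describes, a minimal sketch of an agent wiring a syslog TCP source through a memory channel to an HDFS sink might look like the following; the agent/component names, port, capacity, and HDFS path are illustrative assumptions:

# One Flume agent: syslog TCP source -> memory channel -> HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Built-in syslog source listening for events from the cluster nodes.
a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000

# Sink writing the collected events into HDFS for offline processing.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/syslog
a1.sinks.k1.channel = c1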
RE: Compilation problem of Hadoop Projects after Import into Eclipse
Sathwik, I experienced something similar a few weeks ago. I reported a JIRA on the documentation of this; please comment there: https://issues.apache.org/jira/browse/HADOOP-9771 Regards ./g

From: Sathwik B P [mailto:sath...@apache.org] Sent: Tuesday, August 06, 2013 4:46 AM To: user@hadoop.apache.org Subject: Compilation problem of Hadoop Projects after Import into Eclipse

Hi guys, I see a couple of problems with the generation of eclipse artifacts via mvn eclipse:eclipse. There are a couple of compilation issues after importing the hadoop projects into Eclipse, though I am able to rectify them.
1) hadoop-common: TestAvroSerialization.java doesn't compile, as it uses AvroRecord which exists under target/generated-test-sources/java. Solution: include target/generated-test-sources/java as a source folder.
2) hadoop-streaming: the linked source folder conf, which should point to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf, doesn't point to the path correctly. Solution: manually add the conf folder and link it to hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf.
Can this be fixed? I have just compiled the hadoop trunk codebase. regards, sathwik
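[Editor's note] For fix 1), the extra source folder can also be added directly to the Eclipse .classpath file that mvn eclipse:eclipse generates; a hedged sketch, assuming the default project layout:

<classpathentry kind="src" path="target/generated-test-sources/java"/>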
Oozie ssh action error
What's the probable cause of the error when the error log of the ssh action reads: "Permission denied (publickey,password)"? I already have passphrase-less ssh set up. Can you guys point me towards the potential reason and a solution to the error? Thanks, Kasa.
Re: Oozie ssh action error
Hi, I hope the points below might help you.

Approach 1: Change the sshd_config file on the remote server (probably /etc/ssh/sshd_config) from
PasswordAuthentication no
to
PasswordAuthentication yes
and then restart the SSHD daemon.

Approach 2: Check the authorized_keys file permissions:
chmod 600 ~/.ssh/authorized_keys

Thanks.

On Wed, Aug 7, 2013 at 7:16 PM, Kasa V Varun Tej kasava...@gmail.com wrote: ...
Re: Extra start-up overhead with hadoop-2.1.0-beta
Hi Omkar, Can you please see whether you can answer my question with this info, or whether you need anything else from me? Also, does resource localization improve or hurt performance in any way? Thanks, Kishore

On Thu, Aug 1, 2013 at 11:20 PM, Omkar Joshi ojo...@hortonworks.com wrote: How are you making these measurements? Can you elaborate more? Is it on a best-case basis, or an average, or a worst case? How many resources are you sending for localization? Were the sizes and number of these resources consistent across tests? Were these resources public/private/application-specific? Apart from this, is the other load on the node manager the same? Is the load on hdfs the same? Did you see any network bottleneck? More information will help a lot. Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com

On Thu, Aug 1, 2013 at 2:19 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, Please share with me if anyone has an answer or clues to my question regarding the start-up performance. Also, one more thing I observed today: the time taken to run a command on a container went up by more than a second in this latest version. When using 2.0.4-alpha, it used to take 0.3 to 0.5 seconds from the point I call startContainer() to the point the command is started on the container, whereas when using 2.1.0-beta, it takes around 1.5 seconds from the point the onContainerStarted() callback fires to the point the command is seen running on the container. Thanks, Kishore

On Thu, Jul 25, 2013 at 8:38 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I have been using the hadoop-2.1.0-beta release candidate and observed that it is slower in running my simple application that runs on 2 containers. I have tried to find out which parts of it really have this extra overhead (compared to hadoop-2.0.4-alpha), and here is what I found:
1) From the point my Client has submitted the Application Master to the RM, it is taking 2 seconds extra.
2) From the point my container requests are set up by the Application Master till the containers are allocated, it is taking 2 seconds extra.
Is this overhead expected with the changes that went into the new version? Or is there a way to improve it by changing something in the configuration? Thanks, Kishore
Re: Is there any way to use a hdfs file as a Circular buffer?
Use a CEP tool like Esper or Storm and you will be able to achieve that. I can give you more inputs if you provide more details of what you are trying to achieve. Regards, Som Shekhar Sharma +91-8197243810

On Wed, Aug 7, 2013 at 9:58 PM, Wukang Lin vboylin1...@gmail.com wrote: Hi Niels and Bertrand, Thank you for your great advice. In our scenario, we need to store a steady stream of binary data into circular storage; throughput and concurrency are the most important indicators. The first way seems to work, but as hdfs is not friendly to small files, this approach may not be smooth enough. HBase is good, but not appropriate for us, both for throughput and for storage. mongodb is quite good for web applications, but likewise not suitable for the scenario we face. We need a distributed storage system with high throughput, HA, LB and security. Maybe it would act much like hbase, which manages a lot of small files (hfiles) as a large region; we would manage a lot of small files as a large one. Perhaps we should develop it ourselves. Thank you. Lin Wukang

2013/7/25 Niels Basjes ni...@basjes.nl: A circular file on hdfs is not possible. Some of the ways around this limitation: - Create a series of files and delete the oldest file when you have too much. - Put the data into an hbase table and do something similar. - Use completely different technology like mongodb, which has built-in support for a circular buffer (capped collection). Niels

Hi all, Is there any way to use a hdfs file as a circular buffer? I mean: if I set a quota on a directory on hdfs and write data to a file in that directory continuously, once the quota is exceeded, can I redirect the writer and write the data from the beginning of the file automatically?
Datanode doesn't connect to Namenode
Hi everyone, On my slave machine (cloud15) the datanode shows this log; it doesn't connect to the master (cloud6):

2013-08-07 13:44:03,110 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: cloud15/192.168.188.15:54310. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-08-07 13:44:03,110 INFO org.apache.hadoop.ipc.RPC: Server at cloud15/192.168.188.15:54310 not available yet, Z...

But when I type the jps command on the slave machine, DataNode is running. This is my core-site.xml on the slave machine (cloud15):

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://cloud15:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

On the master machine I just swap cloud15 for cloud6. In /etc/hosts I have the lines (192.168.188.15 cloud15) and (192.168.188.6 cloud6), and both machines have ssh access without a password. Am I missing anything? Thanks in advance! Felipe

-- Felipe Oliveira Gutierrez -- felipe.o.gutier...@gmail.com -- https://sites.google.com/site/lipe82/Home/diaadia
Re: Is there any way to use a hdfs file as a Circular buffer?
Hi Shekhar, Thank you for your replies. So far as I know, Storm is a distributed computing framework, but what we need is a storage system; high throughput and concurrency are what matter. We have thousands of devices, and each device will produce a steady stream of binary data. The space for every device is fixed, so they should reuse the space on the disk. So, how can Storm or Esper achieve that? Many Thanks, Lin Wukang

2013/8/8 Shekhar Sharma shekhar2...@gmail.com: ...
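[Editor's note] Absent a native circular file, Niels' first workaround above (a series of files, deleting the oldest) can be sketched with the plain FileSystem API. This is a minimal, hedged illustration; the class name, segment size, and segment count are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingHdfsWriter {
  private static final int MAX_SEGMENTS = 10;           // assumed ring size
  private static final long SEGMENT_BYTES = 64L << 20;  // assumed 64 MB segments

  private final FileSystem fs;
  private final Path dir;
  private long segmentId = 0;
  private long written = 0;
  private FSDataOutputStream out;

  public RollingHdfsWriter(Configuration conf, Path dir) throws Exception {
    this.fs = FileSystem.get(conf);
    this.dir = dir;
    roll();
  }

  public synchronized void write(byte[] data) throws Exception {
    if (written + data.length > SEGMENT_BYTES) {
      roll(); // start a new segment once the current one is full
    }
    out.write(data);
    written += data.length;
  }

  private void roll() throws Exception {
    if (out != null) {
      out.close();
    }
    segmentId++;
    out = fs.create(new Path(dir, String.format("segment-%08d", segmentId)));
    written = 0;
    // Drop the oldest segment once we exceed the cap, emulating a ring buffer.
    Path oldest = new Path(dir, String.format("segment-%08d", segmentId - MAX_SEGMENTS));
    if (fs.exists(oldest)) {
      fs.delete(oldest, false);
    }
  }
}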
Re: Datanode doesn't connect to Namenode
Hi, Your logs show that the process is making an IPC call not to the namenode; it is hitting the datanode itself. Can you please check your datanode process status? Regards Jitendra

On Wed, Aug 7, 2013 at 10:29 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: ...
Re: Datanode doesn't connect to Namenode
Hi, your datanode configuration shows:

<name>fs.default.name</name>
<value>hdfs://cloud15:54310</value>

But you have said the Namenode is configured on the master (cloud6). Can you check the configuration again? Regards, Sivaram R L

On Wed, Aug 7, 2013 at 10:29 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: ...
Re: Extra start-up overhead with hadoop-2.1.0-beta
I believe https://issues.apache.org/jira/browse/MAPREDUCE-5399 causes performance degradation in cases where there are a lot of reducers. I can imagine it causing degradation if the configuration files are super big, or in some other weird cases.

From: Krishna Kishore Bonagiri write2kish...@gmail.com To: user@hadoop.apache.org Sent: Wednesday, August 7, 2013 10:03 AM Subject: Re: Extra start-up overhead with hadoop-2.1.0-beta ...
Re: Datanode doesn't connect to Namenode
Yes, on the slave I have:

<name>fs.default.name</name>
<value>hdfs://cloud15:54310</value>

and on the master I have:

<name>fs.default.name</name>
<value>hdfs://cloud6:54310</value>

If I put cloud6 in both configurations, the slave doesn't start.

On Wed, Aug 7, 2013 at 2:40 PM, Sivaram RL sivaram...@gmail.com wrote: ...

-- Felipe Oliveira Gutierrez -- felipe.o.gutier...@gmail.com -- https://sites.google.com/site/lipe82/Home/diaadia
Re: Extra start-up overhead with hadoop-2.1.0-beta
No Ravi, I am not running any MR job. Also, my configuration files are not big.

On Wed, Aug 7, 2013 at 11:12 PM, Ravi Prakash ravi...@ymail.com wrote: ...
Re: Datanode doesn't connect to Namenode
I'm not able to see a tasktracker process on your datanode.

On Wed, Aug 7, 2013 at 11:14 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: ...
Re: Datanode doesn't connect to Namenode
Your hdfs name entry should be the same on the master and the datanodes:

<name>fs.default.name</name>
<value>hdfs://cloud6:54310</value>

Thanks

On Wed, Aug 7, 2013 at 11:05 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: On my slave the process is running:
hduser@cloud15:/usr/local/hadoop$ jps
19025 DataNode
19092 Jps

On Wed, Aug 7, 2013 at 2:26 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: ...
Re: Datanode doesn't connect to Namenode
Disable the firewall on the datanode and namenode machines. Regards, Som Shekhar Sharma +91-8197243810

On Wed, Aug 7, 2013 at 11:33 PM, Jitendra Yadav jeetuyadav200...@gmail.com wrote: ...
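[Editor's note] Putting Jitendra's fix together, the slave's core-site.xml should point at the master rather than at itself; a minimal sketch of the relevant property, assuming cloud6 remains the namenode host:

<property>
  <name>fs.default.name</name>
  <value>hdfs://cloud6:54310</value>
</property>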
Re: setLocalResources() on ContainerLaunchContext
Good that your timestamp worked. Now for hdfs, try this: hdfs://<hdfs-host-name>:<hdfs-host-port>/<absolute-path>. Verify that your absolute path is correct; I hope it will work:

bin/hadoop fs -ls <absolute-path>

hdfs://isredeng:8020//kishore/kk.ksh ... why "//"? Do you have the hdfs file at the absolute location /kishore/kk.ksh? Are /kishore and /kishore/kk.ksh accessible to the user who is making the startContainer call, or the one running the AM container? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com

On Tue, Aug 6, 2013 at 10:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Hitesh, Omkar, Thanks for the replies. I tried getting the last modified timestamp like this and it works. Is this the right thing to do?

File file = new File("/home_/dsadm/kishore/kk.ksh");
shellRsrc.setTimestamp(file.lastModified());

And when I tried using an hdfs file, qualifying it with both node name and port, it didn't work; I get a similar error as earlier:

String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh";

13/08/07 01:36:28 INFO ApplicationMaster: Got container status for containerID=container_1375853431091_0005_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=File does not exist: hdfs://isredeng:8020/kishore/kk.ksh
13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a container : -1000

On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote: Thanks Hitesh! P.s. The port isn't a requirement (and with HA URIs, you shouldn't add a port), but isredeng has to be the authority component.

On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote: @Krishna, your logs showed the file error for hdfs://isredeng/kishore/kk.ksh. I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that the file exists? Also, the qualified path seems to be missing the namenode port. I need to go back and check whether a path without the port works by assuming the default namenode port. @Harsh, adding a helper function seems like a good idea. Let me file a jira to have the above added to one of the helper/client libraries. thanks -- Hitesh

On Aug 6, 2013, at 6:47 PM, Harsh J wrote: It is kinda unnecessary to be asking developers to load in timestamps and lengths themselves. Why not provide a java.io.File, or perhaps a Path-accepting API, that gets it automatically on their behalf using the FileSystem API internally? P.s. An HDFS file gave him an FNF, while a local file gave him a proper TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS paths.

On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote: Hi Krishna, YARN downloads a specified local resource on the container's node from the url specified. In all situations, the remote url needs to be a fully qualified path. To verify that the file at the remote url is still valid, YARN expects you to provide the length and last-modified timestamp of that file. If you use an hdfs path such as hdfs://namenode:port/<absolute path to file>, you will need to get the length and timestamp from HDFS. If you use file:///, the file should exist on all nodes, and all nodes should have the file with the same length and timestamp for localization to work. (For a single-node setup this works, but it is tougher to get right on a multi-node setup; deploying the file via an rpm should likely work.) -- Hitesh

On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote: Hi, You need to match the timestamp. Probably get the timestamp locally before adding it.
This is explicitly done to ensure that the file is not updated after the user makes the call, to avoid possible errors. Thanks, Omkar Joshi Hortonworks Inc.

On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I tried the following and it works!

    String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";

But now I am getting a timestamp error like the one below, when I passed 0 to setTimestamp():

    13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058)

On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path? That is, including the file:/ scheme.

On Aug 6, 2013 4:05 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, the setResource() call on LocalResource is expecting an argument of type org.apache.hadoop.yarn.api.records.URL, which is converted from a string in the form of a URI. This happens in the following call of the Distributed Shell example:

    shellRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath)));

So, if I give a local file I get a parsing error like below, which is
Re: setLocalResources() on ContainerLaunchContext
Hi Omkar, I will try that. I might have got the two '/' characters wrong while trying it in different ways to make it work. The file kishore/kk.ksh is accessible to the same user that is running the AM container. And another question of mine is to understand the exact benefits of using this resource localization. Can you please explain briefly, or point me to some online documentation about it? Thanks, Kishore

On Wed, Aug 7, 2013 at 11:49 PM, Omkar Joshi ojo...@hortonworks.com wrote: Good that your timestamp worked... Now for hdfs, try this: hdfs://<hdfs-host-name>:<hdfs-host-port><absolute-path> Now verify that your absolute path is correct; I hope it will work: bin/hadoop fs -ls <absolute-path> hdfs://isredeng:8020//kishore/kk.ksh... why '//'? Do you have the hdfs file at absolute location /kishore/kk.ksh? Are /kishore and /kishore/kk.ksh accessible to the user who is making the startContainer call, or the one running the AM container? Thanks, Omkar Joshi Hortonworks Inc. http://www.hortonworks.com

On Tue, Aug 6, 2013 at 10:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, Hitesh, Omkar, thanks for the replies. I tried getting the last modified timestamp like this, and it works. Is this the right thing to do?

    File file = new File("/home_/dsadm/kishore/kk.ksh");
    shellRsrc.setTimestamp(file.lastModified());

And when I tried using an hdfs file, qualifying it with both node name and port, it didn't work; I get a similar error to the earlier one.

    String shellScriptPath = "hdfs://isredeng:8020//kishore/kk.ksh";

    13/08/07 01:36:28 INFO ApplicationMaster: Got container status for containerID= container_1375853431091_0005_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=File does not exist: hdfs://isredeng:8020/kishore/kk.ksh
    13/08/07 01:36:28 INFO ApplicationMaster: Got failure status for a container : -1000

On Wed, Aug 7, 2013 at 7:45 AM, Harsh J ha...@cloudera.com wrote: Thanks Hitesh! P.s. The port isn't a requirement (and with HA URIs, you shouldn't add a port), but isredeng has to be the authority component.

On Wed, Aug 7, 2013 at 7:37 AM, Hitesh Shah hit...@apache.org wrote: @Krishna, your logs showed the file error for hdfs://isredeng/kishore/kk.ksh I am assuming you have tried dfs -ls /kishore/kk.ksh and confirmed that the file exists? Also, the qualified path seems to be missing the namenode port. I need to go back and check whether a path without the port works by assuming the default namenode port. @Harsh, adding a helper function seems like a good idea. Let me file a jira to have the above added to one of the helper/client libraries. thanks -- Hitesh

On Aug 6, 2013, at 6:47 PM, Harsh J wrote: It is kind of unnecessary to be asking developers to load in timestamps and lengths themselves. Why not provide a java.io.File, or perhaps a Path-accepting API, that gets them automatically on their behalf using the FileSystem API internally? P.s. An HDFS file gave him an FNF, while a local file gave him a proper TS/Len error. I'm guessing there's a bug here w.r.t. handling HDFS paths.

On Wed, Aug 7, 2013 at 12:35 AM, Hitesh Shah hit...@apache.org wrote: Hi Krishna, YARN downloads a specified local resource on the container's node from the url specified. In all situations, the remote url needs to be a fully qualified path. To verify that the file at the remote url is still valid, YARN expects you to provide the length and last-modified timestamp of that file. If you use an hdfs path such as hdfs://<namenode>:<port>/<absolute-path-to-file>, you will need to get the length and timestamp from HDFS.
If you use file:///, the file should exist on all nodes, and all nodes should have the file with the same length and timestamp for localization to work. (For a single-node setup this works, but it is tougher to get right on a multi-node setup - deploying the file via an rpm should likely work.) -- Hitesh

On Aug 6, 2013, at 11:11 AM, Omkar Joshi wrote: Hi, you need to match the timestamp. Probably get the timestamp locally before adding it. This is explicitly done to ensure that the file is not updated after the user makes the call, to avoid possible errors. Thanks, Omkar Joshi Hortonworks Inc.

On Tue, Aug 6, 2013 at 5:25 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I tried the following and it works!

    String shellScriptPath = "file:///home_/dsadm/kishore/kk.ksh";

But now I am getting a timestamp error like the one below, when I passed 0 to setTimestamp():

    13/08/06 08:23:48 INFO ApplicationMaster: Got container status for containerID= container_1375784329048_0017_01_02, state=COMPLETE, exitStatus=-1000, diagnostics=Resource file:/home_/dsadm/kishore/kk.ksh changed on src filesystem (expected 0, was 136758058)

On Tue, Aug 6, 2013 at 5:24 PM, Harsh J ha...@cloudera.com wrote: Can you try passing a fully qualified local path? That is, including the
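Pulling the thread's advice together: below is a minimal sketch, assuming the Hadoop 2.x YARN records API discussed above (the class name and the passed-in Configuration are placeholders of mine, not code from the thread). It fills in a LocalResource from an HDFS path so that the length and timestamp come from the remote file itself, which is what gets validated at localization time:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class LocalResourceHelper {
      // Build a LocalResource whose size/timestamp match the file in HDFS,
      // so YARN's validity check passes when the container is launched.
      public static LocalResource fromHdfs(Configuration conf, String uri)
          throws IOException {
        Path path = new Path(uri); // e.g. a fully qualified hdfs:// URI
        FileStatus status = path.getFileSystem(conf).getFileStatus(path);
        LocalResource rsrc = Records.newRecord(LocalResource.class);
        rsrc.setResource(ConverterUtils.getYarnUrlFromPath(path));
        rsrc.setSize(status.getLen());
        rsrc.setTimestamp(status.getModificationTime());
        rsrc.setType(LocalResourceType.FILE);
        rsrc.setVisibility(LocalResourceVisibility.APPLICATION);
        return rsrc;
      }
    }

This is essentially the helper Harsh proposes adding to the client libraries; until such a jira lands, something like it has to live in application code.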
compatible hadoop version for hbase-0.94.10
Hi, I need to create an OpenTSDB cluster, which needs HBase and Hadoop. I picked the latest HBase supported by OpenTSDB, which is hbase-0.94.10. Can anybody please suggest the latest version of Hadoop I can use with hbase-0.94.10? Thanks in advance. Regards, VSR.
Re: compatible hadoop version for hbase-0.94.10
If you look at pom.xml for 0.94, you should see hadoop-1.1 and hadoop-1.2 profiles. Those Hadoop releases (1.1.2 and 1.2.0, respectively) should work. On Wed, Aug 7, 2013 at 12:13 PM, oc tsdb oc.t...@gmail.com wrote: Hi, I need to create an OpenTSDB cluster, which needs HBase and Hadoop. I picked the latest HBase supported by OpenTSDB, which is hbase-0.94.10. Can anybody please suggest the latest version of Hadoop I can use with hbase-0.94.10? Thanks in advance. Regards, VSR.
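If you have the HBase 0.94 source tree checked out, Maven itself can list the available build profiles for you; this is a generic Maven command, nothing HBase-specific:

    mvn help:all-profiles

From that list, pick the hadoop-1.1 or hadoop-1.2 profile matching the Hadoop release Ted mentions.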
Re: compatible hadoop version for hbase-0.94.10
Thanks Ted. Regards, OC. On Wed, Aug 7, 2013 at 12:22 PM, Ted Yu yuzhih...@gmail.com wrote: If you look at pom.xml for 0.94, you should see hadoop-1.1 and hadoop-1.2 profiles. Those Hadoop releases (1.1.2 and 1.2.0, respectively) should work. On Wed, Aug 7, 2013 at 12:13 PM, oc tsdb oc.t...@gmail.com wrote: Hi, I need to create an OpenTSDB cluster, which needs HBase and Hadoop. I picked the latest HBase supported by OpenTSDB, which is hbase-0.94.10. Can anybody please suggest the latest version of Hadoop I can use with hbase-0.94.10? Thanks in advance. Regards, VSR.
specify Mapred tasks and slots
Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
Use mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml; the default value is 2, which means that this tasktracker will not run more than 2 reduce tasks at any given point in time. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
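For reference, that setting in a tasktracker's mapred-site.xml would look like the following sketch (the value 4 is an arbitrary example, not a recommendation):

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
      <description>Maximum number of reduce tasks this tasktracker will run simultaneously.</description>
    </property>

Note that this caps how many reduce tasks run concurrently on one node; it does not let a single task span multiple slots, which is the next question in this thread.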
Re: specify Mapred tasks and slots
Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
My question is: can I specify how many slots are to be used for each M/R task? On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote: Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
RE: specify Mapred tasks and slots
One task can use only one slot; it cannot use more than one slot. If the task is a map task then it will use one map slot, and if the task is a reduce task then it will use one reduce slot from the configured ones. Thanks, Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 08 August 2013 08:27 To: user@hadoop.apache.org Subject: Re: specify Mapred tasks and slots My question is: can I specify how many slots are to be used for each M/R task? On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote: Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
What Devaraj said. Except that if you use the CapacityScheduler, you can bind together memory requests and slot concepts, and have a task grab more than one slot for itself when needed. We've discussed this aspect previously at http://search-hadoop.com/m/gnFs91yIg1e On Thu, Aug 8, 2013 at 8:34 AM, Devaraj k devara...@huawei.com wrote: One task can use only one slot; it cannot use more than one slot. If the task is a map task then it will use one map slot, and if the task is a reduce task then it will use one reduce slot from the configured ones. Thanks, Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 08 August 2013 08:27 To: user@hadoop.apache.org Subject: Re: specify Mapred tasks and slots My question is: can I specify how many slots are to be used for each M/R task? On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote: Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks. -- Harsh J
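As a rough sketch of what binding memory requests to slots looks like with the Hadoop 1.x CapacityScheduler (these are the 1.x memory-based scheduling properties; the values are invented examples, so treat this as an illustration rather than a recipe):

    <!-- cluster side (mapred-site.xml): the memory one map/reduce slot represents -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>1024</value>
    </property>
    <property>
      <name>mapred.cluster.reduce.memory.mb</name>
      <value>1024</value>
    </property>
    <!-- job side: a job requesting 2048 MB per task is accounted as two slots -->
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapred.job.reduce.memory.mb</name>
      <value>2048</value>
    </property>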
Re: specify Mapred tasks and slots
Thanks Harsh and all friends for the responses. That's helpful. On Thu, Aug 8, 2013 at 11:55 AM, Harsh J ha...@cloudera.com wrote: What Devaraj said. Except that if you use the CapacityScheduler, you can bind together memory requests and slot concepts, and have a task grab more than one slot for itself when needed. We've discussed this aspect previously at http://search-hadoop.com/m/gnFs91yIg1e On Thu, Aug 8, 2013 at 8:34 AM, Devaraj k devara...@huawei.com wrote: One task can use only one slot; it cannot use more than one slot. If the task is a map task then it will use one map slot, and if the task is a reduce task then it will use one reduce slot from the configured ones. Thanks, Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 08 August 2013 08:27 To: user@hadoop.apache.org Subject: Re: specify Mapred tasks and slots My question is: can I specify how many slots are to be used for each M/R task? On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote: Slots are decided based on the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810 On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote: Hi Dears, Can I specify how many slots to use for reduce? I know we can specify the number of reduce tasks, but does one task occupy one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks. -- Harsh J
is it ok to build a hadoop cluster on kvm in a production environment?
hi, all: my company does not have much budget for boxes. If I build the cluster on KVM, will it cause a big impact on performance?