Re: Hadoop 2.4.0 How to change Configured Capacity
Hi,

Both "dfs.name.data.dir" and "dfs.datanode.data.dir" are not set in my cluster. By the way, I have searched around for these two parameters, and I cannot find them in the Hadoop defaults page: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml Can you please advise where and how to set them? In hdfs-site.xml, in core-site.xml, or in another configuration file?

Many thanks,
Arthur

On 29 Jul, 2014, at 1:27 am, hadoop hive <hadooph...@gmail.com> wrote:
> You need to add each disk inside the dfs.name.data.dir parameter.

On Jul 28, 2014 5:14 AM, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:
> Hi, I have installed Hadoop 2.4.0 with 5 nodes. Each node physically has a 4 TB hard disk, but when checking the configured capacity, I found it is only about 49.22 GB per node. Can anyone advise how to set a bigger "configured capacity", e.g. 2 TB or more per node?
> Name node Configured Capacity: 264223436800 (246.08 GB)
> Each Datanode Configured Capacity: 52844687360 (49.22 GB)
> Regards,
> Arthur
Re: Hadoop 2.4.0 How to change Configured Capacity
You will need to set them in hdfs-site.xml. P.S. Their defaults are present in the hdfs-default.xml page you linked to: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.datanode.data.dir

On Sat, Aug 2, 2014 at 12:29 PM, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:
> Hi, Both "dfs.name.data.dir" and "dfs.datanode.data.dir" are not set in my cluster. [...] Can you please advise where and how to set them? In hdfs-site.xml, in core-site.xml, or in another configuration file? [...]

--
Harsh J
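A minimal hdfs-site.xml sketch of what this answer describes; the mount points below are illustrative assumptions and should be replaced with the actual mount points of the 4 TB disks on each datanode:

    <!-- hdfs-site.xml on each datanode; paths are illustrative -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <!-- comma-separated list, one entry per physical disk -->
      <value>/data/1/hdfs/data,/data/2/hdfs/data,/data/3/hdfs/data,/data/4/hdfs/data</value>
    </property>

The "configured capacity" a datanode reports is derived from the size of the filesystems holding these directories, which is why the default (a directory under hadoop.tmp.dir, often on a small root partition) can yield a 49 GB capacity on a 4 TB machine.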
RE: ResourceManager debugging
Hi Yehia,

I set YARN_RESOURCEMANAGER_OPTS in <installation folder>/bin/yarn and I was able to debug:

    YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7089,suspend=n"

Regards,
Naga
Huawei Technologies Co., Ltd.

From: Yehia Elshater [y.z.elsha...@gmail.com]
Sent: Saturday, August 02, 2014 09:22
To: user@hadoop.apache.org
Subject: ResourceManager debugging

Hi, I am wondering how to remote-debug YARN's RM using Eclipse. I tried adding the debugging options -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1337 to YARN_OPTS, but it did not work. Any suggestions? Thanks
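The same setting can be expressed in yarn-env.sh, which the stock Hadoop 2.x bin/yarn script also sources; this is a sketch under that assumption, the port 7089 simply follows the example above, and suspend=n means the RM starts without waiting for a debugger:

    # etc/hadoop/yarn-env.sh (assuming a standard Hadoop 2.x layout)
    export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=7089"

After restarting the RM, attach from Eclipse via Run > Debug Configurations > Remote Java Application, using the RM host and port 7089.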
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
Hi everyone,

I am having an issue with MapReduce jobs running through Hive being killed after 600s timeouts, and with very simple jobs taking over 3 hours (or just failing) for a set of files with a compressed size of only 1-2 GB. I will try to provide as much information as I can here, so if someone can help, that would be really great.

I have a cluster of 7 nodes (1 master, 6 slaves) with the following config:

• Master node:
– 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz
– 64GB DDR3 SDRAM
– 8 x 2TB SAS 600 hard drives (arranged as RAID 1 and RAID 5)
• Slave nodes (each):
– Intel Xeon 4-core E3-1220v3 @ 3.1GHz
– 32GB DDR3 SDRAM
– 4 x 2TB SATA-3 hard drives
• Operating system on all nodes: openSUSE Linux 13.1

We have the Apache BigTop package version 0.7, with Hadoop version 2.0.6-alpha and Hive version 0.11. YARN has been configured as per these recommendations: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

I also set the following additional settings before running jobs:

set yarn.nodemanager.resource.cpu-vcores=4;
set mapred.tasktracker.map.tasks.maximum=4;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.merge.mapredfiles=true;

No one else uses this cluster while I am working.

What I'm trying to do: I have a bunch of XML files on HDFS, which I am reading into Hive using this SerDe: https://github.com/dvasilen/Hive-XML-SerDe. I then want to create a series of tables from these files and finally run a Python script on one of them to perform some scientific calculations. The files are in .xml.gz format and (uncompressed) are only about 4 MB in size each. hive.input.format is set to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to avoid the "small files problem."

Problems: My HQL statements work perfectly for up to 1000 of these files. Even for much larger numbers, doing select * works fine, which means the files are being read properly, but if I do something as simple as selecting just one column from the whole table for a larger number of files, containers start being killed and jobs fail with this error in the container logs:

2014-08-02 14:51:45,137 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0: File does not exist. Holder DFSClient_attempt_1403771939632_0402_m_00_0_-1627633686_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)

Killed jobs show the above and also the following message: AttemptID:attempt_1403771939632_0402_m_00_0 Timed out after 600 secs. Container killed by the ApplicationMaster.

Also, in the node logs, I get a lot of pings like this: INFO [IPC Server handler 17 on 40961] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from attempt_1403771939632_0362_m_02_0

For 5000 files (1 GB compressed), the selection of a single column finishes, but takes over 3 hours. For 10,000 files, the job hangs at about 4% map and then errors out. While the jobs are running, I notice that the containers are not evenly distributed across the cluster. Some nodes lie idle, while the application master node runs 7 containers, maxing out the 28 GB of RAM allocated to Hadoop on each slave node.
This is the output of netstat -i while the column selection is running:

Kernel Interface table
Iface    MTU Met    RX-OK RX-ERR  RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0    1500   0 79515196      0 2265807      0 45694758      0      0      0 BMRU
eth1    1500   0 77410508      0       0      0 40815746      0      0      0 BMRU
lo     65536   0 16593808      0       0      0 16593808      0      0      0 LRU

Are there some settings I am missing that mean the cluster isn't processing this data as efficiently as it can? I am very new to Hadoop and there are so many logs, etc., that troubleshooting can be a bit overwhelming. Where else should I be looking to try and diagnose what is wrong?

Thanks in advance for any help you can give!

Kind regards,
Ana
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
Can you check the ulimit for your user? That might be causing this.

On Aug 2, 2014 8:54 PM, Ana Gillan <ana.gil...@gmail.com> wrote:
> Hi everyone, I am having an issue with MapReduce jobs running through Hive being killed after 600s timeouts, and with very simple jobs taking over 3 hours (or just failing) for a set of files with a compressed size of only 1-2 GB. [...]
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
For my own user? It is as follows:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 483941
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 800
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

From: hadoop hive <hadooph...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Saturday, 2 August 2014 16:34
To: user@hadoop.apache.org
Subject: Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

> Can you check the ulimit for your user? That might be causing this. [...]
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
Filemax across the cluster is set to over 6 million. I've checked the open file limits for the accounts used by the Hadoop daemons and they have an open file limit of 32K. This is confirmed by the various .out files, e.g. /var/log/hadoop-hdfs/hadoop-hdfs-datanode-slave1.out contains "open files (-n) 32768". Is this too low? What is the recommended value for open files on all nodes? Also, does my own user need to have the same value?

I've also tried running the same column selection on files crushed by the filecrush program (https://github.com/edwardcapriolo/filecrush/). This created 5 large files out of the 10,000 small files (still totalling 2 GB compressed), but this job won't progress past 0% map.

From: Ana Gillan <ana.gil...@gmail.com>
Date: Saturday, 2 August 2014 16:36
To: user@hadoop.apache.org
Subject: Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

> For my own user? It is as follows: [...]
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
32K seems fine for the mapred user (I hope you are using this user for fetching your data), but if you have huge data on your system you can try 64K. Did you try increasing the timeout from 600 sec to something like 20 mins? Can you also check at which stage it is getting hung or killed?

Thanks

On Aug 2, 2014 9:38 PM, Ana Gillan <ana.gil...@gmail.com> wrote:
> Filemax across the cluster is set to over 6 million. I've checked the open file limits for the accounts used by the Hadoop daemons and they have an open file limit of 32K. [...] I've also tried running the same column selection on files crushed by the filecrush program, but this job won't progress past 0% map. [...]
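For reference, the 600-second limit in the "Timed out after 600 secs" message corresponds to mapred.task.timeout, which is specified in milliseconds; a sketch of raising it for a single Hive session, with the 20-minute value following the suggestion above and purely illustrative:

    set mapred.task.timeout=1200000;  -- 20 minutes, in milliseconds (default 600000 = 600 sec)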
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
I'm not sure which user is fetching the data, but I'm assuming no one changed that from the default. The data isn't huge in size, just in number, so I suppose the open files limit is not the issue?

I'm running the job again with mapred.task.timeout=120, but containers are still being killed in the same way... just without the timeout message. And it somehow massively slowed down the machine as well, so even typing commands took a long time (???)

I'm not sure what you mean by which stage it's getting killed on. If you mean in the command-line progress counters, it's always on Stage-1.

Also, this is the end of the container log for the killed container. Failed and killed jobs always start fine with lots of these "processing file" and "processing alias" statements, but then suddenly warn about a DataStreamer Exception and are then killed with an error, which is the same as the warning. Not sure if this exception is the actual issue or if it's just a knock-on effect of something else.

2014-08-02 17:47:38,618 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
2014-08-02 17:47:38,641 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:38,932 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
2014-08-02 17:47:38,989 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:42,675 INFO [main] org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
2014-08-02 17:47:42,888 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.06_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_06_0_303479000_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
at org.apache.hadoop.ipc.Client.call(Client.java:1240)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
2014-08-02 17:47:45,417 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
Hey, try changing the ulimit to 64K for the user running the query, and change the timeout in the scheduler, which should be set to 600 sec. Check the JT logs as well for further issues.

Thanks

On Aug 2, 2014 11:09 PM, Ana Gillan <ana.gil...@gmail.com> wrote:
> I'm not sure which user is fetching the data, but I'm assuming no one changed that from the default. The data isn't huge in size, just in number, so I suppose the open files limit is not the issue? [...]
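A sketch of the ulimit change being suggested; the username is taken from the /tmp/hive-zslf023 paths in the logs, the 64K value follows the advice above, and file locations can vary by distribution (this assumes a PAM-based Linux such as openSUSE):

    # /etc/security/limits.conf on each node -- raise the open-files limit
    zslf023  soft  nofile  65536
    zslf023  hard  nofile  65536

    # verify after logging in again:
    #   ulimit -n    -> should now print 65536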
Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)
Ok, I will request this to be done, as I'm not an admin, and then get back to this thread on Monday. Thank you!

From: hadoop hive <hadooph...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Saturday, 2 August 2014 18:50
To: user@hadoop.apache.org
Subject: Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

> Hey, try changing the ulimit to 64K for the user running the query, and change the timeout in the scheduler, which should be set to 600 sec. Check the JT logs as well for further issues. Thanks
Re: Fair Scheduler issue
Hi Julien,

Did you try to change yarn.nodemanager.resource.memory-mb to 13 GB, for example (leaving the other 3 GB for the OS)?

Thanks

On 1 August 2014 05:41, Julien Naour <julna...@gmail.com> wrote:
> Hello,
> I'm currently using HDP 2.0, so Hadoop 2.2.0. My cluster consists of 4 nodes, each with 16 cores, 16 GB RAM and 4 x 3 TB disks. Recently we went from 2 users to 8, so we now need a more appropriate scheduler.
> We began with the Capacity Scheduler. There were some issues with the different queues, particularly when using Spark shells that hold resources for a long time. So we decided to try the Fair Scheduler, which seems to be a good solution. The problem is that the Fair Scheduler doesn't use all the available resources: it's capped at 73% of the available memory for one job, 63% for 2 jobs and 45% for 3 jobs. The problem could come from the shells that hold resources for a long time.
> We tried some configuration, such as yarn.scheduler.fair.user-as-default-queue=false, and played with the minimum resources allocated (minResources in fair-scheduler.xml), but it doesn't seem to resolve the issue.
> Any advice or good practices for setting up the Fair Scheduler?
> Regards,
> Julien
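A sketch of the suggestion above in yarn-site.xml terms; 13312 MB is simply 13 GB expressed in MB, leaving about 3 GB of each 16 GB node for the OS and daemons (the exact split is a judgment call):

    <!-- yarn-site.xml on each NodeManager; value is illustrative -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>13312</value>
    </property>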
Re: ResourceManager debugging
Hi Naga,

Thanks a lot for your help. I have submitted multiple MapReduce jobs, the debugger attached successfully from Eclipse, and I put a breakpoint in org.apache.hadoop.yarn.server.resourcemanager.ResourceManager, but the Eclipse debugger still waits without any interruption. However, when I put another breakpoint in another class (for example RMAppAttemptContainerAllocatedEvent), the debugger reached the code. Do you have any idea why the ResourceManager code was not reachable by the debugger?

Thanks

On 2 August 2014 08:04, Naganarasimha G R (Naga) <garlanaganarasi...@huawei.com> wrote:
> Hi Yehia, I set YARN_RESOURCEMANAGER_OPTS in <installation folder>/bin/yarn and I was able to debug:
> YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7089,suspend=n"
> Regards, Naga [...]
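One plausible explanation, not confirmed in the thread: with suspend=n, the ResourceManager's startup path (including the ResourceManager class's own initialization) has usually finished executing before the debugger attaches, so breakpoints there are never hit, while event classes such as RMAppAttemptContainerAllocatedEvent fire later and are caught. A sketch using suspend=y, which makes the JVM block at startup until a debugger attaches:

    # makes the RM wait for the debugger before running any code
    export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7089"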
Exception in hadoop and java
Hi,

I am writing code in Java that connects to Hadoop. Earlier it was running fine. Then I wanted to add some charts, and I used the JFreeChart API; it started giving this error, although the chart code does not use Hadoop. I removed the chart code, but the error keeps coming. Can anybody look into it and help me understand why this error came up and how I can handle it?

14/08/02 21:33:01 ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser gnu.xml.dom.JAXPFactory@8f2ca6: java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class gnu.xml.dom.JAXPFactory
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class gnu.xml.dom.JAXPFactory
at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:589)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1143)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1119)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:470)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at myclass1.myfunction1(myclass1.java:39)
at myclass1.main(myclass1.java:25)

Thanks,
Ekta
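The exception indicates that a non-JDK JAXP implementation (GNU JAXP's gnu.xml.dom.JAXPFactory) is being picked up ahead of the JDK's built-in parser, which does support setXIncludeAware. JFreeChart distributions of that era shipped a gnujaxp.jar in their lib directory, so if that jar ended up on the classpath, removing it may be the whole fix. Alternatively, a sketch of pinning the factory back to the JDK default via a standard JAXP system property; the factory class name below is the one bundled with Sun/Oracle JDKs and should be verified for your JVM:

    # force the JDK-internal Xerces DocumentBuilderFactory (Sun/Oracle JDK assumption)
    java -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl \
         -cp <your classpath> myclass1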