Re: Hadoop 2.4.0 How to change Configured Capacity

2014-08-02 Thread arthur.hk.c...@gmail.com
Hi,

Neither "dfs.name.data.dir" nor "dfs.datanode.data.dir" is set in my cluster. 
By the way, I have searched around for these two parameters, but I cannot find 
them on the Hadoop defaults page.
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Can you please advise where and how to set them? In hdfs-site.xml, in 
core-site.xml, or in another configuration file?

Many thanks
Arthur

On 29 Jul, 2014, at 1:27 am, hadoop hive hadooph...@gmail.com wrote:

 You need to add each disk inside the dfs.name.data.dir parameter.
 
 On Jul 28, 2014 5:14 AM, arthur.hk.c...@gmail.com 
 arthur.hk.c...@gmail.com wrote:
 Hi,
 
 I have installed Hadoop 2.4.0 on 5 nodes; each node physically has a 4 TB hard 
 disk. When checking the configured capacity, I found it is only about 49.22 GB per 
 node. Can anyone advise how to set a bigger "configured capacity", e.g. 2 TB or 
 more per node?
 
 Name node
 Configured Capacity: 264223436800 (246.08 GB)
 
 Each Datanode
 Configured Capacity: 52844687360 (49.22 GB)
 
 regards
 Arthur



Re: Hadoop 2.4.0 How to change Configured Capacity

2014-08-02 Thread Harsh J
You will need to set them in hdfs-site.xml.

P.S. Their defaults are present in the hdfs-default.xml you linked to:
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml#dfs.datanode.data.dir
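
For reference, a minimal hdfs-site.xml sketch along those lines (the mount
points below are illustrative only; list every disk the DataNode should use,
comma-separated, and restart the daemons afterwards):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/mnt/disk1/hdfs/name</value>
  </property>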

On Sat, Aug 2, 2014 at 12:29 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
 Hi,

 Both "dfs.name.data.dir" and "dfs.datanode.data.dir" are not set in my
 cluster. By the way I have searched around about these two parameters, I
 cannot find them in Hadoop Default page.
 http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

 Can you please advise where to set them and how to set them? in
 hdfs-site.xml  or in core-site.xml or another configuration file?

 Many thanks
 Arthur

 On 29 Jul, 2014, at 1:27 am, hadoop hive hadooph...@gmail.com wrote:

 You need to add each disk inside dfs.name.data.dir parameter.

 On Jul 28, 2014 5:14 AM, arthur.hk.c...@gmail.com
 arthur.hk.c...@gmail.com wrote:

 Hi,

 I have installed Hadoop 2.4.0 with 5 nodes, each node physically has 4T
 hard disk, when checking the configured capacity, I found it is about 49.22
 GB per node, can anyone advise how to set bigger "configured capacity" e.g.
 2T or more  per node?

 Name node
 Configured Capacity: 264223436800 (246.08 GB)

 Each Datanode
 Configured Capacity: 52844687360 (49.22 GB)

 regards
 Arthur





-- 
Harsh J


RE: ResourceManager debugging

2014-08-02 Thread Naganarasimha G R (Naga)
Hi Yehia ,

I set YARN_RESOURCEMANAGER_OPTS in <installation folder>/bin/yarn and 
I was able to debug.



 YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7089,suspend=n"
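
With suspend=n as above, the ResourceManager starts normally and a debugger
can attach at any time. A quick way to verify the JDWP listener (port 7089 per
the line above) might be:

  # hypothetical check that the debug port is listening after starting the RM
  netstat -tlnp | grep 7089
  # then attach from Eclipse: Run > Debug Configurations > Remote Java Application,
  # host = the RM host, port = 7089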



Regards,

Naga



Huawei Technologies Co., Ltd.
Phone:
Fax:
Mobile:  +91 9980040283
Email: naganarasimh...@huawei.commailto:naganarasimh...@huawei.com
Huawei Technologies Co., Ltd.
Bantian, Longgang District,Shenzhen 518129, P.R.China
http://www.huawei.com

This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!


From: Yehia Elshater [y.z.elsha...@gmail.com]
Sent: Saturday, August 02, 2014 09:22
To: user@hadoop.apache.org
Subject: ResourceManager debugging

Hi,

I am wondering how to remotely debug YARN's RM using Eclipse. I tried adding 
the debugging options -Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1337 to YARN_OPTS but 
it did not work. Any suggestions?

Thanks



org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread Ana Gillan
Hi everyone,

I am having an issue with MapReduce jobs running through Hive being killed
after 600s timeouts and with very simple jobs taking over 3 hours (or just
failing) for a set of files with a compressed size of only 1-2gb. I will try
and provide as much information as I can here, so if someone can help, that
would be really great.

I have a cluster of 7 nodes (1 master, 6 slaves) with the following config:
 • Master node:
 – 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz
 – 64GB DDR3 SDRAM
 – 8 x 2TB SAS 600 hard drives (arranged as RAID 1 and RAID 5)
 • Slave nodes (each):
 – Intel Xeon 4-core E3-1220v3 @ 3.1GHz
 – 32GB DDR3 SDRAM
 – 4 x 2TB SATA-3 hard drives
 • Operating system on all nodes: openSUSE Linux 13.1

We have the Apache BigTop package version 0.7, with Hadoop version
2.0.6-alpha and Hive version 0.11.
YARN has been configured as per these recommendations:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

I also set the following additional settings before running jobs:
set yarn.nodemanager.resource.cpu-vcores=4;
set mapred.tasktracker.map.tasks.maximum=4;
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.merge.mapredfiles=true;

No one else uses this cluster while I am working.

What I'm trying to do:
I have a bunch of XML files on HDFS, which I am reading into Hive using this
SerDe https://github.com/dvasilen/Hive-XML-SerDe. I then want to create a
series of tables from these files and finally run a Python script on one of
them to perform some scientific calculations. The files are .xml.gz format
and (uncompressed) are only about 4mb in size each. hive.input.format is set
to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to avoid the
"small files problem."
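
As a rough sketch, the combine-input setup described above amounts to session
settings like these (the split-size value is an illustrative addition, not
something taken from this job):

  set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  set mapred.max.split.size=268435456;  -- hypothetical 256MB cap per combined split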

Problems:
My HQL statements work perfectly for up to 1000 of these files. Even for
much larger numbers, doing select * works fine, which means the files are
being read properly, but if I do something as simple as selecting just one
column from the whole table for a larger number of files, containers start
being killed and jobs fail with this error in the container logs:

2014-08-02 14:51:45,137 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0: File does not exist. Holder DFSClient_attempt_1403771939632_0402_m_00_0_-1627633686_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)

Killed jobs show the above and also the following message:
AttemptID:attempt_1403771939632_0402_m_00_0 Timed out after 600
secsContainer killed by the ApplicationMaster.

Also, in the node logs, I get a lot of pings like this:
INFO [IPC Server handler 17 on 40961]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
attempt_1403771939632_0362_m_02_0

For 5000 files (1gb compressed), the selection of a single column finishes,
but takes over 3 hours. For 10,000 files, the job hangs on about 4% map and
then errors out.

While the jobs are running, I notice that the containers are not evenly
distributed across the cluster. Some nodes lie idle, while the application
master node runs 7 containers, maxing out the 28gb of RAM allocated to
Hadoop on each slave node.

This is the output of netstat -i while the column selection is running:
Kernel Interface table

Iface   MTU   Met  RX-OK     RX-ERR  RX-DRP   RX-OVR  TX-OK     TX-ERR  TX-DRP  TX-OVR  Flg
eth0    1500  0    79515196  0       2265807  0       45694758  0       0       0       BMRU
eth1    1500  0    77410508  0       0        0       40815746  0       0       0       BMRU
lo      65536 0    16593808  0       0        0       16593808  0       0       0       LRU





Are there some settings I am missing that mean the cluster isn't processing
this data as efficiently as it can?

I am very new to Hadoop and there are so many logs, etc, that
troubleshooting can be a bit overwhelming. Where else should I be looking to
try and diagnose what is wrong?

Thanks in advance for any help you can give!

Kind regards,
Ana 





Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread hadoop hive
Can you check the ulimit for your user? That might be causing this.
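A minimal sketch of that check (the username and the 64k value are
illustrative; raising the hard limit needs root and a fresh login):

  # check the open-files limit for the user that runs the Hive queries
  ulimit -n
  # raise it persistently on each node, e.g. in /etc/security/limits.conf:
  #   youruser  soft  nofile  65536
  #   youruser  hard  nofile  65536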
On Aug 2, 2014 8:54 PM, Ana Gillan ana.gil...@gmail.com wrote:

 Hi everyone,

 I am having an issue with MapReduce jobs running through Hive being killed
 after 600s timeouts and with very simple jobs taking over 3 hours (or just
 failing) for a set of files with a compressed size of only 1-2gb. I will
 try and provide as much information as I can here, so if someone can help,
 that would be really great.

 I have a cluster of 7 nodes (1 master, 6 slaves) with the following config:

 • Master node:

 – 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz

 – 64GB DDR3 SDRAM

 – 8 x 2TB SAS 600 hard drive (arranged as RAID 1 and RAID 5)

 • Slave nodes (each):

 – Intel Xeon 4-core E3-1220v3 @ 3.1GHz

 – 32GB DDR3 SDRAM

 – 4 x 2TB SATA-3 hard drive

 • Operating system on all nodes: openSUSE Linux 13.1

  We have the Apache BigTop package version 0.7, with Hadoop version
 2.0.6-alpha and Hive version 0.11.
 YARN has been configured as per these recommendations:
 http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

 I also set the following additional settings before running jobs:
 set yarn.nodemanager.resource.cpu-vcores=4;
 set mapred.tasktracker.map.tasks.maximum=4;
 set hive.hadoop.supports.splittable.combineinputformat=true;
 set hive.merge.mapredfiles=true;

 No one else uses this cluster while I am working.

 What I’m trying to do:
 I have a bunch of XML files on HDFS, which I am reading into Hive using
 this SerDe https://github.com/dvasilen/Hive-XML-SerDe. I then want to
 create a series of tables from these files and finally run a Python script
 on one of them to perform some scientific calculations. The files are
 .xml.gz format and (uncompressed) are only about 4mb in size each. 
 hive.input.format
 is set to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to
 avoid the “small files problem.”

 Problems:
 My HQL statements work perfectly for up to 1000 of these files. Even for
 much larger numbers, doing select * works fine, which means the files are
 being read properly, but if I do something as simple as selecting just one
 column from the whole table for a larger number of files, containers start
 being killed and jobs fail with this error in the container logs:

 2014-08-02 14:51:45,137 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient:
 Failed to close file
 /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on
 /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0:
 File does not exist. Holder
 DFSClient_attempt_1403771939632_0402_m_00_0_-1627633686_1 does not have
 any open files.
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)

 Killed jobs show the above and also the following message:
 AttemptID:attempt_1403771939632_0402_m_00_0 Timed out after 600
 secsContainer killed by the ApplicationMaster.

 Also, in the node logs, I get a lot of pings like this:
 INFO [IPC Server handler 17 on 40961]
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
 attempt_1403771939632_0362_m_02_0

 For 5000 files (1gb compressed), the selection of a single column
 finishes, but takes over 3 hours. For 10,000 files, the job hangs on about
 4% map and then errors out.

 While the jobs are running, I notice that the containers are not evenly
 distributed across the cluster. Some nodes lie idle, while the application
 master node runs 7 containers, maxing out the 28gb of RAM allocated
 to Hadoop on each slave node.

 This is the output of netstat -i while the column selection is running:

 Kernel Interface table

 Iface   MTU   Met  RX-OK     RX-ERR  RX-DRP   RX-OVR  TX-OK     TX-ERR  TX-DRP  TX-OVR  Flg
 eth0    1500  0    79515196  0       2265807  0       45694758  0       0       0       BMRU
 eth1    1500  0    77410508  0       0        0       40815746  0       0       0       BMRU
 lo      65536 0    16593808  0       0        0       16593808  0       0       0       LRU




 Are there some settings I am missing that mean the cluster isn’t
 processing this data as efficiently as it can?

 I am very new to Hadoop and there are so many logs, etc, that
 troubleshooting can be a bit overwhelming. Where else should I be looking
 to try and diagnose what is wrong?

 Thanks in advance for any help you can give!

 Kind regards,
 Ana




Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread Ana Gillan
For my own user? It is as follows:

core file size  (blocks, -c) 0

data seg size   (kbytes, -d) unlimited

scheduling priority (-e) 0

file size   (blocks, -f) unlimited

pending signals (-i) 483941

max locked memory   (kbytes, -l) 64

max memory size (kbytes, -m) unlimited

open files  (-n) 1024

pipe size(512 bytes, -p) 8

POSIX message queues (bytes, -q) 819200

real-time priority  (-r) 0

stack size  (kbytes, -s) 8192

cpu time   (seconds, -t) unlimited

max user processes  (-u) 800

virtual memory  (kbytes, -v) unlimited

file locks  (-x) unlimited


From:  hadoop hive hadooph...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Saturday, 2 August 2014 16:34
To:  user@hadoop.apache.org
Subject:  Re: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode
.LeaseExpiredException)


Can you check the ulimit for your user? That might be causing this.

On Aug 2, 2014 8:54 PM, Ana Gillan ana.gil...@gmail.com wrote:
 Hi everyone,
 
 I am having an issue with MapReduce jobs running through Hive being killed
 after 600s timeouts and with very simple jobs taking over 3 hours (or just
 failing) for a set of files with a compressed size of only 1-2gb. I will try
 and provide as much information as I can here, so if someone can help, that
 would be really great.
 
 I have a cluster of 7 nodes (1 master, 6 slaves) with the following config:
 • Master node:
 – 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz
 – 64GB DDR3 SDRAM
 – 8 x 2TB SAS 600 hard drives (arranged as RAID 1 and RAID 5)
 • Slave nodes (each):
 – Intel Xeon 4-core E3-1220v3 @ 3.1GHz
 – 32GB DDR3 SDRAM
 – 4 x 2TB SATA-3 hard drives
 • Operating system on all nodes: openSUSE Linux 13.1
 
 We have the Apache BigTop package version 0.7, with Hadoop version 2.0.6-alpha
 and Hive version 0.11.
 YARN has been configured as per these recommendations:
 http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
 
 I also set the following additional settings before running jobs:
 set yarn.nodemanager.resource.cpu-vcores=4;
 set mapred.tasktracker.map.tasks.maximum=4;
 set hive.hadoop.supports.splittable.combineinputformat=true;
 set hive.merge.mapredfiles=true;
 
 No one else uses this cluster while I am working.
 
 What I'm trying to do:
 I have a bunch of XML files on HDFS, which I am reading into Hive using this
 SerDe https://github.com/dvasilen/Hive-XML-SerDe. I then want to create a
 series of tables from these files and finally run a Python script on one of
 them to perform some scientific calculations. The files are .xml.gz format and
 (uncompressed) are only about 4mb in size each. hive.input.format is set to
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to avoid the "small
 files problem."
 
 Problems:
 My HQL statements work perfectly for up to 1000 of these files. Even for much
 larger numbers, doing select * works fine, which means the files are being
 read properly, but if I do something as simple as selecting just one column
 from the whole table for a larger number of files, containers start being
 killed and jobs fail with this error in the container logs:
 
 2014-08-02 14:51:45,137 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0: File does not exist. Holder DFSClient_attempt_1403771939632_0402_m_00_0_-1627633686_1 does not have any open files.
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
 
 Killed jobs show the above and also the following message:
 AttemptID:attempt_1403771939632_0402_m_00_0 Timed out after 600
 secsContainer killed by the ApplicationMaster.
 
 Also, in the node logs, I get a lot of pings like this:
 INFO [IPC Server handler 17 on 40961]
 org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
 attempt_1403771939632_0362_m_02_0
 
 For 5000 files (1gb compressed), the selection of a single column finishes,
 but takes over 3 hours. For 10,000 files, the job hangs on about 4% map and
 then errors out.
 
 While the jobs are running, I notice that the containers are not evenly
 distributed across the cluster. Some nodes lie idle, while the application
 master node runs 7 containers, maxing out the 28gb of RAM allocated to Hadoop
 on each slave node.
 
 This is the output of netstat -i while the column selection is running:
 Kernel Interface table
 
 Iface   MTU MetRX-OK RX-ERR RX-DRP RX-OVRTX-OK TX-ERR TX-DRP TX-OVR
 Flg
 
 eth0   1500   0 79515196  0 2265807 0 

Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread Ana Gillan
Filemax across the cluster is set to over 6 million. I've checked the open
file limits for the accounts used by the Hadoop daemons and they have an
open file limit of 32K. This is confirmed by the various .out files, e.g.

/var/log/hadoop-hdfs/hadoop-hdfs-datanode-slave1.out

Contains open files (-n) 32768. Is this too low? What is the recommended
value for open files on all nodes? Also does my own user need to have the
same value?

I've also tried running the same column selection on files crushed by the
filecrush program https://github.com/edwardcapriolo/filecrush/
This created 5 large files out of the 10,000 small files (still totalling 2gb
compressed), but this job won't progress past 0% map.

From:  Ana Gillan ana.gil...@gmail.com
Date:  Saturday, 2 August 2014 16:36
To:  user@hadoop.apache.org
Subject:  Re: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode
.LeaseExpiredException)

For my own user? It is as follows:

core file size  (blocks, -c) 0

data seg size   (kbytes, -d) unlimited

scheduling priority (-e) 0

file size   (blocks, -f) unlimited

pending signals (-i) 483941

max locked memory   (kbytes, -l) 64

max memory size (kbytes, -m) unlimited

open files  (-n) 1024

pipe size(512 bytes, -p) 8

POSIX message queues (bytes, -q) 819200

real-time priority  (-r) 0

stack size  (kbytes, -s) 8192

cpu time   (seconds, -t) unlimited

max user processes  (-u) 800

virtual memory  (kbytes, -v) unlimited

file locks  (-x) unlimited


From:  hadoop hive hadooph...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Saturday, 2 August 2014 16:34
To:  user@hadoop.apache.org
Subject:  Re: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode
.LeaseExpiredException)


Can you check the ulimit for your user? That might be causing this.

On Aug 2, 2014 8:54 PM, Ana Gillan ana.gil...@gmail.com wrote:
 Hi everyone,
 
 I am having an issue with MapReduce jobs running through Hive being killed
 after 600s timeouts and with very simple jobs taking over 3 hours (or just
 failing) for a set of files with a compressed size of only 1-2gb. I will try
 and provide as much information as I can here, so if someone can help, that
 would be really great.
 
 I have a cluster of 7 nodes (1 master, 6 slaves) with the following config:
 • Master node:
 – 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz
 – 64GB DDR3 SDRAM
 – 8 x 2TB SAS 600 hard drives (arranged as RAID 1 and RAID 5)
 • Slave nodes (each):
 – Intel Xeon 4-core E3-1220v3 @ 3.1GHz
 – 32GB DDR3 SDRAM
 – 4 x 2TB SATA-3 hard drives
 • Operating system on all nodes: openSUSE Linux 13.1
 
 We have the Apache BigTop package version 0.7, with Hadoop version 2.0.6-alpha
 and Hive version 0.11.
 YARN has been configured as per these recommendations:
 http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
 
 I also set the following additional settings before running jobs:
 set yarn.nodemanager.resource.cpu-vcores=4;
 set mapred.tasktracker.map.tasks.maximum=4;
 set hive.hadoop.supports.splittable.combineinputformat=true;
 set hive.merge.mapredfiles=true;
 
 No one else uses this cluster while I am working.
 
 What I'm trying to do:
 I have a bunch of XML files on HDFS, which I am reading into Hive using this
 SerDe https://github.com/dvasilen/Hive-XML-SerDe. I then want to create a
 series of tables from these files and finally run a Python script on one of
 them to perform some scientific calculations. The files are .xml.gz format and
 (uncompressed) are only about 4mb in size each. hive.input.format is set to
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to avoid the "small
 files problem."
 
 Problems:
 My HQL statements work perfectly for up to 1000 of these files. Even for much
 larger numbers, doing select * works fine, which means the files are being
 read properly, but if I do something as simple as selecting just one column
 from the whole table for a larger number of files, containers start being
 killed and jobs fail with this error in the container logs:
 
 2014-08-02 14:51:45,137 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0: File does not exist. Holder DFSClient_attempt_1403771939632_0402_m_00_0_-1627633686_1 does not have any open files.
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
 
 Killed jobs show the above and also the following message:
 AttemptID:attempt_1403771939632_0402_m_00_0 

Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread hadoop hive
32k seems fine for the mapred user (I hope you are using this user for fetching
your data), but if you have huge data on your system you can try 64k.

Did you try increasing the timeout from 600 sec to something like 20 mins?
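
A hedged sketch of what that could look like for a single Hive session
(1,200,000 ms = 20 minutes; the property name follows the MR1-style settings
already used in this thread):

  set mapred.task.timeout=1200000;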

Can you also check at which stage it is getting hung or killed?

Thanks
 On Aug 2, 2014 9:38 PM, Ana Gillan ana.gil...@gmail.com wrote:

 Filemax across the cluster is set to over 6 million. I’ve checked the
 open file limits for the accounts used by the Hadoop daemons
  and they have an open file limit of 32K. This is confirmed by the various
 .out files, e.g.

 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-slave1.out

 Contains open files (-n) 32768. Is this too low? What is the recommended
 value for open files on all nodes? Also does my own user need to have the
 same value?

 I’ve also tried running the same column selection on files crushed by the
 filecrush program https://github.com/edwardcapriolo/filecrush/
 This created 5 large files out of the 10,000 small files (still totally
 2gb compressed), but this job won’t progress past 0% map.

 From: Ana Gillan ana.gil...@gmail.com
 Date: Saturday, 2 August 2014 16:36
 To: user@hadoop.apache.org
 Subject: Re:
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

 For my own user? It is as follows:

 core file size  (blocks, -c) 0

 data seg size   (kbytes, -d) unlimited

 scheduling priority (-e) 0

 file size   (blocks, -f) unlimited

 pending signals (-i) 483941

 max locked memory   (kbytes, -l) 64

 max memory size (kbytes, -m) unlimited

 open files  (-n) 1024

 pipe size(512 bytes, -p) 8

 POSIX message queues (bytes, -q) 819200

 real-time priority  (-r) 0

 stack size  (kbytes, -s) 8192

 cpu time   (seconds, -t) unlimited

 max user processes  (-u) 800

 virtual memory  (kbytes, -v) unlimited

 file locks  (-x) unlimited

 From: hadoop hive hadooph...@gmail.com
 Reply-To: user@hadoop.apache.org
 Date: Saturday, 2 August 2014 16:34
 To: user@hadoop.apache.org
 Subject: Re:
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

 Can you check the ulimit for your user? That might be causing this.
 On Aug 2, 2014 8:54 PM, Ana Gillan ana.gil...@gmail.com wrote:

 Hi everyone,

 I am having an issue with MapReduce jobs running through Hive being
 killed after 600s timeouts and with very simple jobs taking over 3 hours
 (or just failing) for a set of files with a compressed size of only 1-2gb.
 I will try and provide as much information as I can here, so if someone can
 help, that would be really great.

 I have a cluster of 7 nodes (1 master, 6 slaves) with the following
 config:

 • Master node:

 – 2 x Intel Xeon 6-core E5-2620v2 @ 2.1GHz

 – 64GB DDR3 SDRAM

 – 8 x 2TB SAS 600 hard drive (arranged as RAID 1 and RAID 5)

 • Slave nodes (each):

 – Intel Xeon 4-core E3-1220v3 @ 3.1GHz

 – 32GB DDR3 SDRAM

 – 4 x 2TB SATA-3 hard drive

 • Operating system on all nodes: openSUSE Linux 13.1

  We have the Apache BigTop package version 0.7, with Hadoop version
 2.0.6-alpha and Hive version 0.11.
 YARN has been configured as per these recommendations:
 http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/

 I also set the following additional settings before running jobs:
 set yarn.nodemanager.resource.cpu-vcores=4;
 set mapred.tasktracker.map.tasks.maximum=4;
 set hive.hadoop.supports.splittable.combineinputformat=true;
 set hive.merge.mapredfiles=true;

 No one else uses this cluster while I am working.

 What I’m trying to do:
 I have a bunch of XML files on HDFS, which I am reading into Hive using
 this SerDe https://github.com/dvasilen/Hive-XML-SerDe. I then want to
 create a series of tables from these files and finally run a Python script
 on one of them to perform some scientific calculations. The files are
 .xml.gz format and (uncompressed) are only about 4mb in size each. 
 hive.input.format
 is set to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat so as to
 avoid the “small files problem.”

 Problems:
 My HQL statements work perfectly for up to 1000 of these files. Even for
 much larger numbers, doing select * works fine, which means the files are
 being read properly, but if I do something as simple as selecting just one
 column from the whole table for a larger number of files, containers start
 being killed and jobs fail with this error in the container logs:

 2014-08-02 14:51:45,137 ERROR [Thread-3]
 org.apache.hadoop.hdfs.DFSClient: Failed to close file
 /tmp/hive-zslf023/hive_2014-08-02_12-33-59_857_6455822541748133957/_task_tmp.-ext-10001/_tmp.00_0
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on
 

Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread Ana Gillan
I'm not sure which user is fetching the data, but I'm assuming no one
changed that from the default. The data isn't huge in size, just in number,
so I suppose the open files limit is not the issue?

I'm running the job again with mapred.task.timeout=120, but containers
are still being killed in the same way… just without the timeout message.
And it somehow massively slowed down the machine as well, so even typing
commands took a long time (???)

I'm not sure what you mean by which stage it's getting killed on. If you
mean in the command line progress counters, it's always on Stage-1.
Also, this is the end of the container log for the killed container. Failed
and killed jobs always start fine with lots of these "processing file" and
"processing alias" statements, but then suddenly warn about a DataStreamer
Exception and then are killed with an error, which is the same as the
warning. Not sure if this exception is the actual issue or if it's just a
knock-on effect of something else.

2014-08-02 17:47:38,618 INFO [main]
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
2014-08-02 17:47:38,641 INFO [main]
org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:38,932 INFO [main]
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
2014-08-02 17:47:38,989 INFO [main]
org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:42,675 INFO [main]
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
2014-08-02 17:47:42,888 INFO [main]
org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.06_0: File does not exist. Holder DFSClient_attempt_1403771939632_0409_m_06_0_303479000_1 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at org.apache.hadoop.ipc.Client.call(Client.java:1240)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1009)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
2014-08-02 17:47:45,417 ERROR [Thread-3] org.apache.hadoop.hdfs.DFSClient: Failed to close file

Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread hadoop hive
Hey, try changing the ulimit to 64k for the user running the query, and change
the timeout in the scheduler (which is currently set to 600 sec).

Check the jt logs also for further issues.

Thanks
 On Aug 2, 2014 11:09 PM, Ana Gillan ana.gil...@gmail.com wrote:

 I’m not sure which user is fetching the data, but I’m assuming no one
 changed that from the default. The data isn’t huge in size, just in number,
 so I suppose the open files limit is not the issue?

 I’m running the job again with mapred.task.timeout=120, but containers
 are still being killed in the same way… Just without the timeout message.
 And it somehow massively slowed down the machine as well, so even typing
 commands took a long time (???)

 I’m not sure what you mean by which stage it’s getting killed on. If you
 mean in the command line progress counters, it's always on Stage-1.
 Also, this is the end of the container log for the killed container.
 Failed and killed jobs always start fine with lots of these “processing
 file” and “processing alias” statements, but then suddenly warn about a
 DataStreamer Exception and then are killed with an error, which is the same
 as the warning. Not sure if this exception is the actual issue or if it’s
 just a knock-on effect of something else.

 2014-08-02 17:47:38,618 INFO [main]
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
 hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w63.xml.gz
 2014-08-02 17:47:38,641 INFO [main]
 org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
 foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
 2014-08-02 17:47:38,932 INFO [main]
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
 hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w67.xml.gz
 2014-08-02 17:47:38,989 INFO [main]
 org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
 foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
 2014-08-02 17:47:42,675 INFO [main]
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file
 hdfs://clustnm:8020/user/usnm123/foldernm/fivek/2w6i.xml.gz
 2014-08-02 17:47:42,888 INFO [main]
 org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias
 foldernm_xml_load for file hdfs://clustnm:8020/user/usnm123/foldernm/fivek
 2014-08-02 17:47:45,416 WARN [Thread-8] org.apache.hadoop.hdfs.DFSClient:
 DataStreamer Exception
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on
 /tmp/hive-usnm123/hive_2014-08-02_17-41-52_914_251548734850890001/_task_tmp.-ext-10001/_tmp.06_0:
 File does not exist. Holder
 DFSClient_attempt_1403771939632_0409_m_06_0_303479000_1 does not have
 any open files.
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2398)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2217)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2137)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:491)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:351)
 at
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40744)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

 at org.apache.hadoop.ipc.Client.call(Client.java:1240)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:311)
 at
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1156)
 at
 

Re: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException)

2014-08-02 Thread Ana Gillan
Ok, I will request this to be done, as I'm not an admin, and then get back
to this thread on Monday. Thank you!

From:  hadoop hive hadooph...@gmail.com
Reply-To:  user@hadoop.apache.org
Date:  Saturday, 2 August 2014 18:50
To:  user@hadoop.apache.org
Subject:  Re: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode
.LeaseExpiredException)


Hey, try changing the ulimit to 64k for the user running the query, and change
the timeout in the scheduler (which is currently set to 600 sec).

Check the jt logs also for further issues.

Thanks




Re: Fair Scheduler issue

2014-08-02 Thread Yehia Elshater
Hi Julien,

Did you try changing yarn.nodemanager.resource.memory-mb to 13 GB, for
example (leaving the other 3 GB for the OS)?
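
As a sketch, that suggestion would go into yarn-site.xml on each NodeManager,
roughly like this (13 GB expressed in MB; the exact figure is only the example
above):

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>13312</value>
  </property>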

Thanks



On 1 August 2014 05:41, Julien Naour julna...@gmail.com wrote:

 Hello,

 I'm currently using HDP 2.0, so it's Hadoop 2.2.0.
 My cluster consists of 4 nodes, each with 16 cores, 16 GB RAM and 4 x 3 TB disks.

 Recently we went from 2 users to 8, so we now need a more appropriate
 scheduler.
 We began with the Capacity Scheduler. There were some issues with the different
 queues, particularly when using some Spark shells that held resources
 for a long time.
 So we decided to try the Fair Scheduler, which seems to be a good solution.
 The problem is that the Fair Scheduler doesn't allocate all available resources.
 It is capped at 73% of the available memory for one job, 63% for 2 jobs and
 45% for 3 jobs. The problem could come from shells that hold resources for
 a long time.

 We tried some configuration such as
 yarn.scheduler.fair.user-as-default-queue=false
 and played with the minimum resources allocated (minResources) in
 fair-scheduler.xml, but it doesn't seem to resolve the issue.
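
 For reference, a minResources entry of the kind mentioned above lives in
 fair-scheduler.xml and might look roughly like this (queue name and figures
 are purely illustrative):

   <allocations>
     <queue name="default">
       <minResources>8192 mb, 4 vcores</minResources>
       <weight>1.0</weight>
     </queue>
   </allocations>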

 Any advice or good practices for getting the Fair Scheduler to behave well?

 Regards,

 Julien



Re: ResourceManager debugging

2014-08-02 Thread Yehia Elshater
Hi Naga,

Thanks a lot for your help. I have submitted multiple MapReduce jobs, the
debugger attaches successfully from Eclipse, and I put a breakpoint in
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager, but the Eclipse
debugger still waits without any interruption. However, when I put another
breakpoint in another class (for
example RMAppAttemptContainerAllocatedEvent), the debugger reached the
code. Do you have any idea why the ResourceManager code was not reachable by
the debugger?

Thanks


On 2 August 2014 08:04, Naganarasimha G R (Naga) 
garlanaganarasi...@huawei.com wrote:

  Hi Yehia,

 I set YARN_RESOURCEMANAGER_OPTS in <installation folder>/bin/yarn and I was
 able to debug.



  YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7089,suspend=n"



  Regards,

 Naga



 Huawei Technologies Co., Ltd.
 Phone:
 Fax:
 Mobile:  +91 9980040283
 Email: naganarasimh...@huawei.com
 Huawei Technologies Co., Ltd.
 Bantian, Longgang District,Shenzhen 518129, P.R.China
 http://www.huawei.com

 This e-mail and its attachments contain confidential information from
 HUAWEI, which is intended only for the person or entity whose address is
 listed above. Any use of the information contained herein in any way
 (including, but not limited to, total or partial disclosure, reproduction,
 or dissemination) by persons other than the intended recipient(s) is
 prohibited. If you receive this e-mail in error, please notify the sender
 by phone or email immediately and delete it!
   --
 From: Yehia Elshater [y.z.elsha...@gmail.com]
 Sent: Saturday, August 02, 2014 09:22
 To: user@hadoop.apache.org
 Subject: ResourceManager debugging

  Hi,

 I am wondering how to remotely debug YARN's RM using Eclipse. I tried
 adding the debugging options -Xdebug
 -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=1337 to
 YARN_OPTS but it did not work. Any suggestions?

  Thanks




Exception in hadoop and java

2014-08-02 Thread Ekta Agrawal
Hi,

I am writing code in Java that connects to Hadoop. Earlier it was running
fine. I wanted to add some charts, so I used the JFreeChart API, and it started
giving this error. The chart code does not use Hadoop. I removed the chart, but
the error keeps coming. Could anybody look into it and help me understand why
this error came up and how I can handle it?

14/08/02 21:33:01 ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser gnu.xml.dom.JAXPFactory@8f2ca6:java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class gnu.xml.dom.JAXPFactory
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class gnu.xml.dom.JAXPFactory
at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:589)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1143)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1119)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1063)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:470)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at myclass1.myfunction1(myclass1.java:39)
at myclass1.main(myclass1.java:25)

Thanks,
Ekta