Cloudera Manager Installation is failing
Hi, I am trying to install Cloudera Manager but it is failing and below is the log file. I have uninstalled postgres and tried again but still get the same error.

[root@nncloudera cloudera-manager-installer]# more 5.start-embedded-db.log
mktemp: failed to create file via template `/tmp/': Permission denied
/usr/share/cmf/bin/initialize_embedded_db.sh: line 393: $PASSWORD_TMP_FILE: ambiguous redirect
The files belonging to this database system will be owned by user cloudera-scm.
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF8.
The default text search configuration will be set to english.
fixing permissions on existing directory /var/lib/cloudera-scm-server-db/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/cloudera-scm-server-db/data/base/1 ... ok
initializing pg_authid ... ok
initdb: could not open file for reading: No such file or directory
initdb: removing contents of data directory /var/lib/cloudera-scm-server-db/data
Could not initialize database server.
This usually means that your PostgreSQL installation failed or isn't working properly. PostgreSQL is installed using the set of repositories found on this machine. Please ensure that PostgreSQL can be installed. Please also uninstall any other instances of PostgreSQL and then try again., giving up

Please suggest.
Thanks
Krish
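The very first line of that log (mktemp failing on /tmp with "Permission denied") suggests the cloudera-scm user cannot write to /tmp, which would also explain the "ambiguous redirect" and the initdb failure further down. A minimal check, assuming a standard layout (this is a guess at the cause, not a confirmed fix for this installer):

ls -ld /tmp                                       # normally drwxrwxrwt (mode 1777 with the sticky bit)
chmod 1777 /tmp                                   # restore world-writable + sticky bit if it was changed
sudo -u cloudera-scm mktemp /tmp/cm-test.XXXXXX   # confirm the cloudera-scm user can create temp files

If the mktemp test succeeds, re-running the installer should at least get past the embedded-db step.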
Re: Cloudera Manager Installation is failing
Thanks Rich. On Mon, Mar 2, 2015 at 2:23 PM, Rich Haase rha...@pandora.com wrote: Try posting this question on the Cloudera forum. http://community.cloudera.com/
AW: AW: Hadoop 2.6.0 - No DataNode to stop
Hi, thanks for your help. The HADOOP_PID_DIR variable is pointing to /var/run/cluster/hadoop (which has hdfs:hadoop as its owner). 3 PID files are created there (datanode, namenode and secure_dn). It looks like the PID was written but there was a read problem. I did chmod -R 777 on the folder and now the DataNodes are stopped correctly. It only works when I'm running the start and stop commands as user hdfs. If I try to start and stop as root (as it's documented in the documentation) I still get the "no Datanode to stop" error. Is it important to start the DN as root? The only thing I noticed is that the secure_dn PID file is not created when I'm starting the DataNode as the hdfs user. Is this a problem? Greets DK

From: Ulul [mailto:had...@ulul.org] Sent: Monday, March 2, 2015 21:50 To: user@hadoop.apache.org Subject: Re: AW: Hadoop 2.6.0 - No DataNode to stop

Hi, the hadoop-daemon.sh script prints the "no $command to stop" message if it doesn't find the pid file. You should echo the $pid variable and see if you have a correct pid file there. Ulul
Re: Cloudera Manager Installation is failing
Try posting this question on the Cloudera forum. http://community.cloudera.com/
Re: Data locality
Hi folks, I have a similar question. Is there an easy way to tell (from a user perspective) whether short-circuit reads are enabled? Thanks, Demai On Mon, Mar 2, 2015 at 11:46 AM, Fei Hu hufe...@gmail.com wrote: Hi All, I developed a scheduler for data locality. Now I want to test the performance of the scheduler, so I need to monitor how much data is read remotely. Is there any tool for monitoring the volume of data moved around the cluster? Thanks, Fei
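One way to check from the command line, assuming you can run the hdfs client with the same configuration your jobs use (a sketch only; short-circuit reads also need libhadoop.so and a working domain socket, so these two keys are just the first thing to look at):

hdfs getconf -confKey dfs.client.read.shortcircuit   # "true" if short-circuit reads are enabled on the client side
hdfs getconf -confKey dfs.domain.socket.path         # must point at an existing UNIX domain socket on each DataNode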
Push or pull in YARN
Hi All! Recently I read an article about Facebook's Corona https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920 . They solved a problem with MR1's pull-based task assignment: task trackers send a heartbeat to the jobtracker and receive new tasks in response, and this approach wasted 10-20 seconds for each task. My question is about YARN: has this problem been solved in YARN or not? Best regards, Mezentsev Pavel
Re: Copy data from local disc with WebHDFS?
1. I am using the two commands below to try to copy data from the local disk to HDFS. Unfortunately these commands are not working, and I don't understand why. I have configured HDFS to use the WebHDFS protocol. How do I copy data from the local disk to HDFS using the WebHDFS protocol?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 "http://192.168.56.101:8080/?op=CREATE"
HTTP/1.1 100 Continue
HTTP/1.1 405 HTTP method PUT is not supported by this URL
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Content-Length: 0
Server: Jetty(6.1.26)

2. Every time I launch a command in YARN I get a Java HotSpot warning (warning below). How do I remove the Java HotSpot warning?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.

Thanks,
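For comparison, WebHDFS is normally served from the NameNode's HTTP port (50070 by default in Hadoop 2.6), file paths live under /webhdfs/v1/, and a file create is a two-step operation: the NameNode answers the first PUT with a 307 redirect to a DataNode, and the data is sent in a second PUT. The 405 from Jetty above suggests port 8080 is not a WebHDFS endpoint. A sketch, assuming WebHDFS is enabled and the default ports are in use (host, ports and target path are illustrative only):

# Step 1: ask the NameNode where to write; no data is sent yet.
curl -i -X PUT "http://192.168.56.101:50070/webhdfs/v1/user/xubuntu/input1?op=CREATE&overwrite=true"
# Step 2: send the file to the DataNode URL returned in the Location header of the 307 response.
curl -i -X PUT -T ~/input1 "http://<datanode-host>:50075/webhdfs/v1/user/xubuntu/input1?op=CREATE&..."
# Or let the hdfs client handle both steps, pointing webhdfs:// at the HTTP port rather than the RPC port:
hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:50070/user/xubuntu/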
Re: how to check hdfs
Hi, kindly install the hadoop-hdfs rpm on your machine. Rg: Vicky On Mon, Mar 2, 2015 at 11:19 PM, Shengdi Jin jinshen...@gmail.com wrote: Hi all, I just started to learn hadoop. I used hdfs dfs -ls /home/cluster to check the content inside, but I get the error: ls: No FileSystem for scheme: hdfs
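"No FileSystem for scheme: hdfs" usually means the HDFS client classes (the hadoop-hdfs jar and its filesystem registration) are not on the classpath of the command being run. A quick way to check, assuming a standard tarball or package layout (the paths below are illustrative):

hadoop classpath | tr ':' '\n' | grep hadoop-hdfs      # should list the hadoop-hdfs jars or their directory
ls $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-*.jar    # tarball layout; rpm/deb installs place them elsewhere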
Re: AW: Hadoop 2.6.0 - No DataNode to stop
Hi, the hadoop-daemon.sh script prints the "no $command to stop" message if it doesn't find the pid file. You should echo the $pid variable and see if you have a correct pid file there. Ulul

On 02/03/2015 13:53, Daniel Klinger wrote: Thanks for your help. But unfortunately this didn't do the job.
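For what it's worth, hadoop-daemon.sh builds the pid file name from HADOOP_PID_DIR, the identity it runs under (HADOOP_IDENT_STRING, which defaults to $USER) and the command name, so a rough way to follow Ulul's suggestion from the shell is something like this (file names assume the defaults and the hdfs user; adjust to your environment):

echo "$HADOOP_PID_DIR"
ls -l "$HADOOP_PID_DIR"                                    # expect hadoop-hdfs-namenode.pid, hadoop-hdfs-datanode.pid, ...
cat "$HADOOP_PID_DIR/hadoop-hdfs-datanode.pid"             # the pid the stop command will look for
ps -p "$(cat "$HADOOP_PID_DIR/hadoop-hdfs-datanode.pid")"  # is that process actually the running DataNode?

Note that if the daemon was started as one user and stopped as another, the pid file name (hadoop-$USER-datanode.pid) differs between the two invocations, which by itself produces the "no datanode to stop" message.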
Re: Permission Denied
David, Thanks for the information. I've issued those two commands in my hadoop shell and still get the same error when I try to initialize accumulo in *its* shell:

2015-03-02 13:30:41,175 [init.Initialize] FATAL: Failed to initialize filesystem
org.apache.hadoop.security.AccessControlException: Permission denied: user=accumulo, access=WRITE, inode=/accumulo: accumulo.supergroup:supergroup:drwxr-xr-x

My comment that I had 3 users was meant in a linux sense, not in a hadoop sense. So (to borrow terminology from RDF or XML) is there something I have to do in my hadoop setup (running under linux:hadoop) or my accumulo setup (running under linux:accumulo) so that the Accumulo I/O gets processed as from someone in the hadoop:supergroup? I tried running the accumulo init from the linux:hadoop user and it worked. I'm not sure if any permissions/etc were hosed by doing it there. I'll see. Thanks for your help. (By the way, is it wrong or a bad idea to split the work into three linux:users, or should it all be done in one linux:user space?)

Dave Patterson

On Sun, Mar 1, 2015 at 8:35 PM, dlmarion dlmar...@comcast.net wrote:

hadoop fs -mkdir /accumulo
hadoop fs -chown accumulo:supergroup /accumulo

Original message From: David Patterson patt...@gmail.com Date: 03/01/2015 7:04 PM (GMT-05:00) To: user@hadoop.apache.org Subject: Re: Permission Denied

David, Thanks for the reply. Taking the questions in the opposite order, my accumulo-site.xml does not have volumes specified. I edited the accumulo-site.xml so it now has

<property>
  <name>instance.volumes</name>
  <value>hdfs://localhost:9000/accumulo</value>
  <description>comma separated list of URIs for volumes. example: hdfs://localhost:9000/accumulo</description>
</property>

and got the same error. How can I precreate /accumulo? Dave Patterson

On Sun, Mar 1, 2015 at 3:50 PM, david marion dlmar...@hotmail.com wrote: It looks like / is owned by hadoop.supergroup and the perms are 755. You could precreate /accumulo and chown it appropriately, or set the perms for / to 775. Init is trying to create /accumulo in hdfs as the accumulo user and your perms don't allow it. Do you have instance.volumes set in accumulo-site.xml?

Original message From: David Patterson patt...@gmail.com Date: 03/01/2015 3:36 PM (GMT-05:00) To: user@hadoop.apache.org Subject: Permission Denied

I'm trying to create an Accumulo/Hadoop/Zookeeper configuration on a single (Ubuntu) machine, with Hadoop 2.6.0, Zookeeper 3.4.6 and Accumulo 1.6.1. I've got 3 userids for these components that are in the same group, and no other users are in that group. I have zookeeper running, and hadoop as well. Hadoop's core-site.xml file has hadoop.tmp.dir set to /app/hadoop/tmp. The /app/hadoop/tmp directory is owned by the hadoop user and has permissions that allow other members of the group to write (drwxrwxr-x). When I try to initialize Accumulo, with bin/accumulo init, I get

FATAL: Failed to initialize filesystem
org.apache.hadoop.security.AccessControlException: Permission denied: user=accumulo, access=WRITE, inode=/:hadoop:supergroup:drwxr-xr-x

So, my main question is: which directory do I need to give group-write permission so the accumulo user can write as needed so it can initialize?

The second problem is that the Accumulo init reports

[Configuration.deprecation] INFO : fs.default.name is deprecated. Instead use fs.defaultFS.

However, the hadoop core-site.xml file contains:

<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>

Is there somewhere else that this value (fs.default.name) is specified? Could it be due to Accumulo having a default value and not getting the override from hadoop because of the problem listed above?

Thanks
Dave Patterson patt...@gmail.com
Re: How to find bottlenecks of the cluster ?
This question makes no sense on its own; you have to tell us under which conditions you want to find a bottleneck. Regardless of the workload, we mostly use OpenTSDB to check CPU times (iowait / user / sys / idle), disk usage (await, I/Os in progress...) and memory (NUMA allocations, buffers, cache, dirty pages...). On 2 March 2015 at 08:20, Krish Donald gotomyp...@gmail.com wrote: Basically we have 4 points to consider: CPU, memory, I/O and network. So how do we see which one is causing the bottleneck? What parameters should we consider, etc.? On Sun, Mar 1, 2015 at 10:57 PM, Nishanth S nishanth.2...@gmail.com wrote: This is a vast topic. Can you tell what components are there in your data pipeline, how data flows into the system and the way it is processed? There are several inbuilt tests like TestDFSIO and terasort that you can run. -Nishan On Sun, Mar 1, 2015 at 9:45 PM, Krish Donald gotomyp...@gmail.com wrote: Hi, I wanted to understand, how should we find out the bottleneck of the cluster? Thanks Krish -- Adrien Mogenet Head of Backend/Infrastructure adrien.moge...@contentsquare.com (+33)6.59.16.64.22 http://www.contentsquare.com 4, avenue Franklin D. Roosevelt - 75008 Paris
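Since TestDFSIO and TeraSort came up as built-in stress tests, here is roughly how they are invoked; a sketch only, because the jar names and locations vary between releases and distributions, and the sizes below are placeholders to scale to your cluster:

# Raw HDFS I/O throughput: write 10 files of 1000 MB each, then read them back.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
# MapReduce shuffle/sort path: generate 10 million rows, then sort them.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen 10000000 /benchmarks/teragen
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /benchmarks/teragen /benchmarks/terasort

Watching iowait, network throughput and memory while these run is usually enough to see which resource saturates first.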
AW: Hadoop 2.6.0 - No DataNode to stop
Thanks for your help. But unfortunately this didn't do the job. Here's the shell script I've written to start my cluster (the scripts on the other node only contain the command to start the DataNode, respectively the command to start the NodeManager, with the right user (hdfs / yarn)):

#!/bin/bash
# Start HDFS-
# Start Namenode
su - hdfs -c $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
wait
# Start all Datanodes
export HADOOP_SECURE_DN_USER=hdfs
su - hdfs -c $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
wait
ssh root@hadoop-data.klinger.local 'bash startDatanode.sh'
wait
# Start Resourcemanager
su - yarn -c $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
wait
# Start Nodemanager on all Nodes
su - yarn -c $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
wait
ssh root@hadoop-data.klinger.local 'bash startNodemanager.sh'
wait
# Start Proxyserver
#su - yarn -c $HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR
#wait
# Start Historyserver
su - mapred -c $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
wait

This script generates the following output:

starting namenode, logging to /var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out
starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out
starting datanode, logging to /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out
starting resourcemanager, logging to /var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out
starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out
starting nodemanager, logging to /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out
starting historyserver, logging to /var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

Following is my stop script and its output:

#!/bin/bash
# Stop HDFS
# Stop Namenode
su - hdfs -c $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
# Stop all Datanodes
su - hdfs -c $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
ssh root@hadoop-data.klinger.local 'bash stopDatanode.sh'
# Stop Resourcemanager
su - yarn -c $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
#Stop Nodemanager on all Hosts
su - yarn -c $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
ssh root@hadoop-data.klinger.local 'bash stopNodemanager.sh'
#Stop Proxyserver
#su - yarn -c $HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR
#Stop Historyserver
su - mapred -c $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR

stopping namenode
no datanode to stop
no datanode to stop
stopping resourcemanager
stopping nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
stopping historyserver

Is there maybe anything wrong with my commands? Greets DK

From: Varun Kumar [mailto:varun@gmail.com]
Sent: Monday, March 2, 2015 05:28
To: user
Subject: Re: Hadoop 2.6.0 - No DataNode to stop

1. Stop the service
2. Change the permissions for log and pid directory once again to hdfs.
3. Start service with hdfs.
This will resolve the issue.

On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger d...@web-computing.de wrote: Thanks for your answer. I put the FQDN of the DataNodes in the slaves file on each node (one FQDN per line). Here's the full DataNode log after the start (the log of the other DataNode is exactly the same):

2015-03-02 00:29:41,841 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2015-03-02 00:29:42,207 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-03-02 00:29:42,312 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-03-02 00:29:42,313 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-03-02 00:29:42,319 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is hadoop.klinger.local
2015-03-02 00:29:42,327 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
2015-03-02 00:29:42,350 INFO
Re: How to set AM attempt interval?
Hi Vinod, Here is the Diagnostics message from the RM Web UI page:

Application application_1424919411720_0878 failed 10 times due to Error launching appattempt_1424919411720_0878_10. Got exception: java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:197)
  at java.io.DataInputStream.readFully(DataInputStream.java:169)
  at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
. Failing the application.

The log link only shows the following message and doesn't produce any stdout or stderr files:

Logs not available for container_1424919411720_0878_08_01_14. Aggregation may not be complete, Check back later or try the nodemanager at hadoopdn01:8041

Here is the screenshot: https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png Thank you.

On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: That's an old JIRA. The right solution is not an AM-retry interval but launching the AM somewhere. Why is your AM failing in the first place? If it is due to full-disk, the situation should be better with YARN-1781 - can you use the configuration (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage) added at YARN-1781? +Vinod

On Feb 27, 2015, at 7:31 AM, Ted Yu yuzhih...@gmail.com wrote: Looks like this is related: https://issues.apache.org/jira/browse/YARN-964

On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid nur.kholis.ma...@gmail.com wrote: Hi All, I have many jobs failing because the AM tries to rerun the job at a very short interval (only 6 seconds). How can I increase the interval? https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png Thank you.
share the same namespace in 2 YARN instances?
Hi, I was reading about HDFS Federation, which can be used with YARN (http://www.devx.com/opensource/enhance-existing-hdfs-architecture-with-hadoop-federation.html), and I started to wonder: is it possible to have 2 YARN runtimes that share the same HDFS namespace? Thanks,
Copy data from local disc with WebHDFS?
Hi,

1 - I have HDFS running with the WebHDFS protocol. I want to copy data from the local disk to HDFS, but I get the error below. How do I copy data from the local disk to HDFS?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
15/03/02 11:50:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
copyFromLocal: Call From hadoop-coc-1/192.168.56.101 to hadoop-coc-1:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

xubuntu@hadoop-coc-1:~/Programs/hadoop$ curl -i -X PUT -T ~/input1 "http://192.168.56.101:8080/?op=CREATE"
HTTP/1.1 100 Continue
HTTP/1.1 405 HTTP method PUT is not supported by this URL
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Date: Mon, 02 Mar 2015 16:50:36 GMT
Pragma: no-cache
Content-Length: 0
Server: Jetty(6.1.26)

$ netstat -plnet
tcp  0 0 192.168.56.101:8080  0.0.0.0:*  LISTEN  1000 587397 8229/java
tcp  0 0 0.0.0.0:4369         0.0.0.0:*  LISTEN  1158049 -
tcp  0 0 127.0.0.1:53         0.0.0.0:*  LISTEN  0 8336 -
tcp  0 0 0.0.0.0:22           0.0.0.0:*  LISTEN  0 7102 -
tcp  0 0 127.0.0.1:631        0.0.0.0:*  LISTEN  0 104794 -
tcp  0 0 0.0.0.0:50010        0.0.0.0:*  LISTEN  1000 588404 8464/java
tcp  0 0 0.0.0.0:50075        0.0.0.0:*  LISTEN  1000 589155 8464/java
tcp  0 0 0.0.0.0:50020        0.0.0.0:*  LISTEN  1000 589169 8464/java
tcp  0 0 192.168.56.101:6600  0.0.0.0:*  LISTEN  1000 587403 8229/java
tcp6 0 0 :::22                :::*       LISTEN  0 7086 -
tcp6 0 0 ::1:631              :::*       LISTEN  0 104793 -

2 - How do I remove the warning that I get every time I launch a command in YARN?

xubuntu@hadoop-coc-1:~/Programs/hadoop$ hdfs dfs -copyFromLocal ~/input1 webhdfs://192.168.56.101:8080/
Java HotSpot(TM) Client VM warning: You have loaded library /home/xubuntu/Programs/hadoop-2.6.0/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.
how to catch exception when data cannot be replicated to any datanode
Hey, I got the following error in the application logs when trying to put a file to DFS:

2015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb could only be replicated to 0 nodes instead of minReplication (=1). There are 317 datanode(s) running and no node(s) are excluded in this operation.
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
  at org.apache.hadoop.ipc.Client.call(Client.java:1409)
  at org.apache.hadoop.ipc.Client.call(Client.java:1362)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
  at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
  at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)

This results in an empty file in HDFS. I did some searching through this mailing list and found that this could be caused by a full disk or an unreachable data node. However, this exception was only logged at WARN level when FileSystem.close was called, and was never thrown where it is visible to the client. My question is: at the client level, how can I catch this exception and handle it?

Chen -- Chen Song
Re: Permission Denied
Splitting into three unix users is a good idea. Generally, none of the linux users should need access to any of the local resources owned by the others (that is, the user running the Accumulo processes shouldn't be able to interfere with the backing files used by the HDFS processes).

By default, the linux user that drives a particular process will be resolved to a Hadoop user by the NameNode process. Presuming your Accumulo services are running under the linux user accumulo, you should ensure that user exists on the linux node that runs the NameNode.

The main issue with running init as the hadoop user is that by default it's likely going to write the accumulo directories as owned by the user that created them. Presuming you are using Accumulo because you have security requirements, the common practice is to make sure only the user that runs Accumulo processes can write to /accumulo and that only that user can read /accumulo/tables and /accumulo/wal. This ensures that other users with access to the HDFS cluster won't be able to bypass the cell-level access controls provided by Accumulo.

While you are setting up HDFS directories, you should also create a home directory for the user that runs Accumulo processes. If your HDFS instance is set to use the trash feature (either in server configs or the client configs made available to Accumulo), then by default Accumulo will attempt to use it. Without a home directory, this will result in failures. Alternatively, you can ensure Accumulo doesn't rely on the trash feature by setting gc.trash.ignore in your accumulo-site.xml.

One other note: you wrote that you edited accumulo-site.xml so it now has

<property>
  <name>instance.volumes</name>
  <value>hdfs://localhost:9000/accumulo</value>
  <description>comma separated list of URIs for volumes. example: hdfs://localhost:9000/accumulo</description>
</property>

You will save yourself headache later if you stick with fully qualified domain names for all HDFS, ZooKeeper, and Accumulo connections.

-- Sean
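Putting dlmarion's and Sean's suggestions together, the HDFS-side setup might look roughly like this when run as the HDFS superuser; the user/group names and the decision to lock /accumulo down to mode 700 are assumptions to adapt to your environment:

hdfs dfs -mkdir /accumulo
hdfs dfs -chown accumulo:supergroup /accumulo
hdfs dfs -chmod 700 /accumulo                 # only the accumulo user can read/write the Accumulo volume
hdfs dfs -mkdir -p /user/accumulo             # home directory, needed if the HDFS trash feature is on
hdfs dfs -chown accumulo:supergroup /user/accumulo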
Re: how to catch exception when data cannot be replicated to any datanode
Which hadoop release are you using? In branch-2, I see this IOE in BlockManager:

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
      + targets.length + " nodes instead of minReplication (=" + minReplication + "). There are ...

Cheers
Monitor data transformation
Hi All, I developed a scheduler for data locality. Now I want to test the performance of the scheduler, so I need to monitor how much data is read remotely. Is there any tool for monitoring the volume of data moved around the cluster? Thanks, Fei
how to check hdfs
Hi all, I have just started to learn hadoop and I have a naive question. I used hdfs dfs -ls /home/cluster to check the content inside, but I get the error: ls: No FileSystem for scheme: hdfs

My configuration file core-site.xml is like:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

hdfs-site.xml is like:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:/home/cluster/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:/home/cluster/mydata/hdfs/datanode</value>
  </property>
</configuration>

Is there anything wrong? Thanks a lot.
QUERY
Hello sir/madam, I am doing research on Hadoop job scheduling and I want to modify the Hadoop job scheduling algorithm. I have downloaded the hadoop-2.2.0 source code from the Apache Hadoop website and I have built it using the mvn package -Pdist,native,docs,src -DskipTests -Dtar command on Fedora. Since you have worked on the same, I would request you to kindly guide me on a few queries regarding it: 1) What are the steps to modify the source code? 2) How do I compile and test the modifications I have made to the source code? Thanking you.
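A possible edit-build-test loop for this kind of change, sketched under the assumption that the scheduler changes live in the ResourceManager module of the hadoop-2.2.0 source tree (the module path and the test class named here are examples, not the only option):

# Rebuild just the module you edited, then run one of its existing scheduler unit tests:
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
mvn -DskipTests install
mvn test -Dtest=TestFifoScheduler
# When the unit tests pass, rebuild the full distribution tarball from the source root and deploy it:
cd ../../../..
mvn package -Pdist -DskipTests -Dtar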
Data locality
Hi All, I developed a scheduler for data locality. Now I want to test the performance of the scheduler, so I need to monitor how much data is read remotely. Is there any tool for monitoring the volume of data moved around the cluster? Thanks, Fei
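If the workload is MapReduce, the framework's built-in job counters already give a rough view of locality, so one low-effort starting point is to compare data-local versus rack-local map tasks per job. A sketch, assuming the standard mapred CLI (the job id is a placeholder):

JOB_ID=job_1425312345678_0001   # substitute a real job id from "mapred job -list all"
mapred job -status $JOB_ID
mapred job -counter $JOB_ID org.apache.hadoop.mapreduce.JobCounter DATA_LOCAL_MAPS
mapred job -counter $JOB_ID org.apache.hadoop.mapreduce.JobCounter RACK_LOCAL_MAPS
# The JobHistory web UI shows the full counter set per job, including HDFS bytes read.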
Re: how to catch exception when data cannot be replicated to any datanode
I am using CDH5.1.0, which is hadoop 2.3.0. On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu yuzhih...@gmail.com wrote: Which hadoop release are you using? -- Chen Song
Re: how to catch exception when data cannot be replicated to any datanode
Also, it could be thrown in BlockManager, but on the DFSClient side it just catches that exception and logs it as a warning. The problem here is that the caller has no way to detect this error and only sees an empty file (0 bytes) after the fact. Chen On Mon, Mar 2, 2015 at 2:41 PM, Chen Song chen.song...@gmail.com wrote: I am using CDH5.1.0, which is hadoop 2.3.0. -- Chen Song
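Not an answer to the Java-level question, but if the ingestion is driven from the shell, one way to at least detect the bad outcome described above is to check the exit status of the put and then verify the target file is not zero length; a sketch with made-up paths, assuming the standard hdfs CLI:

hdfs dfs -put localfile.pb /tmp/target.pb || { echo "put failed" >&2; exit 1; }
# -test -z returns 0 when the file exists and has zero length, i.e. no blocks were ever written:
if hdfs dfs -test -z /tmp/target.pb; then
  echo "target file is empty; the write did not complete" >&2
  exit 1
fi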