Re: HDFS upgrade problem of fsImage
Yes, realized that, and I see your point :-) However it seems some fs inconsistency is present; did you attempt rollback/finalizeUpgrade and check? For that error, FSImage.java finds a previous fs state:

    // Upgrade is allowed only if there are
    // no previous fs states in any of the directories
    for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      if (sd.getPreviousDir().exists())
        throw new InconsistentFSStateException(sd.getRoot(),
            "previous fs state should not exist during upgrade. "
            + "Finalize or rollback first.");
    }

Thanks
Rekha

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 5:19 PM
To: user@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: HDFS upgrade problem of fsImage

I insist on a hot upgrade on the test cluster because I want a hot upgrade on the prod cluster.

On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azuryy,

This error occurs when FSImage finds a previous fs state, and as the log states you would need to either finalizeUpgrade or rollback to proceed. Below:

    bin/hadoop dfsadmin -finalizeUpgrade
    hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the only user, why would you insist on a hot upgrade?
:-) Thanks
Rekha

Some helpful guidelines for upgrade here:
http://wiki.apache.org/hadoop/Hadoop_Upgrade
https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear,

I have a small test cluster with hadoop-2.0x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster

But there is an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
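The pre-upgrade check quoted above can be mirrored with a small local sketch before attempting the upgrade again. The directory paths are placeholders (only /hdfs/name appears in the thread); the `previous/` subdirectory is the unfinalized-upgrade state the NameNode refuses to upgrade over:

```shell
# Sketch mirroring FSImage's pre-upgrade check: a NameNode storage
# directory that still contains a 'previous/' subdirectory holds an
# unfinalized upgrade, and starting with -upgrade will refuse to run.
# Pass your dfs.namenode.name.dir paths as arguments (placeholders here).
check_previous_state() {
  for d in "$@"; do
    if [ -d "$d/previous" ]; then
      echo "inconsistent: $d (run 'hadoop dfsadmin -finalizeUpgrade' or '-rollback' first)"
    else
      echo "ok: $d"
    fi
  done
}
```

If any directory reports inconsistent, finalize or roll back on the running cluster before retrying the `-upgrade` start.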
Re: how to store video,audio files into hdfs through java program
Hi,

Storing is not much of an issue; it is the processing that needs some thought. At a basic level you can use SequenceFileInputFormat and ByteArrayInputStream (and the corresponding output classes) for the binary files. There are some experiments on audio and video out there; for audio - http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/ I had heard some time back that MapR has some video streaming functionality. Try looking into it?

Thanks
Rekha

From: mallik arjun mallik.cl...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday 8 November 2013 8:26 AM
To: user@hadoop.apache.org
Subject: how to store video,audio files into hdfs through java program

hi all,

Is there any way to store video and audio files in HDFS? Please provide some examples.

thanks,
mallik.
Re: can someone help me how to disable the log info in terminal when type command bin/yarn node in YARN
I have almost never silenced the logs on the terminal, only tuned the config for the log path/retention period, so just off the top of my mind: -S/--silent for no logs and -V/--verbose for maximum logs usually work on executables; --help will confirm whether it is possible. If it doesn't work, well, it should :-)

Thanks
Rekha

From: pengwenwu2008 pengwenwu2...@163.com
Reply-To: user@hadoop.apache.org
Date: Mon, 4 Mar 2013 15:29:43 +0800
To: user@hadoop.apache.org
Subject: can someone help me how to disable the log info in terminal when type command bin/yarn node in YARN

Hi All,

Can someone help me disable the log info in the terminal when typing the command bin/yarn node in YARN?

bin/yarn node
13/03/04 02:24:43 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/04 02:24:43 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: node
 -list   Lists all the nodes.

Regards,
Wenwu,Peng
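For those INFO lines specifically, the usual lever is log4j rather than a CLI flag. A sketch, assuming the stock Hadoop conf/log4j.properties layout; the same effect can be had per invocation with the HADOOP_ROOT_LOGGER environment variable (e.g. HADOOP_ROOT_LOGGER=WARN,console bin/yarn node -list):

```properties
# conf/log4j.properties (sketch, stock Hadoop layout assumed):
# raise the default threshold so INFO service-startup messages
# are suppressed on the console; WARN and ERROR still print.
hadoop.root.logger=WARN,console
```

Note this changes logging for everything that reads this config, not just the yarn CLI, so the per-invocation env variable is the gentler option.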
Re: Hadoop Security and Kerberos
Hi Yongzhi,

Well, I don't know if this will help, but I looked into the source code and can see all the token and authentication related features discussed in the design under o.a.h.hdfs.security.*, o.a.h.mapreduce.security.*, o.a.h.security.*, and o.a.h.security.authentication.* And HADOOP-4487 is marked fixed now, so there might be specific bug issues, but the features are in. Comparing the release notes can also give more details - http://hadoop.apache.org/docs/r1.0.3/releasenotes.html with http://hadoop.apache.org/docs/r1.0.0/releasenotes.html

Owen's session on security is good, albeit a bit old - http://developer.yahoo.com/blogs/ydn/posts/2010/07/hadoop_security_in_detail/ For Kerberos itself, this is neat - http://www.ornl.gov/~jar/HowToKerb.html and http://www.cmf.nrl.navy.mil/krb/kerberos-faq.html

So installing Kerberos itself involves almost the same steps across CDH4, Hortonworks, Yahoo! - only the configuration (kerberos.principal, authentication.type in core-site.xml) needs to be set up correctly. Some more examples - http://hortonworks.com/blog/fine-tune-your-apache-hadoop-security-settings/#more-1124 https://cwiki.apache.org/GIRAPH/quick-start-running-giraph-with-secure-hadoop.html

Thanks
Rekha

On 16/09/12 8:57 AM, Yongzhi Wang wang.yongzhi2...@gmail.com wrote:

Dear All,

I am confused about the usage of Kerberos on Hadoop 1.0.3. I have difficulty finding documents on configuring the security features of Hadoop 1.0.3. Specifically, how should I configure Hadoop so that I can use Kerberos? The only document related to this question is the CDH4 Security Guide (https://ccp.cloudera.com/display/CDH4DOC/CDH4+Security+Guide), an instruction on security configuration for Cloudera's Distribution of Hadoop. But I am not sure whether this guide can be directly used to configure Apache Hadoop 1.0.3. After all, I don't know how many differences exist between CDH4 and Apache Hadoop 1.0.3.
I read some materials published by the Hadoop development team, including the documentation posted on the Apache website (http://hadoop.apache.org/docs/r1.0.3/index.html) and the Hadoop Security Design document proposed by Yahoo! in 2009. Unfortunately, I still cannot form a clear picture after reading those documents. All my questions derive from one basic question: are all of the design features in the Hadoop Security Design included in release 1.0.3? If not, which of those features were introduced in release 1.0.3? Which features are included in Hadoop 2.0? Which features are still not implemented?

For example, the Hadoop Security Design document mentions three types of tokens (Delegation Token, Block Access Token and Job Token). Does release 1.0.3 support all three types of tokens? The 1.0.3 HDFS permissions guide (http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html) says: "In this release of Hadoop the identity of a client process is just whatever the host operating system says it is. For Unix-like systems, ... In the future there will be other ways of establishing user identity (think Kerberos, LDAP, and others). ..." It seems 1.0.3 does not fully support Kerberos. If that is the case, to what degree does release 1.0.3 support Kerberos?

So my questions are:
1. Is there any document comparing the security features in each release of Hadoop with the Hadoop Security Design proposed by Yahoo!?
2. In release 1.0.3, which components of Hadoop can use Kerberos to leverage security? In order to use Kerberos, how should I configure Hadoop?

I am not very familiar with Kerberos, so if I have some misunderstanding, please feel free to point it out. Thanks!

Best regards,
Yongzhi
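To make the configuration side of the question concrete, this is the kind of core-site.xml fragment the guides above revolve around. A sketch only: these two property names exist in secured Hadoop deployments, but a working setup also needs keytabs, per-daemon principal properties, and a functioning KDC, none of which are shown here:

```xml
<!-- core-site.xml (sketch): switch authentication from the default
     "simple" (OS-reported identity) to Kerberos, and turn on
     service-level authorization checks. -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```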
Re: Hadoop HDFS and Mapreducer question
Refer to the hadoop put/get syntax for placing input files on HDFS (automate with a script), and pig dump/store after the mapreduce to produce your output directory - http://pig.apache.org/docs/r0.9.2/start.html#Pig+Tutorial+Files

Thanks
Rekha

From: A Geek dw...@live.com
Reply-To: user@hadoop.apache.org
Date: Tue, 18 Sep 2012 05:04:05 +
To: user@hadoop.apache.org
Subject: Hadoop HDFS and Mapreducer question

Hello All,

I'm learning Hadoop, HDFS etc. and currently trying to solve one issue. Can someone help me start attacking the following problem? I'm trying to come up with some sample code to store files in a \YEAR\Month\Date\account structure using Hadoop techniques. Example: the files will be submitted to the program as

Test_20120917_ACC1.csv and Test_20120916_ACC2.csv

HDFS has to create a structure as below:

HDFS_HOME\2012\09\17\ACC1\Test_20120917_ACC1.csv
HDFS_HOME\2012\09\16\ACC2\Test_20120916_ACC2.csv

Can someone give me pointers on how to start on this? Highly appreciated. Thanks for reading the question.

Thanks, DW
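The filename-to-directory mapping in the question can be sketched as below, derived only from the Test_YYYYMMDD_ACCOUNT.ext pattern in the examples. Note HDFS paths use forward slashes; the helper name and the driver comment are hypothetical, not a real Hadoop tool:

```shell
# Sketch: derive the dated HDFS directory from a file named
# NAME_YYYYMMDD_ACCOUNT.ext (pattern taken from the examples above).
# to_dated_path is a hypothetical helper, not part of Hadoop.
to_dated_path() {
  f=$(basename "$1")
  d=$(echo "$f" | cut -d_ -f2)                    # e.g. 20120917
  acct=$(echo "$f" | cut -d_ -f3 | cut -d. -f1)   # e.g. ACC1
  echo "$(echo "$d" | cut -c1-4)/$(echo "$d" | cut -c5-6)/$(echo "$d" | cut -c7-8)/$acct/$f"
}
# A driver script could then, per incoming file, do something like:
#   dst=$(to_dated_path "$file")
#   hadoop fs -put "$file" "$HDFS_HOME/$dst"
```

So Test_20120917_ACC1.csv maps to 2012/09/17/ACC1/Test_20120917_ACC1.csv, and the put command creates it under the chosen HDFS root.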
Re: How get information messages when a JobControl is used ?
Good that web hdfs is sufficient for now, Piter! The counters are part of o.a.h.mapreduce.Job, so you can get them via job.getCounters() etc. or via JobInProgress. It is not a JobControl feature as such, so they will not be directly in the JobControl/ControlledJob API. However, Bertrand's point is an important one: if there are identified synchronization/concurrency issues on JobControl, the values on webhdfs will reflect that too ... unless the collection of counters happens within context ... hmm, well. Do please keep us updated if the values you see seem incorrect!

Thanks
Rekha

From: Piter85 Piter85 piter1...@hotmail.it
Reply-To: user@hadoop.apache.org
Date: Wed, 12 Sep 2012 13:12:23 +0200
To: user@hadoop.apache.org
Subject: RE: How get information messages when a JobControl is used ?

Hi Rekha and Bertrand!

Thanks for the answers! OK, I see that in the web interface (_logs-history-job_.) there is info about the execution of jobs. I hope this info will be enough for me. As I said before, scanning the APIs, the only method I found was ControlledJob's toString(). Bye! :)

Piter

Date: Wed, 12 Sep 2012 12:21:24 +0200
Subject: Re: How get information messages when a JobControl is used ?
From: decho...@gmail.com
To: user@hadoop.apache.org

But as far as I know there is no way to get a snapshot of the JobControl state. https://issues.apache.org/jira/browse/MAPREDUCE-3562 I was trying only to get the state of all jobs, and it is not possible to get a consistent view. For Map/Reduce progress, I guess you could do the same by digging into the APIs. But I am afraid you will have the same problems.
Regards
Bertrand

On Wed, Sep 12, 2012 at 12:09 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Piter,

JobControl just means there are multiple complex jobs, but you will still see the information for each job on your hadoop web interface (webhdfs), wouldn't you? Or if that does not work, you might need to use Reporters/Counters to get the log info in a custom format as needed.

Thanks
Rekha

From: Piter85 Piter85 piter1...@hotmail.it
Reply-To: user@hadoop.apache.org
Date: Wed, 12 Sep 2012 11:34:27 +0200
To: user@hadoop.apache.org
Subject: How get information messages when a JobControl is used ?

Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It works and creates output data, but it doesn't give me back information messages such as the number of mappers or the number of records in input or output. It only returns messages like this:

12/09/12 09:56:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/09/12 09:56:38 INFO input.FileInputFormat: Total input paths to process : 4

I tried to use ControlledJob's toString() method but it returns only this kind of message:

job name:songsort
job id:jobctrl1
job state:RUNNING
job mapred id:job_201209120942_0005
job message:just initialized
job has 1 dependeng jobs:
depending job 0:canzoni

Any idea how to get the remaining info? Bye!

-- Bertrand Dechoux
Re: Non utf-8 chars in input
Hi Ajay,

Try SequenceFileAsBinaryInputFormat?

Thanks
Rekha

On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote:

Hi,

I am using the default inputFormat class for reading input from text files, but the input file has some non-UTF-8 characters. I guess TextInputFormat is the default inputFormat class, and it replaces these non-UTF-8 chars with \uFFFD. If I do not want this behavior and need the actual char in my mapper, what would be the correct inputFormat class?

Regards,
Ajay Srivastava
Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined
Hi Andy,

If you are referring to HADOOP_CLASSPATH, that is an env variable on your cluster, or effected via the config xml. But if you need your own environment variables for streaming, you may use -cmdenv PATH= on your streaming command. Or if you have specific jars for the streaming process, -libjars application.jar on the command works.

Thanks
Rekha

From: Andy Xue andyxuey...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Tue, 4 Sep 2012 17:41:39 +1000
To: user@hadoop.apache.org
Subject: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

Hi:

I wish to use Hadoop streaming to run a program which requires specific PATH and CLASSPATH variables. I have set these two variables in both /etc/profile and ~/.bashrc on all slaves (and restarted those slaves). However, when I run the hadoop streaming job, the program generates error messages saying that these environment variables are not correctly set. Does this mean I did not set the environment variables in the correct place? How should I define them? I use Hadoop 1.0.3 and all machines run Ubuntu 12.04.

Thank you for your time and help. Have a great day.

Andy
Re: error in shuffle in InMemoryMerger
Hi Abhay,

The error line - Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_128.out - suggests you either do not have permissions on the output folder or the disk is full. Also, 5 is not a big number for thread spawning (in fact, it is the default for parallel copies), so I wouldn't recommend reducing it, though a lower value might work. The only long-term indication is for your system to undergo node maintenance.

Thanks
Rekha

From: Abhay Ratnaparkhi abhay.ratnapar...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Tue, 28 Aug 2012 14:52:27 +0530
To: user@hadoop.apache.org
Subject: error in shuffle in InMemoryMerger

Hello,

I am getting the following error when the reduce task is running. The mapreduce.reduce.shuffle.parallelcopies property is set to 5.

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(AccessController.java:284)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:773)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_128.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
        at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputF

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(AccessController.java:284)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:773)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_119.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
        at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputF

Regards,
Abhay
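The permissions-or-full-disk diagnosis above can be triaged with a small local sketch run on the affected node. The directory arguments are placeholders for your configured local dirs (the shuffle merger needs each to exist, be writable, and have free space):

```shell
# Quick triage sketch for "Could not find any valid local directory":
# check that each local dir exists, is writable, and report free space.
# Pass your mapred.local.dir values as arguments (placeholders here).
check_local_dirs() {
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d"
    elif [ ! -w "$d" ]; then
      echo "not writable: $d"
    else
      echo "ok: $d ($(df -Pk "$d" | awk 'NR==2 {print $4}') KB free)"
    fi
  done
}
```

Any line that does not start with "ok:" points at the directory to fix; "ok:" lines with very little free space point at a full disk.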