Re: HDFS upgrade problem of fsImage

2013-11-22 Thread Joshi, Rekha
Yes, realized that, and I see your point :-) However, it seems some fs
inconsistency is present - did you attempt rollback/finalizeUpgrade and check?

For that error, FSImage.java finds a previous fs state -

// Upgrade is allowed only if there are
// no previous fs states in any of the directories
for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
  StorageDirectory sd = it.next();
  if (sd.getPreviousDir().exists())
    throw new InconsistentFSStateException(sd.getRoot(),
        "previous fs state should not exist during upgrade. "
        + "Finalize or rollback first.");
}
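
In concrete terms (hedged - the name directory below is the one from the log in this thread, and per getPreviousDir() above the previous state lives in a "previous" subdirectory):

ls -d /hdfs/name/previous

If that directory exists, the NameNode will refuse -upgrade until you finalize or roll back.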


Thanks

Rekha


From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 5:19 PM
To: user@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: HDFS upgrade problem of fsImage


I insist on a hot upgrade on the test cluster because I want a hot upgrade on the prod
cluster.

On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azurry,

This error occurs when FSImage finds a previous fs state, and as the log states you
would need to either finalizeUpgrade or rollback to proceed. Below -

bin/hadoop dfsadmin -finalizeUpgrade
hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the
only user, why would you insist on a hot upgrade? :-)

Thanks
Rekha


Some helpful guidelines for upgrade here -

http://wiki.apache.org/hadoop/Hadoop_Upgrade

https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation


From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear,

I have a small test cluster with hadoop-2.0.x and HA configured, but I want
to upgrade to hadoop-2.2.

I don't want to stop the cluster during the upgrade, so my steps are:

1) on standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration in the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster

but there is an exception in the NN log - so how can I upgrade without stopping the whole cluster?
Thanks.


org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)


Re: how to store video,audio files into hdfs through java program

2013-11-08 Thread Joshi, Rekha
Hi,

I think storing is not much of an issue; it is the processing that requires some thought.
I think at a basic level you can use SequenceFileInputFormat and ByteArrayInputStream (and the corresponding output classes) for the binary files.
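
A minimal sketch of that approach (hedged - the class name and paths are illustrative, it targets the 1.x API, and it buffers the whole file in memory, so chunk large videos in practice):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class BinaryToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path src = new Path(args[0]);   // e.g. /user/mallik/video.mp4
    Path dst = new Path(args[1]);   // e.g. /user/mallik/videos.seq

    // Read the whole source file into a byte array.
    byte[] bytes = new byte[(int) fs.getFileStatus(src).getLen()];
    InputStream in = fs.open(src);
    try {
      IOUtils.readFully(in, bytes, 0, bytes.length);
    } finally {
      in.close();
    }

    // Append it to a SequenceFile as (file name -> raw bytes).
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, dst, Text.class, BytesWritable.class);
    try {
      writer.append(new Text(src.getName()), new BytesWritable(bytes));
    } finally {
      writer.close();
    }
  }
}

Each (file name, bytes) record can then be read back in a job via SequenceFileInputFormat.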

There are some experiments on audio and video here -
Audio - http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/
I had heard some time back that MapR has some video streaming functionality. Try looking into it?

Thanks
Rekha

From: mallik arjun mallik.cl...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday 8 November 2013 8:26 AM
To: user@hadoop.apache.org
Subject: how to store video,audio files into hdfs through java program

hi all,

is there any way to store video, audio files into hdfs?

please provide some examples.

thanks,
mallik.


Re: can someone help me how to disable the log info in terminal when type command bin/yarn node in YARN

2013-03-04 Thread Joshi, Rekha
I have almost never silenced the logs on the terminal, only tuned config for the log path/retention period. So, just top of mind: -S/--silent for no logs and -V/--verbose for max logs usually works on executables; --help will confirm if it is possible.

If it doesn't work, well, it should :-)
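
If -S/--silent does not exist on your build, a hedged alternative (assuming the bin/yarn script honors a YARN_ROOT_LOGGER variable, the way the hadoop scripts honor HADOOP_ROOT_LOGGER) is to raise the console log threshold for just that command -

YARN_ROOT_LOGGER=WARN,console bin/yarn node -list

or permanently, by setting hadoop.root.logger=WARN,console in conf/log4j.properties.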

Thanks
Rekha

From: pengwenwu2008 pengwenwu2...@163.com
Reply-To: user@hadoop.apache.org
Date: Mon, 4 Mar 2013 15:29:43 +0800
To: user@hadoop.apache.org
Subject: can someone help me how to disable the log info in terminal when type 
command bin/yarn node in YARN

Hi All,
can someone help me with how to disable the log info in the terminal when typing the command
bin/yarn node in YARN?

 bin/yarn node
13/03/04 02:24:43 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/03/04 02:24:43 INFO service.AbstractService: 
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: node
 -list Lists all the nodes.
Regards,
Wenwu,Peng




Re: Hadoop Security and Kerberos

2012-09-17 Thread Joshi, Rekha
Hi Yongzhi,

Well, I don't know if this will help, but I looked into the source code; I can
see all token and authentication related features discussed in the design
under o.a.h.hdfs.security.*, o.a.h.mapreduce.security.*, o.a.h.security.*,
and o.a.h.security.authentication.*.
And HADOOP-4487 is marked fixed now, so there might be explicit bug issues,
but the features are in.
Comparing the release notes can also give more details -
http://hadoop.apache.org/docs/r1.0.3/releasenotes.html with
http://hadoop.apache.org/docs/r1.0.0/releasenotes.html

Owen's session on security is good, albeit a bit old -
http://developer.yahoo.com/blogs/ydn/posts/2010/07/hadoop_security_in_detail/
For Kerberos itself, these are neat -
http://www.ornl.gov/~jar/HowToKerb.html and
http://www.cmf.nrl.navy.mil/krb/kerberos-faq.html

So installing Kerberos itself would be almost the same steps across CDH4,
Hortonworks, and Yahoo! - only the configuration would need to be correctly set up,
in kerberos.principal and authentication.type in core-site.xml.
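
As a minimal illustration (hedged - these are the standard security properties in core-site.xml; principal-related values will be site-specific):

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
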
Some more examples -
http://hortonworks.com/blog/fine-tune-your-apache-hadoop-security-settings/#more-1124
https://cwiki.apache.org/GIRAPH/quick-start-running-giraph-with-secure-hadoop.html

Thanks
Rekha



On 16/09/12 8:57 AM, Yongzhi Wang wang.yongzhi2...@gmail.com wrote:

Dear All,

I am confused about the usage of Kerberos on Hadoop 1.0.3.

I have difficulty finding documents on configuring the security
features of Hadoop 1.0.3. Specifically, how should I configure
Hadoop so that I can use Kerberos? The only document that is
related to this question is the CDH4 Security Guide
(https://ccp.cloudera.com/display/CDH4DOC/CDH4+Security+Guide), an
instruction about the security configuration for Cloudera's Distribution of
Hadoop. But I am not sure if this guide can be directly used to
configure Apache Hadoop 1.0.3. After all, I don't know how many
differences exist between CDH4 and Apache Hadoop 1.0.3.

I read some materials published by the hadoop development team,
including the documentation posted on the apache website
(http://hadoop.apache.org/docs/r1.0.3/index.html) and the Hadoop
Security Design document proposed by Yahoo! in 2009. Unfortunately, I
still cannot form a clear picture after reading those documents.
All my questions are derived from one basic question: Are all of the
design features in Hadoop Security Design included in the release
1.0.3? If not, which of those features are introduced in release
1.0.3? Which features are included in the Hadoop 2.0? Which features
are still not implemented?

For example, the Hadoop Security Design document mentions three
types of tokens (Delegation Token, Block Access Token and Job Token).
Does release 1.0.3 support all three types of tokens?

In the 1.0.3 document "HDFS Permissions Guide"
(http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html), it
mentions that "In this release of Hadoop the identity of a client
process is just whatever the host operating system says it is. For
Unix-like systems, ... In the future there will be other ways of
establishing user identity (think Kerberos, LDAP, and others).
..." It seems that 1.0.3 does not fully support Kerberos. If
that is the case, to what degree does release 1.0.3 support Kerberos?

So my question is:

 1. Is there any document comparing the security features in each
release of Hadoop with the Hadoop Security Design proposed by Yahoo!?
 2. In release 1.0.3, which components of Hadoop can use Kerberos to
leverage security? In order to use Kerberos, how should I
configure Hadoop?

I am not very familiar with Kerberos. So if I have some
misunderstanding, please feel free to point it out.

Thanks!

Best regards,
Yongzhi



Re: Hadoop HDFS and Mapreducer question

2012-09-17 Thread Joshi, Rekha
Refer to the hadoop put/get syntax for placing input files on HDFS (automate via script),
and to Pig's dump/store after mapreduce to get your output directory -
http://pig.apache.org/docs/r0.9.2/start.html#Pig+Tutorial+Files
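
A hedged sketch of automating the put for the layout asked about below (the target root /data and the bash string slicing are illustrative; assumes names like Test_YYYYMMDD_ACCx.csv):

f=Test_20120917_ACC1.csv
d=$(echo $f | cut -d_ -f2)                  # 20120917
acc=$(echo $f | cut -d_ -f3 | cut -d. -f1)  # ACC1
# hadoop fs -mkdir creates parent directories along the path (like mkdir -p)
hadoop fs -mkdir /data/${d:0:4}/${d:4:2}/${d:6:2}/$acc
hadoop fs -put $f /data/${d:0:4}/${d:4:2}/${d:6:2}/$acc/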

Thanks
Rekha

From: A Geek dw...@live.com
Reply-To: user@hadoop.apache.org
Date: Tue, 18 Sep 2012 05:04:05 +
To: user@hadoop.apache.org
Subject: Hadoop HDFS and Mapreducer question

Hello All,
I'm learning hadoop, hdfs etc. and currently trying to solve one issue. Can
someone help me with how to start attacking the following problem:

I'm trying to come up with some sample code to store the files in a
\YEAR\Month\Date\account structure using Hadoop techniques.

Example: the files will be submitted to the program as below -
Test_20120917_ACC1.csv and Test_20120916_ACC2.csv

HDFS has to create structure as below

HDFS_HOME\2012\09\17\ACC1\Test_20120917_ACC1.csv
HDFS_HOME\2012\09\16\ACC1\Test_20120916_ACC2.csv

Can someone give me pointers on how to start on this. Highly Appreciated. 
Thanks for reading the question.

Thanks,
DW


Re: How get information messages when a JobControl is used ?

2012-09-12 Thread Joshi, Rekha
Good that web hdfs is sufficient for now, Piter!
The counters are part of o.a.h.mapreduce.Job, so you can get them as
job.getCounters() etc., or via JobInProgress. It is not a JobControl feature as
such, so they will not be directly in the JobControl/ControlledJob API.
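
A minimal sketch along those lines (hedged - assumes the new-API classes in o.a.h.mapreduce.lib.jobcontrol and that the jobs have finished):

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

// after jobControl.allFinished() returns true:
for (ControlledJob cj : jobControl.getSuccessfulJobList()) {
  Counters counters = cj.getJob().getCounters();
  for (CounterGroup group : counters) {
    for (Counter counter : group) {
      System.out.println(group.getDisplayName() + "\t"
          + counter.getDisplayName() + " = " + counter.getValue());
    }
  }
}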

However, Bertrand's point is an important one: if there are identified
synchronization/concurrency issues on JobControl, the values on webhdfs too
will reflect that... unless the collection of counters is happening within
context... hmm... well...
Do please keep us updated if the values you see seem incorrect!

Thanks
Rekha

From: Piter85 Piter85 piter1...@hotmail.it
Reply-To: user@hadoop.apache.org
Date: Wed, 12 Sep 2012 13:12:23 +0200
To: user@hadoop.apache.org
Subject: RE: How get information messages when a JobControl is used ?

Hi Rekha and Bertrand! Thanks for the answers! OK, I see that in the web interface
(_logs-history-job_.) there is info about the executions of jobs.
I hope that this info will be enough for me.

As I said before, scanning the APIs, the only method that I found was
ControlledJob.toString().

Bye! :)

Piter


Date: Wed, 12 Sep 2012 12:21:24 +0200
Subject: Re: How get information messages when a JobControl is used ?
From: decho...@gmail.com
To: user@hadoop.apache.org

But as far as I know there is no way to have a snapshot of the JobControl state.
https://issues.apache.org/jira/browse/MAPREDUCE-3562

I was trying only to get the state of all jobs and it is not possible to get a 
consistent view.
For Map/Reduce progress, I guess you could do the same by digging into the APIs.
But I am afraid you will have the same problems.

Regards

Bertrand

On Wed, Sep 12, 2012 at 12:09 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:
Hi Piter,

JobControl just means there are multiple complex jobs, but you will still see the
information for each job on your hadoop web interface (webhdfs), wouldn't you?
Or if that does not work, you might need to use Reporters/Counters to get the
log info data in a custom format as needed.

Thanks
Rekha


From: Piter85 Piter85 piter1...@hotmail.it
Reply-To: user@hadoop.apache.org
Date: Wed, 12 Sep 2012 11:34:27 +0200
To: user@hadoop.apache.org
Subject: How get information messages when a JobControl is used ?

Hi! I'm using JobControl (v. 1.0.3) to chain two MapReduce applications. It
works and creates output data, but it doesn't give me back information messages
such as the number of mappers, the number of records in input or in output, etc...

It only returns messages like this :
12/09/12 09:56:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
12/09/12 09:56:38 INFO input.FileInputFormat: Total input paths to process : 4

I tried to use ControlledJob's toString() method, but it returns only this
kind of message:

job name:songsort
job id:jobctrl1
job state:RUNNING
job mapred id:job_201209120942_0005
job message:just initialized
job has 1 dependeng jobs:
 depending job 0:canzoni

Any idea how to get back the remaining info?

Bye!



--
Bertrand Dechoux


Re: Non utf-8 chars in input

2012-09-11 Thread Joshi, Rekha
Hi Ajay,

Try SequenceFileAsBinaryInputFormat?
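
A minimal sketch of wiring that in (hedged - old-API form as in 1.x; MyJob is illustrative, and note this input format expects SequenceFiles rather than plain text, so the mapper receives raw BytesWritable pairs untouched by charset decoding):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat;

JobConf conf = new JobConf(MyJob.class);
conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);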


Thanks
Rekha

On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote:

Hi,

I am using the default inputFormat class for reading input from text files,
but the input file has some non utf-8 characters.
I guess that TextInputFormat is the default inputFormat class and that it
replaces these non utf-8 chars with \uFFFD. If I do not want this
behavior and need the actual chars in my mapper, what would be the correct
inputFormat class?



Regards,
Ajay Srivastava



Re: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

2012-09-04 Thread Joshi, Rekha
Hi Andy,

If you are referring to HADOOP_CLASSPATH, that is an env variable on your cluster,
or effected via config xml. But if you need your own environment variables for
streaming you may use -cmdenv PATH= on your streaming command. Or if you have
specific jars for the streaming process, -libjars application.jar on the command
works.
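
A hedged example putting those together (jar location per the 1.0.3 layout; paths, scripts and variable values are illustrative):

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.3.jar \
  -libjars application.jar \
  -cmdenv PATH=/usr/local/mytool/bin:/usr/bin \
  -cmdenv CLASSPATH=/usr/local/mytool/lib/mytool.jar \
  -input /user/andy/in -output /user/andy/out \
  -mapper mymapper.sh -reducer myreducer.sh \
  -file mymapper.sh -file myreducer.sh

Note that generic options such as -libjars must come before the streaming-specific options.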

Thanks
Rekha

From: Andy Xue andyxuey...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Tue, 4 Sep 2012 17:41:39 +1000
To: user@hadoop.apache.org
Subject: Hadoop Streaming: Does not recognise PATH and CLASSPATH defined

Hi:

I wish to use Hadoop streaming to run a program which requires specific PATH 
and CLASSPATH variables. I have set these two variables in both /etc/profile 
and ~/.bashrc on all slaves (and restarted these slaves). However, when I run 
the hadoop streaming job, the program generates error messages saying that 
these environment variables are not correctly set.

Does it mean that I did not set these environment variables in the correct 
place? How should I define these environment variables?
I use Hadoop 1.0.3 and all machines are running Ubuntu 12.04

Thank you for your time and help. Have a great day.

Andy


Re: error in shuffle in InMemoryMerger

2012-08-28 Thread Joshi, Rekha
Hi Abhay,

Ideally the error line - Caused by:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid
local directory for output/map_128.out - suggests you either do not have
permissions on the output folder or the disk is full.

Also, 5 is not a big enough number of threads (in fact, it is the default for
parallelcopies) to recommend reducing it, but a lower value might work. The only
long-term indication is for your system to undergo node maintenance.
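
A quick hedged check on each directory listed in mapred.local.dir (the path here is illustrative):

df -h /disk1/mapred/local      # is the disk full?
ls -ld /disk1/mapred/local     # is it writable by the user running the task?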

Thanks
Rekha

From: Abhay Ratnaparkhi abhay.ratnapar...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Tue, 28 Aug 2012 14:52:27 +0530
To: user@hadoop.apache.org
Subject: error in shuffle in InMemoryMerger

Hello,

I am getting the following error when the reduce task is running.
The mapreduce.reduce.shuffle.parallelcopies property is set to 5.

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(AccessController.java:284)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:773)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_128.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputF

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled map-outputs
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(AccessController.java:284)
    at javax.security.auth.Subject.doAs(Subject.java:573)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:773)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_119.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputF

Regards,
Abhay