Re: DFSClient error

2012-04-27 Thread John George
Can you run a regular 'hadoop fs' (put or ls or get) command?
If yes, how about a wordcount example?
'<path>/hadoop jar <path>/hadoop-*examples*.jar wordcount <input> <output>'


-Original Message-
From: Mohit Anchlia mohitanch...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Fri, 27 Apr 2012 14:36:49 -0700
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: Re: DFSClient error

I even tried to reduce the number of jobs, but it didn't help. This is what I see:

datanode logs:

Initializing secure datanode resources
Successfully obtained privileged resources (streaming port =
ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port =
sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
Starting regular datanode initialization
26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value
of 143

userlogs:

2012-04-26 19:35:22,801 WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
available
2012-04-26 19:35:22,801 INFO
org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
loaded
2012-04-26 19:35:22,808 INFO
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded 
initialized native-zlib library
2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.197:50010, add to deadNodes and continue
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
at java.io.InputStream.read(InputStream.java:85)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.204:50010, add to deadNodes and continue
java.io.EOFException

namenode logs:

2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201204261140_0244 added successfully for user 'hadoop' to queue
'default'
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201204261140_0244
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger:
USER=hadoop IP=125.18.62.196 OPERATION=SUBMIT_JOB
TARGET=job_201204261140_0244 RESULT=SUCCESS
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201204261140_0244
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception
in
createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
connect ack with firstBadLink as 125.18.62.197:50010
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_2499580289951080275_22499
2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
datanode 125.18.62.197:50010
2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress:
jobToken generated and stored with users keys in
/data/hadoop/mapreduce/job_201204261140_0244/jobToken
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input
size for job job_201204261140_0244 = 73808305. Number of splits = 1
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:

Re: The meaning of FileSystem in context of OutputFormat storage

2012-04-25 Thread John George
I think what it means is that the output files can be stored in any of the
possible implementations of the FileSystem abstract class, depending on the
user's requirements. So, they could be stored in DistributedFileSystem,
LocalFileSystem, etc.
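
As a minimal sketch (the host, port, and paths below are made up, not from
this thread), the concrete implementation is picked from the output path's
scheme and the configured default filesystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhichFileSystem {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // An hdfs:// URI resolves to DistributedFileSystem.
    FileSystem dfs = new Path("hdfs://namenode:8020/out").getFileSystem(conf);
    // A file:// URI resolves to LocalFileSystem.
    FileSystem local = new Path("file:///tmp/out").getFileSystem(conf);
    // A path with no scheme uses whatever the default filesystem is.
    FileSystem deflt = FileSystem.get(conf);
    System.out.println(dfs.getClass().getName() + " / "
        + local.getClass().getName() + " / " + deflt.getClass().getName());
  }
}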


Regards,
John George

-Original Message-
From: Jay Vyas jayunit...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Wed, 25 Apr 2012 10:01:25 -0500
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: The meaning of FileSystem in context of OutputFormat storage

I just saw this line in the javadocs for OutputFormat:

Output files are stored in a FileSystem
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html).


Seems like an odd sentence. What is the implication here -- is this
implying anything other than the obvious?

-- 
Jay Vyas
MMSB/UCHC



Re: setting client retry

2012-04-12 Thread John George
There are several different types of 'client retries'. The following are
some that I know of.

My guess is that you meant the following one. If so, it is defined in
core-site.xml
ipc.client.connect.max.retries (default value: 10) - Indicates the
number of retries a client will make to establish a server connection.

The other types of retries that I can think of on the HDFS side are:
dfs.client.block.write.retries (default value: 3) - As the name
suggests, this is the number of times a DFS client retries write to the
DataNodes.

dfs.client.block.write.locateFollowingBlock.retries (default value: 5) -
On certain exceptions, the client might retry when trying to get an
additional block from NN and this configuration controls that.
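
If it helps, here is a minimal sketch of overriding these programmatically
instead of in the XML files (the values shown are just the defaults listed
above, not recommendations):

import org.apache.hadoop.conf.Configuration;

public class RetryDefaults {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Normally set in core-site.xml / hdfs-site.xml; shown here only to
    // illustrate the property names and their default values.
    conf.setInt("ipc.client.connect.max.retries", 10);
    conf.setInt("dfs.client.block.write.retries", 3);
    conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 5);
    System.out.println(conf.getInt("ipc.client.connect.max.retries", -1));
  }
}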


There might be more. Feel free to let me know if you meant something else.

Regards,
John George

-Original Message-
From: Rita rmorgan...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Thu, 12 Apr 2012 07:35:43 -0500
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: setting client retry

In the hdfs-site.xml file, what property do I need to set for client
retries? Also, what is its default value?

-- 
--- Get your facts first, then you can distort them as you please.--



Re: Hadoopp_ClassPath issue.

2012-04-11 Thread John George
Dharin,
I believe the properties you are looking for are the following:
HADOOP_USER_CLASSPATH_FIRST: When defined, this puts the user-supplied
classpath at the beginning of the global classpath. So, you would
have to do something like 'export HADOOP_USER_CLASSPATH_FIRST=true'. If
you are on 2.0 (or 0.23), please refer to bin/hadoop-config.sh for more
information. If you are on 1.0 (or 0.20), refer to the hadoop script.

Now, if you want to run an M/R job by passing your own jar and you want
that jar to be used first, you want to set the config parameter
'mapreduce.job.user.classpath.first', and then the user-provided jar will
be placed before $HADOOP_CLASSPATH.
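
A rough sketch of setting that from job-submission code, assuming the
2.0/0.23-style property name (the class and job names here are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithUserJarFirst {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Put the user-provided jar ahead of the Hadoop jars on the
    // task classpath (see MAPREDUCE-3696 for the 2.0/0.23 behavior).
    conf.setBoolean("mapreduce.job.user.classpath.first", true);
    Job job = new Job(conf, "user-jar-first-example");
    job.setJarByClass(SubmitWithUserJarFirst.class);
    // ... set mapper/reducer, input/output paths, then submit:
    // job.waitForCompletion(true);
  }
}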

Hope this makes sense.

Also, these will work on 1.0 (or 0.23) and above.

Refer:
https://issues.apache.org/jira/browse/MAPREDUCE-3696 (for 2.0, 0.23)

https://issues.apache.org/jira/browse/MAPREDUCE-1938 (1.0, 0.20)


Thanks,
John George

-Original Message-
From: dmaniar dharin.man...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Tue, 10 Apr 2012 21:09:10 -0700
To: core-u...@hadoop.apache.org core-u...@hadoop.apache.org
Subject: Hadoopp_ClassPath issue.


Hi,

I am new to Hadoop and not very familiar with its internal workings. I had
some questions about HADOOP_CLASSPATH.

We are currently supposed to use a Hadoop cluster with 4 machines, and its
HADOOP_CLASSPATH in hadoop-env.sh is as below.
export
HADOOP_CLASSPATH=/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH

Now, my /home/user/app/www/WEB-INF/classes has a class called Application.class.

From a remote machine I submit a map-reduce job to this cluster, with a jar
called MyJar.jar. [This has an Application.class too, but with some
modifications.]

When the TaskTracker spawns a child Java process for the Mapper, the
classpath I see is as below, in that order:

Let's say my hadoop is installed at: /home/user/hadoop/
/home/user/hadoop/jar1,
/home/user/hadoop/jar2,
.
.
.
/home/user/hadoop/jarN,
/home/user/hadoop/lib/jar1,
/home/user/hadoop/lib/jar2,
/home/user/hadoop/lib/jarN,
1. /home/user/app/www/WEB-INF/classes,
2. ${mapred.local.dir}/taskTracker/{user}/jobcache/{jobid}/jars/Myjar.jar
[note: basically this has the modified class that I need to use for my
Map-Reduce job]

Well, it's clear from this classpath that I will end up using the
Application.class from the classes folder, which gives me incorrect
results.

Now my question is: how do I make sure I reverse the order of 1 and 2?

Some pointers that I found were:
1) If MyJar.jar is not changing much, then I can put it in a shared location
and modify my hadoop-env.sh to
export
HADOOP_CLASSPATH=/some/share/location/lib:/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH

2) Get rid of /home/user/app/www/WEB-INF/classes from my hadoop-env.sh.

3) Is there any property that lets me add it before the existing classpath?

Any help is greatly appreciated.

To summarize:
If I already have HADOOP_CLASSPATH set in hadoop-env.sh, how do I add my
application jar before this classpath?

Again, I saw DistributedCache.java [hadoop src] and the code looks
like:

public static void addFileToClassPath(Path file, Configuration conf)
    throws IOException {
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null ? file
      .toString() : classpath + System.getProperty("path.separator")
      + file.toString());
  ...
}

Basically, new files are added to the end of the existing classpath.


Thanks,
Dharin.




-- 
View this message in context:
http://old.nabble.com/Hadoopp_ClassPath-issue.-tp33666009p33666009.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.





Re: How do I include the newer version of Commons-lang in my jar?

2012-04-11 Thread John George
Have you tried setting 'mapreduce.user.classpath.first'? It allows user
jars to be put in the classpath before hadoop jars.
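
For the older JobConf API there is also a setter for this (availability
depends on your exact release, as the quoted reply below notes; this is only
a sketch, and the class name is made up):

import org.apache.hadoop.mapred.JobConf;

public class UserJarsFirst {
  public static void main(String[] args) {
    JobConf conf = new JobConf(UserJarsFirst.class);
    // Ask that user/job classes be searched before the Hadoop jars.
    conf.setUserClassesTakesPrecedence(true);
    // ... configure mapper/reducer, paths, then submit, e.g.:
    // JobClient.runJob(conf);
  }
}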

-Original Message-
From: Sky USC sky...@hotmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Mon, 9 Apr 2012 15:46:52 -0500
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: RE: How do I include the newer version of Commons-lang in my jar?





Thanks for the reply. I appreciate your helpfulness. I created the jars by
following the instructions at http://blog.mafr.de/2010/07/24/maven-hadoop-job/.
So external jars are stored in the lib/ folder within the jar.

Am I summarizing this correctly:
1. If hadoop version is 0.20.203 or lower - then it is not possible
for me to use an external jar such as commons-lang from apache in my
application. Any external jars packaged within my jar under the lib
directory are not captured. This seems like a huge limitation to me.
2. If hadoop version is 0.20.204 to 1.0.x - then using the
HADOOP_USER_CLASSPATH_FIRST=true environment variable before launching
'hadoop jar' might help. I tried this for version 0.20.205 but it didn't
work.
3. If hadoop version is 2.x (formerly 0.23.x) - then this can be set via
the API?

Is there a working, testable jar with these dependencies that I can try, to
figure out whether it's my way of packaging the jar or something else?

Thx

 From: ha...@cloudera.com
 Date: Mon, 9 Apr 2012 13:50:37 +0530
 Subject: Re: How do I include the newer version of Commons-lang in my
jar?
 To: common-user@hadoop.apache.org
 
 Answer is a bit messy.
 
 Perhaps you can set the environment variable export
 HADOOP_USER_CLASSPATH_FIRST=true before you do a 'hadoop jar ...' to
 launch your job. However, although this approach is present in
 0.20.204+ (0.20.205, and 1.0.x), am not sure if it makes an impact on
 the tasks as well. I don't see it changing anything but for the driver
 CP. I've not tested it - please let us know if it works in your
 environment.
 
 In higher versions (2.x or formerly 0.23.x), this is doable from
 within your job if you set mapreduce.job.user.classpath.first to
 true inside your job, and ship your replacement jars along.
 
 Some versions would also let you set this via
 JobConf/Job.setUserClassesTakesPrecedence(true/false) API calls.
 
 On Mon, Apr 9, 2012 at 11:14 AM, Sky sky...@hotmail.com wrote:
  Hi.
 
  I am new to Hadoop and I am working on a project on AWS Elastic
MapReduce.
 
  The problem I am facing is:
  * org.apache.commons.lang.time.DateUtils: parseDate() works OK but
  parseDateStrictly() fails.
  I think parseDateStrictly might be new in lang 2.5. I thought I
included all
  dependencies. However, for some reason, during runtime, my app is not
  picking up the newer commons-lang.
 
  Would love some help.
 
  Thx
  - sky
 
 
 
 
 
 -- 
 Harsh J

 



Re: Hadoop archive

2011-10-20 Thread John George
Could you try 0.20.205.0? The HAR issue in branch-20-security was addressed
by JIRA HADOOP-7539.


-Original Message-
From: Jonas Hartwig jonas.hart...@cision.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Mon, 17 Oct 2011 02:11:24 -0700
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: Hadoop archive

Hi, I'm new to the community.

I'd like to create an archive but I get the error 'Exception in archives
null'.

I'm using hadoop 0.20.204.0. The issue was tracked under MAPREDUCE-1399
(https://issues.apache.org/jira/browse/MAPREDUCE-1399) and solved. How do
I combine my hadoop version with a new map/reduce release? And how do I
get the release using Firefox? I saw something like JIRA but the Firefox
plugin is not working with 7.x.

 

regards




Re: a file can be used as a queue?

2011-06-13 Thread John George



On 6/13/11 6:23 AM, Joey Echeverria j...@cloudera.com wrote:

 This feature doesn't currently work. I don't remember the JIRA for it, but
 there's a ticket which will allow a reader to read from an HDFS file before
 it's closed. In that case, you implement a queue by having the producer write
 to the end of the file and the reader read from the beginning of the file.
 
 I'm not sure if there will be a way to tell that a file is still being
 written, so you may need your own end of stream marker.

One way to know the end of stream would be to call getVisibleLength() on the
input stream. As long as the writer has flushed (or closed) its stream, the
reader should be able to see those bytes. TestWriteRead.java might provide
you some clues  
(hdfs/src/test/hdfs/org/apache/hadoop/hdfs/TestWriteRead.java).
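
A very rough sketch of the producer/consumer pattern (this assumes a release
where hflush() is available - it was sync() on older ones; the path is made
up, and error handling and end-of-stream marking are left out):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQueueSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path queue = new Path("/tmp/queue-file");

    // Producer: write a record and flush so readers can see the bytes.
    FSDataOutputStream out = fs.create(queue);
    out.writeBytes("first record\n");
    out.hflush();   // sync() on older releases

    // Consumer: poll from the last offset it has already processed.
    long offset = 0L;
    FSDataInputStream in = fs.open(queue);
    in.seek(offset);
    byte[] buf = new byte[4096];
    int n = in.read(buf);
    if (n > 0) {
      offset += n;
      System.out.write(buf, 0, n);
    }
    in.close();
    out.close();
  }
}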

 
 -Joey
 
 On Jun 13, 2011, at 2:55, ltomuno ltom...@163.com wrote:
 
  I heard that an HDFS file can be used as a producer-consumer queue. Can a
  file really be used as a queue? I am very confused.

Regards,
John George