Re: How to change logging from DRFA to RFA? Is it a good idea?

2010-09-29 Thread Steve Loughran

On 29/09/10 00:12, Alex Kozlov wrote:

Hi Leo,

What distribution are you using?  Sometimes the log4j.properties is packed
inside a .jar file, which is picked up first, so you need to explicitly pass a
Java option '-Dlog4j.configuration=path-to-your-log4j-file' in the
corresponding daemon flags.
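For example, using the stock per-daemon option hooks in conf/hadoop-env.sh (the
path below is just a placeholder), something like

  # conf/hadoop-env.sh
  export HADOOP_NAMENODE_OPTS="-Dlog4j.configuration=file:///etc/hadoop/log4j.properties $HADOOP_NAMENODE_OPTS"

should make the daemon pick up your file instead of the copy bundled in a jar.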



You can find the JAR which has it with Ant, using the <whichresource> task. 
Indeed, that was why we wrote it.


Here's a snippet from one of my buildfiles; tests.run.classpath is the 
classpath used to run the tests, set up elsewhere.


  <target name="find-log4j" depends="ready-to-test"
      description="find log4j property files in the classpath">
    <whichresource resource="/log4j.properties"
        classpathref="tests.run.classpath"
        property="log4j.test.url" />
    <echo>
      Log4J on the test classpath: ${log4j.test.url}
    </echo>

  </target>
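Run it with "ant find-log4j" and it prints the URL of whichever
log4j.properties comes first on that classpath.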



Reducers stuck in copy phase

2010-09-29 Thread Pramy Bhats
Hi,

While trying to run a MapReduce job, the reducers get stuck in the copy
phase indefinitely. Even though all the mappers have finished, the reducers
stay stuck at 15-20% completion.

The log available at the Reducers is as follows:

2010-09-29 11:33:24,535 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Need another 7 map output(s) where 5 is already in progress
2010-09-29 11:33:24,535 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Scheduled 0 outputs (0 slow hosts and 2 dup hosts)
2010-09-29 11:34:24,536 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Need another 7 map output(s) where 5 is already in progress
2010-09-29 11:34:24,536 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Scheduled 0 outputs (0 slow hosts and 2 dup hosts)
2010-09-29 11:35:24,537 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Need another 7 map output(s) where 5 is already in progress
2010-09-29 11:35:24,537 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201009291127_0001_r_00_0 Scheduled 0 outputs (0 slow hosts and 2 dup hosts)


Could you please help me figure out the cause of this reducer stall?

thanks in advance.
--PB


Re: How to change logging from DRFA to RFA? Is it a good idea?

2010-09-29 Thread Leo Alekseyev
For the benefit of the list archives: the log4j properties are being
set inside the hadoop daemon shell script (here is the relevant line,
as pointed out to me by Boris)

bin/hadoop-daemon.sh:export HADOOP_ROOT_LOGGER=INFO,DRFA
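So switching the daemons from DRFA to RFA comes down to two edits: Boris's RFA
appender settings, plus pointing the root logger at RFA instead of DRFA. A
rough sketch (the size/backup values below are just examples):

  # conf/log4j.properties -- Boris's RollingFileAppender settings
  log4j.appender.RFA=org.apache.log4j.RollingFileAppender
  log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
  log4j.appender.RFA.MaxFileSize=256MB
  log4j.appender.RFA.MaxBackupIndex=30

  # bin/hadoop-daemon.sh -- change the existing export so daemons default to RFA
  export HADOOP_ROOT_LOGGER=INFO,RFA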

On Tue, Sep 28, 2010 at 4:12 PM, Alex Kozlov ale...@cloudera.com wrote:
 Hi Leo,

 What distribution are you using?  Sometimes the log4j.properties is packed
 inside a .jar file, which is picked up first, so you need to explicitly pass a
 Java option '-Dlog4j.configuration=path-to-your-log4j-file' in the
 corresponding daemon flags.

 Alex K

 On Tue, Sep 28, 2010 at 2:13 PM, Leo Alekseyev dnqu...@gmail.com wrote:

 I have all of the above in my log4j.properties; every  line that
 mentions DRFA is commented out.  And yet, I still get the following
 errors:

 log4j:ERROR Could not find value for key log4j.appender.DRFA
 log4j:ERROR Could not instantiate appender named DRFA.

 Is there another config file?..  Is DRFA hard-coded somewhere?..



 On Mon, Sep 27, 2010 at 5:28 PM, Boris Shkolnik bo...@yahoo-inc.com
 wrote:
  log4j.appender.RFA=org.apache.log4j.RollingFileAppender
  log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
 
  log4j.appender.RFA.MaxFileSize=1MB
  log4j.appender.RFA.MaxBackupIndex=30
 
  hadoop.root.logger=INFO,RFA
 
 
  On 9/27/10 4:12 PM, Leo Alekseyev dnqu...@gmail.com wrote:
 
  We are looking for ways to prevent Hadoop daemon logs from piling up
  (over time they can reach several tens of GB and become a nuisance).
  Unfortunately, the log4j DRFA class doesn't seem to provide an easy
  way to limit the number of files it creates.  I would like to try
  switching to RFA with MaxFileSize and MaxBackupIndex set, since it
  looks like that's going to solve the log accumulation problem, but I
  can't figure out how to change the default logging class for the
  daemons.  Can anyone give me some hints on how to do it?
 
  Alternatively, please let me know if there's a better solution to
  control log accumulation.




Multiple masters in hadoop

2010-09-29 Thread Bhushan Mahale
Hi,

The master file in hadoop/conf is called masters.
I am wondering if I can configure multiple masters for a single cluster. If yes, how 
can I use them?

Thanks,
Bhushan


DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


Re: Multiple masters in hadoop

2010-09-29 Thread Shi Yu
The master that appears in the masters and slaves files is a machine name or 
IP address.  If you have a single cluster and you specify multiple 
names in those files, it will cause errors because of connection failures.


Shi

On 2010-9-29 15:28, Bhushan Mahale wrote:

Hi,

The master file in hadoop/conf is called masters.
I am wondering if I can configure multiple masters for a single cluster. If yes, how 
can I use them?

Thanks,
Bhushan





--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799



Re: Multiple masters in hadoop

2010-09-29 Thread james warren
Actually the /hadoop/conf/masters file is for configuring
secondarynamenode(s).  Check
http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
for details.
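For example (hostnames below are placeholders), listing two hosts in
conf/masters tells start-dfs.sh to launch a secondary namenode on each of them:

  # conf/masters -- one secondary namenode host per line
  checkpoint1.example.com
  checkpoint2.example.com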

cheers,
-jw

On Wed, Sep 29, 2010 at 1:36 PM, Shi Yu sh...@uchicago.edu wrote:

 The master that appears in the masters and slaves files is a machine name or IP
 address.  If you have a single cluster and you specify multiple names in
 those files, it will cause errors because of connection failures.

 Shi


 On 2010-9-29 15:28, Bhushan Mahale wrote:

 Hi,

 The master file in hadoop/conf is called masters.
 I am wondering if I can configure multiple masters for a single cluster. If
 yes, how can I use them?

 Thanks,
 Bhushan







 --
 Postdoctoral Scholar
 Institute for Genomics and Systems Biology
 Department of Medicine, the University of Chicago
 Knapp Center for Biomedical Discovery
 900 E. 57th St. Room 10148
 Chicago, IL 60637, US
 Tel: 773-702-6799




java.lang.RuntimeException: java.io.EOFException at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)

2010-09-29 Thread Tali K

Hi All,

I am getting this exception on a cluster (10 nodes) when I am running a simple 
Hadoop map/reduce job.
I don't get this exception while running it on my desktop in Hadoop's 
pseudo-distributed mode.
Can somebody help? I would really appreciate it.


10/09/29 14:28:34 INFO mapred.JobClient:  map 100% reduce 30%
10/09/29 14:28:36 INFO mapred.JobClient: Task Id : attempt_201009291306_0004_r_00_0, Status : FAILED
java.lang.RuntimeException: java.io.EOFException
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
    at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
    at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
    at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2316)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:576)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at speeditup.MsRead.readFields(MsRead.java:84)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
    ... 11 more
Here is the class that WritableComparator.compare is comparing. It has only 2 strings, 
max length 20 characters each.

public class MsRead implements WritableComparable<MsRead> {
    private static final Log LOG =
        LogFactory.getLog(speeditup.CalculateMinEvalue.class);

    private String query_id;

    private String record;

    public String getRecord() {
        return record;
    }

    public void setRecord(String record) {
        this.record = record;
    }

    public String getQuery_id() {
        return query_id;
    }

    public void setQuery_id(String queryId) {
        query_id = queryId;
    }

    public MsRead() {
    }

    public MsRead(String a, String r) {
        setQuery_id(a);
        setRecord(r);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        LOG.debug("**myreadFields");
        LOG.warn("**myreadFields");
        LOG.info("**myreadFields");
        query_id = in.readUTF();
        record = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(query_id);
        out.writeUTF(record);
    }

    public static class FirstComparator extends WritableComparator {

        private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

        public FirstComparator() {
            super(MsRead.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1,
                           byte[] b2, int s2, int l2) {
            try {
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof MsRead && b instanceof MsRead) {
                // System.err.println("COMPARE " + ((MsRead) a).getType() + "\t"
                //     + ((MsRead) b).getType() + "\t"
                //     + (((MsRead) a).toString().compareTo(((MsRead) b).toString())));
                return ((MsRead) a).toString().compareTo(((MsRead) b).toString());
            }
            return super.compare(a, b);
        }
    }

    @Override
    public int compareTo(MsRead o) {
        return this.toString().compareTo(o.toString());
    }

    @Override
    public boolean equals(Object right) {
        if (right instanceof MsRead) {
            return query_id.equals(((MsRead) right).query_id);
        } else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        return query_id.hashCode();
    }

    @Override
    public String toString() {
        return query_id;
    }

    public String toOutputString() {
        return record;
    }
}
 

RE: Multiple masters in hadoop

2010-09-29 Thread Bhushan Mahale
Thanks James.
The link is helpful too.

Regards,
Bhushan

-Original Message-
From: james warren [mailto:ja...@rockyou.com]
Sent: Wednesday, September 29, 2010 1:50 PM
To: common-user@hadoop.apache.org
Subject: Re: Multiple masters in hadoop

Actually the /hadoop/conf/masters file is for configuring
secondarynamenode(s).  Check
http://www.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
for details.

cheers,
-jw

On Wed, Sep 29, 2010 at 1:36 PM, Shi Yu sh...@uchicago.edu wrote:

 The master that appears in the masters and slaves files is a machine name or IP
 address.  If you have a single cluster and you specify multiple names in
 those files, it will cause errors because of connection failures.

 Shi


 On 2010-9-29 15:28, Bhushan Mahale wrote:

 Hi,

 The master file in hadoop/conf is called masters.
 I am wondering if I can configure multiple masters for a single cluster. If
 yes, how can I use them?

 Thanks,
 Bhushan







 --
 Postdoctoral Scholar
 Institute for Genomics and Systems Biology
 Department of Medicine, the University of Chicago
 Knapp Center for Biomedical Discovery
 900 E. 57th St. Room 10148
 Chicago, IL 60637, US
 Tel: 773-702-6799





Re: java.lang.RuntimeException: java.io.EOFException at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)

2010-09-29 Thread Ted Yu
Your MsRead.readFields() doesn't contain readInt().
Can you show us the lines around line 84 of MsRead.java?
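The EOFException generally means readFields() is trying to consume more bytes
than write() produced for that key, so the class that actually ran on the
cluster may not match the source you posted. A minimal sketch of the contract
(class and field names below are illustrative, not from your code):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.Writable;

  // Illustrative only -- the point is that readFields() must consume exactly
  // the bytes write() produced, in the same order and with the same types.
  class PairWritable implements Writable {
      private String queryId;
      private String record;

      public void write(DataOutput out) throws IOException {
          out.writeUTF(queryId);   // 2-byte length followed by UTF-8 bytes
          out.writeUTF(record);
      }

      public void readFields(DataInput in) throws IOException {
          queryId = in.readUTF();  // mirrors writeUTF above, same order
          record = in.readUTF();
          // An extra in.readInt() here, with no matching writeInt() in write(),
          // would read past the end of the record and throw EOFException,
          // which is what your trace shows at MsRead.java:84.
      }
  }

If the jar deployed on the cluster still has an in.readInt() at line 84 that
your current source no longer has, rebuilding and pushing the jar out to the
cluster would be the first thing I'd check.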

On Wed, Sep 29, 2010 at 2:44 PM, Tali K ncherr...@hotmail.com wrote:


 Hi All,

 I am getting this exception on a cluster (10 nodes) when I am running a
 simple Hadoop map/reduce job.
 I don't get this exception while running it on my desktop in Hadoop's
 pseudo-distributed mode.
 Can somebody help? I would really appreciate it.


 10/09/29 14:28:34 INFO mapred.JobClient:  map 100% reduce 30%
 10/09/29 14:28:36 INFO mapred.JobClient: Task Id : attempt_201009291306_0004_r_00_0, Status : FAILED
 java.lang.RuntimeException: java.io.EOFException
     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
     at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
     at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
     at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
     at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
     at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2316)
     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:576)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.io.EOFException
     at java.io.DataInputStream.readInt(DataInputStream.java:375)
     at speeditup.MsRead.readFields(MsRead.java:84)
     at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
     ... 11 more
 Here is the class that WritableComparator.compare is comparing. It has only 2 strings,
 max length 20 characters each.

 public class MsRead implements WritableComparable<MsRead> {
     private static final Log LOG =
         LogFactory.getLog(speeditup.CalculateMinEvalue.class);

     private String query_id;

     private String record;

     public String getRecord() {
         return record;
     }

     public void setRecord(String record) {
         this.record = record;
     }

     public String getQuery_id() {
         return query_id;
     }

     public void setQuery_id(String queryId) {
         query_id = queryId;
     }

     public MsRead() {
     }

     public MsRead(String a, String r) {
         setQuery_id(a);
         setRecord(r);
     }

     @Override
     public void readFields(DataInput in) throws IOException {
         LOG.debug("**myreadFields");
         LOG.warn("**myreadFields");
         LOG.info("**myreadFields");
         query_id = in.readUTF();
         record = in.readUTF();
     }

     @Override
     public void write(DataOutput out) throws IOException {
         out.writeUTF(query_id);
         out.writeUTF(record);
     }

     public static class FirstComparator extends WritableComparator {

         private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

         public FirstComparator() {
             super(MsRead.class);
         }

         @Override
         public int compare(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
             try {
                 int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                 int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                 return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
             } catch (IOException e) {
                 throw new IllegalArgumentException(e);
             }
         }

         @Override
         public int compare(WritableComparable a, WritableComparable b) {
             if (a instanceof MsRead && b instanceof MsRead) {
                 // System.err.println("COMPARE " + ((MsRead) a).getType() + "\t"
                 //     + ((MsRead) b).getType() + "\t"
                 //     + (((MsRead) a).toString().compareTo(((MsRead) b).toString())));
                 return ((MsRead) a).toString().compareTo(((MsRead) b).toString());
             }
             return super.compare(a, b);
         }
     }

     @Override
     public int compareTo(MsRead o) {
         return this.toString().compareTo(o.toString());
     }

     @Override
     public boolean equals(Object right) {
         if (right instanceof MsRead) {
             return query_id.equals(((MsRead) right).query_id);
         } else {
             return false;
         }
     }

     @Override
     public int hashCode() {
         return query_id.hashCode();
     }

     @Override
     public String toString() {
         return query_id;
     }

     public String toOutputString() {
         return record;
     }
 }