Re: Logging from the job

2010-04-27 Thread Amareshwari Sri Ramadasu
Where are you looking for the logs?
They will be available in the task logs. You can view them from the web UI, on the
taskdetails.jsp page.
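
On disk the same output lands under the tasktracker's log directory; a rough sketch of
the layout on a stock 0.20 install (attempt directory names abbreviated here):

  $HADOOP_LOG_DIR/userlogs/attempt_<job>_<m|r>_<task>_<n>/
      stdout   - System.out from the task JVM
      stderr   - System.err from the task JVM
      syslog   - log4j / commons-logging output from the task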

-Amareshwari

On 4/27/10 2:22 PM, Alexander Semenov bohtva...@gmail.com wrote:

Hi all.

I'm not sure if I'm posting to the correct mailing list; please suggest the
correct one if not.

I need to log statements from the running job, e.g. use Apache commons
logging to print debug messages on map and reduce operations. I've tuned
the conf/log4.properties file for my logging domain but log statements
are still missing in the log files and on the console. I start the job
like this:

hadoop jar jar_file.jar input_dir output_dir

The job finishes gracefully but I see no logging.

Any suggestions?
Thanks.




Re: Logging from the job

2010-04-27 Thread Jeff Zhang
You should use log4j rather than Apache Commons Logging.
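
For illustration, a minimal sketch (made-up class names, new mapreduce API) of logging
from a mapper with log4j directly; the output goes to the task attempt's syslog, not to
the client console:

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.log4j.Logger;

  public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    // the logger name is the "domain" you would configure in conf/log4j.properties
    private static final Logger LOG = Logger.getLogger(MyMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      LOG.debug("processing key " + key);   // visible in the task logs when DEBUG is enabled
      context.write(new Text("word"), value);
    }
  }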

On Tue, Apr 27, 2010 at 4:52 PM, Alexander Semenov bohtva...@gmail.com wrote:
 Hi all.

 I'm not sure if I'm posting to correct mail list, please, suggest the
 correct one if so.

 I need to log statements from the running job, e.g. use Apache commons
 logging to print debug messages on map and reduce operations. I've tuned
 the conf/log4.properties file for my logging domain but log statements
 are still missing in the log files and on the console. I start the job
 like this:

 hadoop jar jar_file.jar input_dir output_dir

 The job finishes gracefully but I see no logging.

 Any suggestions?
 Thanks.





-- 
Best Regards

Jeff Zhang


Re: Logging from the job

2010-04-27 Thread Alexander Semenov
I'm expecting to see the logs on the console since the root logger is
configured to do so.

On Tue, 2010-04-27 at 14:28 +0530, Amareshwari Sri Ramadasu wrote:
 Where are you looking for the logs?
 They will be available in Tasklogs. You can view them from web ui from 
 taskdetails.jsp page.
 
 -Amareshwari
 
 On 4/27/10 2:22 PM, Alexander Semenov bohtva...@gmail.com wrote:
 
 Hi all.
 
 I'm not sure if I'm posting to correct mail list, please, suggest the
 correct one if so.
 
 I need to log statements from the running job, e.g. use Apache commons
 logging to print debug messages on map and reduce operations. I've tuned
 the conf/log4.properties file for my logging domain but log statements
 are still missing in the log files and on the console. I start the job
 like this:
 
 hadoop jar jar_file.jar input_dir output_dir
 
 The job finishes gracefully but I see no logging.
 
 Any suggestions?
 Thanks.
 
 




Re: Logging from the job

2010-04-27 Thread Steve Loughran

Alexander Semenov wrote:

Ok, thanks. Unfortunately ant is currently not installed on the machine
running Hadoop. What if I use slf4j + logback just in the job's jar?


It depends on the classpath. You can check it in your own code by getting
whatever classloader you use, then calling getResource("log4.properties") to
get the URL, which you can print to the console, something like:


System.err.println("Log4J properties at "
    + this.getClass().getClassLoader().getResource("log4.properties"));




BTW, is Hadoop planning to migrate to this stack from the deprecated
Apache commons-logging and log4j?


Deprecated? That's very much a point of view.

1. commons-logging is a thin front end to anything; I have my own that can
be switched in when I want to get the logs from more than one machine in one
place. Where it is great is in libraries which can be embedded in
other things - Hadoop does get used this way - as it stops the library
saying "here is the logging tool you must use"; you get to choose.


2. Log4J is a fantastic logging API with some really good back-end
implementations. In Hadoop, I'd recommend the rolling logs. There is
good support in Hadoop for changing logging levels as you go along.
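
For reference, a minimal log4j.properties sketch of a daily rolling file appender
(the appender name, file path and pattern below are illustrative, not Hadoop's
shipped defaults):

  # roll the log file once a day; keep it off the root drive
  log4j.rootLogger=INFO, DRFA
  log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.DRFA.File=/data/logs/hadoop.log
  log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
  log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
  log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n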


3. slf4j was meant to be a log API that avoided all the problems of
layering, but it actually introduces a new one: the risk of more than one
SLF4J on the classpath. And, as it's an extra JAR that Jetty requires, you
have to make sure your job's SLF4J doesn't clash with the one used for Jetty.
And its back-ends aren't on a par with Log4J's.


I'm happy with commons-logging and log4j; I just wish that Jetty had
stayed with commons-logging. The alternative would be the
java.util.logging APIs, which are themselves a dog to configure. You
need to point to the right config file using JVM system properties, which
really need to be set on the command line to get picked up early enough,
and as the docs say:
"By default, the LogManager reads its initial configuration from a
properties file lib/logging.properties in the JRE directory."


This isn't as bad as having a commons-logging or log4j.properties in the
JAR of a third-party library, but it means that by default you get one
logging setup per JVM unless you deliberately configure each app differently.
Which is a PITA, and it's probably one of the reasons the Java logging
API never took off and doesn't have back-end loggers as good as
Log4J's.


Summary: try to find the properties file (it may just be classpath/JVM
quirks), learn to use the rolling/daily log4j logs, and don't keep the logs
on your root drive either.

-Steve



multiple reduce tasks in contrib/index

2010-04-27 Thread ni_jiangfeng
hi all,

I'm using contrib/index for text indexing. Can I have multiple reducers for
index writing? For example, documents of the same type should fall onto the same
reduce node.
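
The general MapReduce answer (independent of contrib/index internals) is a custom
Partitioner. A minimal sketch, assuming the map output key is a Text of the form
"type/docId" (that key layout is an assumption, not contrib/index's):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.mapreduce.Partitioner;

  // Sends every document of one type to the same reducer by hashing only
  // the type prefix of the key (the part before the first '/').
  public class TypePartitioner extends Partitioner<Text, Writable> {
    @Override
    public int getPartition(Text key, Writable value, int numPartitions) {
      String k = key.toString();
      int slash = k.indexOf('/');
      String type = (slash < 0) ? k : k.substring(0, slash);
      return (type.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }

  // driver: job.setPartitionerClass(TypePartitioner.class);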

2010-04-27 



ni_jiangfeng 


DataNode not able to spawn a Task

2010-04-27 Thread vishalsant

Hi guys, 
   
 I see the exception below when I launch a job


10/04/27 10:54:16 INFO mapred.JobClient:  map 0% reduce 0%
10/04/27 10:54:22 INFO mapred.JobClient: Task Id :
attempt_201004271050_0001_m_005760_0, Status : FAILED
Error initializing attempt_201004271050_0001_m_005760_0:
java.lang.NumberFormatException: For input string: "-"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:476)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:751)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1665)
at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
at
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1630)

 

A few things:

* I ran fsck on the namenode and no corrupted blocks were reported.
* The -report from dfsadmin says the datanode is up.
 
-- 
View this message in context: 
http://old.nabble.com/DataNode-not-able-to-spawn-a-Task-tp28378863p28378863.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: DataNode not able to spawn a Task

2010-04-27 Thread Todd Lipcon
Hi Vishal,

What operating system are you on? The TT is having issues parsing the output
of df.

-Todd

On Tue, Apr 27, 2010 at 9:03 AM, vishalsant vishal.santo...@gmail.com wrote:


 Hi guys,

  I see the exception below when I launch a job


 0/04/27 10:54:16 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/27 10:54:22 INFO mapred.JobClient: Task Id :
 attempt_201004271050_0001_m_005760_0, Status : FAILED
 Error initializing attempt_201004271050_0001_m_005760_0:
 java.lang.NumberFormatException: For input string: -
at

 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:476)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
at

 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
at

 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:751)
at
 org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1665)
at
 org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
at

 org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1630)



 Few things

 * I ran fsck on the namenode and no corrupted blocks reported.
 * The -report from dfsadmin , says the datanode is up.

 --
 View this message in context:
 http://old.nabble.com/DataNode-not-able-to-spawn-a-Task-tp28378863p28378863.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




-- 
Todd Lipcon
Software Engineer, Cloudera


Re: DataNode not able to spawn a Task

2010-04-27 Thread vishalsant

It seems that this piece of code does a df to get the amount of free space
(got this info from the IRC channel), and it is trying to do a number
conversion on information returned by df:

Filesystem     1K-blocks       Used  Available Use% Mounted on
/dev/sda2     1891213200  -45291780 1838887216    -  /
.
.

Of course, in my case the Use% is "-" and that is an issue :)
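
The stack trace above bottoms out in Integer.parseInt, and a one-line sketch
reproduces the same failure on that column value:

  int pct = Integer.parseInt("-");   // throws java.lang.NumberFormatException: For input string: "-"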

BTW, this datanode had stopped responding. It is always a good idea to run df
like this, to make sure this does not happen during job execution, and probably
even to make it part of ./hadoop dfsadmin -report.

I will close the thread when this is resolved with the disk issue (which it
seems to be).







vishalsant wrote:
 
 Hi guys, 

  I see the exception below when I launch a job
 
 
 0/04/27 10:54:16 INFO mapred.JobClient:  map 0% reduce 0%
 10/04/27 10:54:22 INFO mapred.JobClient: Task Id :
 attempt_201004271050_0001_m_005760_0, Status : FAILED
 Error initializing attempt_201004271050_0001_m_005760_0:
 java.lang.NumberFormatException: For input string: -
   at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Integer.parseInt(Integer.java:476)
   at java.lang.Integer.parseInt(Integer.java:499)
   at org.apache.hadoop.fs.DF.parseExecResult(DF.java:125)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
   at org.apache.hadoop.util.Shell.run(Shell.java:134)
   at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
   at
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
   at
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
   at 
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:751)
   at
 org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1665)
   at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
   at
 org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1630)
 
  
 
 Few things
 
 * I ran fsck on the namenode and no corrupted blocks reported.
 * The -report from dfsadmin , says the datanode is up.
  
 

-- 
View this message in context: 
http://old.nabble.com/DataNode-not-able-to-spawn-a-Task-tp28378863p28379065.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Reducer ID

2010-04-27 Thread Farhan Husain
Thanks!

2010/4/26 Amareshwari Sri Ramadasu amar...@yahoo-inc.com

 context.getTaskAttemptID() gives the task attempt id, and
 context.getTaskAttemptID().getTaskID() gives the task id of the reducer.
 context.getTaskAttemptID().getTaskID().getId() gives the reducer number.
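
 For illustration, a minimal sketch of reading that inside a reducer (new
 mapreduce API; class and type parameters are placeholders):

   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Reducer;

   public class MyReducer extends Reducer<Text, Text, Text, Text> {
     private int reducerNumber;   // 0-based index of this reduce task within the job

     @Override
     protected void setup(Context context) {
       reducerNumber = context.getTaskAttemptID().getTaskID().getId();
     }
   }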

 Thanks
 Amareshwari

 On 4/27/10 5:34 AM, Gang Luo lgpub...@yahoo.com.cn wrote:

 JobConf.get("mapred.task.id") gives you everything (including the attempt id).

 -Gang


 ----- Original Message -----
 From: Farhan Husain farhan.hus...@csebuet.org
 To: common-user@hadoop.apache.org
 Sent: 2010/4/26 (Mon) 7:13:03 PM
 Subject: Reducer ID

 Hello,

 Is it possible to know the unique id of a reducer inside the reduce or setup
 method of a reducer class? I tried to find a method of the context class
 which might help in this regard but could not find any.

 Thanks,
 Farhan








Re: Logging from the job

2010-04-27 Thread Alex Kozlov
Hi Alexander,

Where are you looking for the logs?  The output of the tasks should be in
$HADOOP_LOG_DIR/userlogs/attempt*/{stdout,stderr,syslog}.

Could you provide the exact java command line your tasks are running with?
(Do 'ps -ef | grep Child' on one of the nodes when the job is running.)

Alex K

On Tue, Apr 27, 2010 at 1:52 AM, Alexander Semenov bohtva...@gmail.com wrote:

 Hi all.

 I'm not sure if I'm posting to correct mail list, please, suggest the
 correct one if so.

 I need to log statements from the running job, e.g. use Apache commons
 logging to print debug messages on map and reduce operations. I've tuned
 the conf/log4.properties file for my logging domain but log statements
 are still missing in the log files and on the console. I start the job
 like this:

 hadoop jar jar_file.jar input_dir output_dir

 The job finishes gracefully but I see no logging.

 Any suggestions?
 Thanks.




Hadoop Developer opening San Diego or Los Angeles

2010-04-27 Thread Larry Mills
My name is Larry Mills and I am conducting a search for two Hadoop/Java
developers in San Diego or Los Angeles.  Please see below job description.

Hadoop/Java Developers Needed

San Diego or Los Angeles, California

Position Summary

We are seeking two JAVA/Hadoop developers who will be as passionate about
our product as we are. If you enjoy pushing the envelope of internet
technology to deliver next generation eCommerce solutions and you meet these
qualifications, we want to talk to you.

Requirements/Qualifications:

*   3+ years JAVA development 
*   Hadoop (HDFS and MapReduce) development training or experience 
*   Passion for cutting-edge technologies 
*   Excellent communication and verbal skills 
*   Must thrive in a fast-paced, small, team-work environment 
*   Bachelor's degree in Computer Science or a related field preferred 
*   Minimum of 3 years professional development experience 

What We can Offer You!

*   Competitive Wages 
*   Medical Benefits 
*   Dental Benefits 
*   401(k) Plan 
*   Vision 

Please contact Larry Mills at 720.339.1361 or email
lmi...@knowledgerecruiters.com to  further explore this opportunity.

 

 

Sincerely, 


Larry Mills

Managing Partner

Knowledge Recruiters

8547 East Arapahoe Road, J 254

Greenwood Village, Colorado 80112

Phone:  720-339-1361

lmi...@knowledgerecruiters.com

 www.knowledgerecruiters.com

 

 http://www.linkedin.com/pub/larry-mills/2/35b/279

 



User defined class as Map/Reduce output value

2010-04-27 Thread Farhan Husain
Hello,

I want to output a class which I have written as the value of the map phase.
The obvious way is to implement the Writable interface, but the problem is that
the class has other classes as its member properties. The DataInput and
DataOutput interfaces used by the read and write methods of the Writable
interface do not support object serialization. Is there any other way I can
achieve this?

Thanks,
Farhan


Re: User defined class as Map/Reduce output value

2010-04-27 Thread Ted Yu
Take a look at the sample given in the Javadoc of Writable.java.
You need to serialize your data yourself:

 @Override
 public void readFields(DataInput in) throws IOException {
   h = Text.readString(in);
   sc = in.readFloat();
   ran = in.readInt();
 }
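
The same idea extends to members that are themselves Writables: delegate to their
write()/readFields() rather than serializing whole objects. A rough sketch of the
matching write(), with a made-up nested member 'inner' of some other Writable type:

 @Override
 public void write(DataOutput out) throws IOException {
   Text.writeString(out, h);
   out.writeFloat(sc);
   out.writeInt(ran);
   inner.write(out);        // nested Writable member writes its own fields
 }

 // and in readFields(), after the primitive fields:
 //   inner.readFields(in); // 'inner' must already be instantiated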


On Tue, Apr 27, 2010 at 10:53 AM, Farhan Husain
farhan.hus...@csebuet.org wrote:

 Hello,

 I want to output a class which I have written as the value of the map
 phase.
 The obvious was is to implement the Writable interface but the problem is
 the class has other classes as its member properties. The DataInput and
 DataOutput interfaces used by the read and write methods of the Writable
 class do not support object serialization. Is there any other way I can
 achieve this?

 Thanks,
 Farhan



Jetty Exceptions in the DataNode log related to task failure ?

2010-04-27 Thread vishalsant

Not sure what is happening here, in the sense that: is this critical?
I had read that the status of a task is passed on to the jobtracker over
HTTP. Is that true?
I see tasks killed because they expire, even though the DataNode seems to be
alive and kicking (except for the above exception).

Is there any relation?


2010-04-27 14:51:47,334 WARN org.mortbay.log: Committed before 410
getMapOutput(attempt_201004271342_0001_m_001281_0,49) failed :
org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
at
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
at 
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
at
org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
at
org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
at
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2943)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
at
org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
... 23 more

-- 
View this message in context: 
http://old.nabble.com/Jetty-Exceptions-in-the-DataNode-log-related-to-task-failure---tp28381307p28381307.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: User defined class as Map/Reduce output value

2010-04-27 Thread Farhan Husain
Can I use the Serializable interface? Alternatively, is there any way to
specify OutputFormatter for mappers like we can do for reducers?

Thanks,
Farhan

On Tue, Apr 27, 2010 at 1:19 PM, Ted Yu yuzhih...@gmail.com wrote:

 Take a look at the sample given in Javadoc of Writable.java
 You need to serialize your data yourself:
 @Override
public void readFields(DataInput in) throws IOException {
  h = Text.readString(in);
  sc = in.readFloat();
  ran = in.readInt ();
}


 On Tue, Apr 27, 2010 at 10:53 AM, Farhan Husain
 farhan.hus...@csebuet.org wrote:

  Hello,
 
  I want to output a class which I have written as the value of the map
  phase.
  The obvious was is to implement the Writable interface but the problem is
  the class has other classes as its member properties. The DataInput and
  DataOutput interfaces used by the read and write methods of the Writable
  class do not support object serialization. Is there any other way I can
  achieve this?
 
  Thanks,
  Farhan
 



Questions on MultithreadedMapper

2010-04-27 Thread Jim Twensky
Hi,

I've decided to refactor some of my Hadoop jobs and implement them
using MultithreadedMapper.class but I got puzzled because of some
unexpected error messages at run time.
Here are some relevant settings regarding my Hadoop cluster:

mapred.tasktracker.map.tasks.maximum = 1
mapred.tasktracker.reduce.tasks.maximum = 1
mapred.job.reuse.jvm.num.tasks = -1
mapred.map.multithreadedrunner.threads = 4
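
For reference, a minimal driver sketch for the new-API multithreaded mapper (this
assumes org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper; MyRealMapper and the
thread count are placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

  Configuration conf = new Configuration();
  Job job = new Job(conf, "multithreaded-example");
  // MultithreadedMapper is what the framework runs; it spawns the worker
  // threads and hands records to the real mapper class registered below.
  job.setMapperClass(MultithreadedMapper.class);
  MultithreadedMapper.setMapperClass(job, MyRealMapper.class);
  MultithreadedMapper.setNumberOfThreads(job, 4);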

I'd like to know how threads are used to run the map task in a single
JVM (Correct me if this is wrong). Suppose I've got a sample Mapper
class as such:

class Mapper ... {

  MyObject A;
  static MyObject B;

  setup() {
    Configuration conf = context.getConfiguration();
    A.initialize(conf);
    B.initialize(conf);
  }

  map() {...}

  cleanup() {...}
}
Does each thread run all three of the setup(), map(), and cleanup() methods?

-OR-

Are setup() and cleanup() run once per task (and thus per JVM
according to my settings) and so map is the only multithreaded
function?
Also, are the objects A and B shared among different threads, or does
each thread have its own copy of them? My initial guess was that each
thread would have a separate copy of A, and that B would be shared among
the 4 threads running on the same box since it is defined as static,
but it appears to me that this assumption is not correct and A seems
to be shared.

Thanks,
Jim


Re: User defined class as Map/Reduce output value

2010-04-27 Thread Farhan Husain
I tried to use a class which implements the Serializable interface and got
the following error:

java.lang.NullPointerException
at
org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:759)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:487)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

On Tue, Apr 27, 2010 at 12:53 PM, Farhan Husain
farhan.hus...@csebuet.org wrote:

 Hello,

 I want to output a class which I have written as the value of the map
 phase. The obvious was is to implement the Writable interface but the
 problem is the class has other classes as its member properties. The
 DataInput and DataOutput interfaces used by the read and write methods of
 the Writable class do not support object serialization. Is there any other
 way I can achieve this?

 Thanks,
 Farhan



Output pair in Mapper.cleanup method

2010-04-27 Thread Farhan Husain
Hello,

Is it possible to output in Mapper.cleanup method since the Mapper.context
object is still available there?

Thanks,
Farhan


Re: Output pair in Mapper.cleanup method

2010-04-27 Thread Eric Sammer
Yes. It's a common pattern to buffer some amount of data in the map()
method, flushing every N records, and then to flush any remaining
records in the cleanup() method.
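
A rough sketch of that pattern (illustrative names; the flush threshold is made up):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class BufferingMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final int FLUSH_EVERY = 1000;   // made-up threshold
    private final List<Text> buffer = new ArrayList<Text>();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      buffer.add(new Text(value));                 // copy: Hadoop reuses the value object
      if (buffer.size() >= FLUSH_EVERY) {
        flush(context);
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      flush(context);                              // emit whatever is still buffered
    }

    private void flush(Context context) throws IOException, InterruptedException {
      for (Text t : buffer) {
        context.write(new Text("buffered"), t);    // placeholder key
      }
      buffer.clear();
    }
  }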

On Tue, Apr 27, 2010 at 6:57 PM, Farhan Husain
farhan.hus...@csebuet.org wrote:
 Hello,

 Is it possible to output in Mapper.cleanup method since the Mapper.context
 object is still available there?

 Thanks,
 Farhan




-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com


Re: Output pair in Mapper.cleanup method

2010-04-27 Thread Farhan Husain
Thanks Eric.

On Tue, Apr 27, 2010 at 6:19 PM, Eric Sammer esam...@cloudera.com wrote:

 Yes. It's a common pattern to buffer some amount of data in the map()
 method, flushing every N records and then to flush any remaining
 records in the cleanup() method.

 On Tue, Apr 27, 2010 at 6:57 PM, Farhan Husain
 farhan.hus...@csebuet.org wrote:
  Hello,
 
  Is it possible to output in Mapper.cleanup method since the
 Mapper.context
  object is still available there?
 
  Thanks,
  Farhan
 



 --
 Eric Sammer
 phone: +1-917-287-2675
 twitter: esammer
 data: www.cloudera.com



Re: User defined class as Map/Reduce output value

2010-04-27 Thread Ted Yu
Can you try adding 'org.apache.hadoop.io.serializer.JavaSerialization,' to
the following config?
C:\hadoop-0.20.2\src\core\core-default.xml(87,9):
  <name>io.serializations</name>

By default, only org.apache.hadoop.io.serializer.WritableSerialization is
included.
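
Equivalently (sketch), the property can be set on the job's Configuration; the value
class must still implement java.io.Serializable:

  // keep WritableSerialization and add Java serialization on top of it
  conf.set("io.serializations",
      "org.apache.hadoop.io.serializer.WritableSerialization,"
      + "org.apache.hadoop.io.serializer.JavaSerialization");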

On Tue, Apr 27, 2010 at 3:55 PM, Farhan Husain farhan.hus...@csebuet.org wrote:

 I tried to use a class which implements the Serializable interface and got
 the following error:

 java.lang.NullPointerException
at

 org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:759)
at

 org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:487)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

 On Tue, Apr 27, 2010 at 12:53 PM, Farhan Husain
 farhan.hus...@csebuet.org wrote:

  Hello,
 
  I want to output a class which I have written as the value of the map
  phase. The obvious was is to implement the Writable interface but the
  problem is the class has other classes as its member properties. The
  DataInput and DataOutput interfaces used by the read and write methods of
  the Writable class do not support object serialization. Is there any
 other
  way I can achieve this?
 
  Thanks,
  Farhan