Re: Is Mapper's map method thread safe?

2009-05-14 Thread Shengkai Zhu
Each mapper instance will be executed in a separate JVM.
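To add a bit of context: within a single task JVM, the default MapRunner calls map() sequentially from one thread, so ordinary per-instance state normally needs no synchronization; it only becomes a concern if you explicitly configure a multithreaded runner such as org.apache.hadoop.mapred.lib.MultithreadedMapRunner. A minimal sketch (class and field names here are only illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper: the mutable fields are safe without locking because
// the default runner invokes map() from a single thread within the task JVM.
public class LineCountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private long linesSeen = 0;                     // per-task mutable state
  private final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    linesSeen++;                                  // calls are sequential per task
    for (String w : value.toString().split("\\s+")) {
      word.set(w);
      output.collect(word, one);
    }
  }
}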

On Thu, May 14, 2009 at 2:04 PM, imcaptor imcap...@gmail.com wrote:

 Dear all:

 Does anyone know whether Mapper's map method is thread safe?
 Thank you!

 imcaptor




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: If I chain two MapReduce jobs, can I avoid saving the intermediate output?

2009-04-17 Thread Shengkai Zhu
Could you give a more detailed description?

On Fri, Apr 17, 2009 at 2:21 PM, 王红宝 imcap...@gmail.com wrote:

 As the title says.


 Thank You!
 imcaptor




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: custom writable class

2008-09-18 Thread Shengkai Zhu
Your custom implementation of any interface from hadoop-core should be
archived together with the application (i.e., in the same jar).
That jar will be added to the CLASSPATH of the task runner, so your
custom Writable class can be found.
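For illustration, a minimal custom Writable might look like the sketch below (the package, class and field names are hypothetical). Compile it against hadoop-core, keep it in your own package, and ship the compiled class inside the jar you pass to bin/hadoop jar; there is no need to copy anything into the Hadoop source tree.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical custom value type, packaged inside the job jar.
public class CustomWritable implements Writable {

  private long timestamp;
  private int count;

  public CustomWritable() {}                      // no-arg constructor required

  public CustomWritable(long timestamp, int count) {
    this.timestamp = timestamp;
    this.count = count;
  }

  public void write(DataOutput out) throws IOException {
    out.writeLong(timestamp);                     // serialize fields in order
    out.writeInt(count);
  }

  public void readFields(DataInput in) throws IOException {
    timestamp = in.readLong();                    // deserialize in the same order
    count = in.readInt();
  }
}

In the driver you then import it by its own package name, not as org.apache.hadoop.io.CustomWritable.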

On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED] wrote:

 Hi,

 I am new to Hadoop. For my map/reduce task I want to write my own custom
 writable class. Could anyone please let me know where exactly to place the
 customwritable.java file?

 I found that all the writable class files are in
 {hadoop-home}/hadoop-{version}/src/java/org/apache/hadoop/io/.

 Then, in the main task, we just include import
 org.apache.hadoop.io.{X}Writable; but this is not working for me.
 Basically, at compile time the compiler doesn't find my custom writable
 class, which I have placed in the folder mentioned above.

 Please help me with this.

 Thanks
 deepak




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: custom writable class

2008-09-18 Thread Shengkai Zhu
You can refer to the Hadoop Map-Reduce Tutorial

On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED] wrote:


  Your custom implementation of any interface from hadoop-core should be
  archived together with the application (i.e., in the same jar).
  That jar will be added to the CLASSPATH of the task runner, so your
  custom Writable class can be found.


 On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED]wrote:

 Hi,

  I am new to Hadoop. For my map/reduce task I want to write my own custom
  writable class. Could anyone please let me know where exactly to place the
  customwritable.java file?

  I found that all the writable class files are in
  {hadoop-home}/hadoop-{version}/src/java/org/apache/hadoop/io/.

  Then, in the main task, we just include import
  org.apache.hadoop.io.{X}Writable; but this is not working for me.
  Basically, at compile time the compiler doesn't find my custom writable
  class, which I have placed in the folder mentioned above.

  Please help me with this.

  Thanks
  deepak




 --

 朱盛凯

 Jash Zhu

 复旦大学软件学院

 Software School, Fudan University




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: custom writable class

2008-09-18 Thread Shengkai Zhu
Here is the link
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html

On Thu, Sep 18, 2008 at 9:16 PM, chanel [EMAIL PROTECTED] wrote:

 Where can you find the Hadoop Map-Reduce Tutorial?


 Shengkai Zhu wrote:

 You can refer to the Hadoop Map-Reduce Tutorial

 On Thu, Sep 18, 2008 at 8:40 PM, Shengkai Zhu [EMAIL PROTECTED]
 wrote:



  Your custom implementation of any interface from hadoop-core should be
  archived together with the application (i.e., in the same jar).
  That jar will be added to the CLASSPATH of the task runner, so your
  custom Writable class can be found.


 On Thu, Sep 18, 2008 at 8:09 PM, Deepak Diwakar [EMAIL PROTECTED]
 wrote:



 Hi,

  I am new to Hadoop. For my map/reduce task I want to write my own custom
  writable class. Could anyone please let me know where exactly to place the
  customwritable.java file?

  I found that all the writable class files are in
  {hadoop-home}/hadoop-{version}/src/java/org/apache/hadoop/io/.

  Then, in the main task, we just include import
  org.apache.hadoop.io.{X}Writable; but this is not working for me.
  Basically, at compile time the compiler doesn't find my custom writable
  class, which I have placed in the folder mentioned above.

  Please help me with this.

  Thanks
  deepak




 --

 朱盛凯

 Jash Zhu

 复旦大学软件学院

 Software School, Fudan University














-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: hadoop hanging (probably misconfiguration) assistance

2008-09-11 Thread Shengkai Zhu
The logs will probably tell you what happened.

On Thu, Sep 11, 2008 at 3:20 PM, [EMAIL PROTECTED] wrote:

 Hi All,
 I have been trying to move from a pseudo-distributed Hadoop cluster, which
 worked perfectly well, to a real Hadoop cluster. I was able to execute
 the wordcount example on my pseudo cluster, but my real cluster hangs at
 this point:

 # bin/hadoop jar hadoop*jar wordcount /myinput /myoutput
 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process
 : 2
 08/09/10 17:10:30 INFO mapred.FileInputFormat: Total input paths to process
 : 2
 08/09/10 17:10:31 INFO mapred.JobClient: Running job: job_200809101706_0001
 08/09/10 17:10:32 INFO mapred.JobClient:  map 0% reduce 0%

 The machines are doing nothing ie all processes at 0.0%

 I have changed the configuration a couple of times to see where the issue
 lies. Currently I have 2 machines in the cluster: the namenode and
 the jobtracker on one machine, with the datanode on a separate machine.

 I have moved from host names to IP addresses with negligible improvement.
 The only errors in the logfiles are about log4j flushing, so I did
 not consider that to be relevant.

 If anyone has seen this or has any ideas where I might find the source of
 my issues I would be grateful.

 Regards
 Damien

 # cat hadoop-site.xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in this file. -->

 <configuration>

  <property>
    <name>mapred.task.timeout</name>
    <value>6000</value>
    <description>The number of milliseconds before a task will be
    terminated if it neither reads an input, writes an output, nor
    updates its status string.
    </description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.7.3.164:54130/</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>hadoop.logfile.size</name>
    <value>100</value>
  </property>

  <property>
    <name>hadoop.logfile.count</name>
    <value>2</value>
  </property>

  <property>
    <name>io.sort.mb</name>
    <value>25</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>8388608</value>
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>5</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>10.7.3.164:54131</value>
  </property>

  <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>3</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx128m</value>
  </property>

  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <property>
    <name>mapred.submit.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>tasktracker.http.threads</name>
    <value>4</value>
  </property>

 </configuration>





-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-11 Thread Shengkai Zhu
AFAIK, the tasktracker will load your job archive automatically while running
the map/reduce tasks.
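For reference, here is a minimal sketch of the workaround Ryan describes below: a subclass that registers the custom value class via the protected AbstractMapWritable.addToMap() (com.test.CustomStatsWritable is the class name from his stack trace). The class also has to be visible to the client and task JVMs, e.g. by shipping it in the job jar or listing it under HADOOP_CLASSPATH.

import org.apache.hadoop.io.MapWritable;

// Sketch: register the custom value class so both the writing and the
// reading side of the map know the class id used during serialization.
public class CustomMapWritable extends MapWritable {
  public CustomMapWritable() {
    super();
    addToMap(com.test.CustomStatsWritable.class);  // custom Writable value type
  }
}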

On Tue, Sep 9, 2008 at 10:28 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:

 Based on some similar problems that I found others were having in the
 mailing lists, it looks like the solution was to list my Map/Reduce
 job JAR in the conf/hadoop-env.sh file under HADOOP_CLASSPATH. After
 doing that and re-submitting the job, it all worked fine! I guess the
 MapWritable class somehow doesn't share the same classpath as the
 program that actually submits the job conf. Is this expected?

 Thanks,
 Ryan


 On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
  Okay, I think I'm getting closer but now I'm running into another
 problem.
 
  First off, I created my own CustomMapWritable that extends MapWritable
  and invokes AbstractMapWritable.addToMap() to add my custom classes.
  Now the map/reduce phases actually complete and the job as a whole
  completes. However, when I try to use the SequenceFile API to later
  read the output data, I'm getting a strange exception. First the code:
 
  FileSystem fileSys = FileSystem.get(conf);
  SequenceFile.Reader reader = new SequenceFile.Reader(fileSys, inFile,
  conf);
  Text key = new Text();
  CustomWritable stats = new CustomWritable();
  reader.next(key, stats);
  reader.close();
 
  And now the exception that's thrown:
 
  java.io.IOException: can't find class: com.test.CustomStatsWritable
  because com.test.CustomStatsWritable
 at
 org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:210)
 at
 org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:145)
 at
 com.test.CustomStatsWritable.readFields(UserStatsWritable.java:49)
 at
 org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
 at
 org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
  ...
 
  Any ideas?
 
  Thanks,
  Ryan
 
 
  On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte [EMAIL PROTECTED]
 wrote:
  Hello,
 
  I'm attempting to use a SortedMapWritable with a LongWritable as the
  key and a custom implementation of org.apache.hadoop.io.Writable as
  the value. I notice that my program works fine when I use another
  primitive wrapper (e.g. Text) as the value, but fails with the
  following exception when I use my custom Writable instance:
 
  2008-09-08 23:25:02,072 INFO org.apache.hadoop.mapred.ReduceTask:
  Initiating in-memory merge with 1 segments...
  2008-09-08 23:25:02,077 INFO org.apache.hadoop.mapred.Merger: Merging
  1 sorted segments
  2008-09-08 23:25:02,077 INFO org.apache.hadoop.mapred.Merger: Down to
  the last merge-pass, with 1 segments left of total size: 5492 bytes
  2008-09-08 23:25:02,099 WARN org.apache.hadoop.mapred.ReduceTask:
  attempt_200809082247_0005_r_00_0 Merge of the inmemory files threw
  a
  n exception: java.io.IOException: Intermedate merge failed
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2133)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2064)
  Caused by: java.lang.RuntimeException: java.lang.NullPointerException
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:80)
 at
 org.apache.hadoop.io.SortedMapWritable.readFields(SortedMapWritable.java:179)
 ...
 
  I noticed that the AbstractMapWritable class has a protected
  addToMap(Class clazz) method. Do I somehow need to let my
  SortedMapWritable instance know about my custom Writable value? I've
  properly implemented the custom Writable object (it just contains a
  few primitives, like longs and ints).
 
  Any insight is appreciated.
 
  Thanks,
  Ryan
 
 




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: number of tasks on a node.

2008-09-09 Thread Shengkai Zhu
You can monitor task and node status through the web pages provided (by default the JobTracker UI on port 50030 and each TaskTracker's status page on port 50060).

On Wed, Sep 10, 2008 at 11:24 AM, Edward J. Yoon [EMAIL PROTECTED]wrote:

 TaskTrackers communicate with the JobTracker, so I guess you can
 handle it via the JobTracker.

 -Edward

 On Wed, Sep 10, 2008 at 12:03 PM, Dmitry Pushkarev [EMAIL PROTECTED]
 wrote:
  Hi.
 
 
 
  How can a node find out how many tasks are being run on it at a given time?

  I want tasktracker nodes (which are allocated from Amazon EC2) to shut down
  if nothing has been run for some period of time, but I don't yet see the
  right way of implementing this.
 
 



 --
 Best regards, Edward J. Yoon
 [EMAIL PROTECTED]
 http://blog.udanax.org




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: Failing MR jobs!

2008-09-08 Thread Shengkai Zhu
Do you have some more detailed information? Logs are helpful.

On Mon, Sep 8, 2008 at 3:26 AM, Erik Holstad [EMAIL PROTECTED] wrote:

 Hi!
 I'm trying to run a MR job, but it keeps on failing and I can't understand
 why.
 Sometimes it shows output at 66% and sometimes 98% or so.
 I had a couple of exceptions before that I didn't catch, which made the job
 fail.


 The log file from the task can be found at:
 http://pastebin.com/m4414d369


 and the code looks like:
 //Java
 import java.io.*;
 import java.util.*;
 import java.net.*;

 //Hadoop
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.util.*;

 //HBase
 import org.apache.hadoop.hbase.*;
 import org.apache.hadoop.hbase.mapred.*;
 import org.apache.hadoop.hbase.io.*;
 import org.apache.hadoop.hbase.client.*;
 // org.apache.hadoop.hbase.client.HTable

 //Extra
 import org.apache.commons.cli.ParseException;

 import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
 import org.apache.commons.httpclient.*;
 import org.apache.commons.httpclient.methods.*;
 import org.apache.commons.httpclient.params.HttpMethodParams;


 public class SerpentMR1 extends TableMap implements Mapper, Tool {

     //Setting DebugLevel
     private static final int DL = 0;

     //Setting up the variables for the MR job
     private static final String NAME = "SerpentMR1";
     private static final String INPUTTABLE = "sources";
     private final String[] COLS = {"content:feedurl", "content:ttl",
         "content:updated"};

     private Configuration conf;

     public JobConf createSubmittableJob(String[] args) throws IOException {
         JobConf c = new JobConf(getConf(), SerpentMR1.class);
         String jar = "/home/hbase/SerpentMR/" + NAME + ".jar";
         c.setJar(jar);
         c.setJobName(NAME);

         int mapTasks = 4;
         int reduceTasks = 20;

         c.setNumMapTasks(mapTasks);
         c.setNumReduceTasks(reduceTasks);

         String inputCols = "";
         for (int i = 0; i < COLS.length; i++) { inputCols += COLS[i] + " "; }

         TableMap.initJob(INPUTTABLE, inputCols, this.getClass(), Text.class,
             BytesWritable.class, c);
         //Classes between:

         c.setOutputFormat(TextOutputFormat.class);
         Path path = new Path("users"); //inserting into a temp table
         FileOutputFormat.setOutputPath(c, path);

         c.setReducerClass(MyReducer.class);
         return c;
     }

     public void map(ImmutableBytesWritable key, RowResult res,
         OutputCollector output, Reporter reporter)
         throws IOException {
         Cell cellLast = res.get(COLS[2].getBytes()); //lastupdate

         long oldTime = cellLast.getTimestamp();

         Cell cell_ttl = res.get(COLS[1].getBytes()); //ttl
         long ttl = StreamyUtil.BytesToLong(cell_ttl.getValue());
         byte[] url = null;

         long currTime = time.GetTimeInMillis();

         if (currTime - oldTime > ttl) {
             url = res.get(COLS[0].getBytes()).getValue(); //url
             output.collect(new Text(Base64.encode_strip(res.getRow())), new
                 BytesWritable(url));
         }
     }


     public static class MyReducer implements Reducer {
         //org.apache.hadoop.mapred.Reducer{

         private int timeout = 1000; //Sets the connection timeout time ms;

         public void reduce(Object key, Iterator values, OutputCollector
             output, Reporter rep)
             throws IOException {
             HttpClient client = new HttpClient(); //new MultiThreadedHttpConnectionManager());
             client.getHttpConnectionManager().
                 getParams().setConnectionTimeout(timeout);

             GetMethod method = null;

             int stat = 0;
             String content = "";
             byte[] colFam = "select".getBytes();
             byte[] column = "lastupdate".getBytes();
             byte[] currTime = null;

             HBaseRef hbref = new HBaseRef();
             JerlType sendjerl = null; //new JerlType();
             ArrayList jd = new ArrayList();

             InputStream is = null;

             while (values.hasNext()) {
                 BytesWritable bw = (BytesWritable) values.next();

                 String address = new String(bw.get());
                 try {
                     System.out.println(address);

                     method = new GetMethod(address);
                     method.setFollowRedirects(true);

                 } catch (Exception e) {
                     System.err.println("Invalid Address");
                     e.printStackTrace();
                 }

                 if (method != null) {
                     try {
                         // Execute the method.
                         stat = client.executeMethod(method);

                         if (stat == 200) {
                             content = "";
                             is = (InputStream) (method.getResponseBodyAsStream());

                             //Write to HBase new stamp 

Re: Failing MR jobs!

2008-09-08 Thread Shengkai Zhu
Sorry, I didn't see the log link.

On Tue, Sep 9, 2008 at 12:01 PM, Shengkai Zhu [EMAIL PROTECTED] wrote:


 Do you have some more detailed information? Logs are helpful.



Re: How to control the map and reduce step sequentially

2008-07-28 Thread Shengkai Zhu
The real reduce logic actually starts only after all map tasks have finished;
the early reduce progress you see corresponds to the shuffle (copy) and sort
phases, not to your reduce function.

Is that still unexpected?


On 7/28/08, 晋光峰 [EMAIL PROTECTED] wrote:

 Dear All,

 When I use Hadoop, I noticed that the reduce step starts immediately, while
 the mappers are still running. According to my project requirements, the
 reduce step should not start until all the mappers finish their execution.
 Does anybody know how to use the Hadoop API to achieve this, so that the
 reducer starts only after all the mappers have finished?

 Thanks
 --
 Guangfeng Jin




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: Reg: Problem in Build Versions of Hadoop-0.17.0

2008-07-16 Thread Shengkai Zhu
Replace the hadoop-*-core.jar on the datanodes with the jar you compiled (the
build made by user 'jobs'), so that the namenode and the datanodes run the
same build version.


On 7/16/08, chaitanya krishna [EMAIL PROTECTED] wrote:

 Hi,

  I'm using hadoop-0.17.0 and recently, when I stopped and restarted DFS,
  the datanodes are created and soon afterwards they are not present. The
  namenode logs show the following error:
 /
 SHUTDOWN_MSG: Shutting down NameNode at 172.16.45.171/172.16.45.171
 /
 2008-07-16 10:13:34,885 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = 172.16.45.171/172.16.45.171
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.17.1-dev
 STARTUP_MSG:   build =  -r ; compiled by 'jobs' on Wed Jul 16 10:13:11 IST
 2008
 /
 2008-07-16 10:13:34,973 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=NameNode, port=9000
 2008-07-16 10:13:35,011 INFO org.apache.hadoop.dfs.NameNode: Namenode up
 at:
 172.16.45.171/172.16.45.171:9000
 2008-07-16 10:13:35,015 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=NameNode, sessionId=null
 2008-07-16 10:13:35,017 INFO org.apache.hadoop.dfs.NameNodeMetrics:
 Initializing NameNodeMeterics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem:
 fsOwner=jobs,jobs
 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem:
 supergroup=supergroup
 2008-07-16 10:13:35,060 INFO org.apache.hadoop.fs.FSNamesystem:
 isPermissionEnabled=true
 2008-07-16 10:14:44,869 INFO org.apache.hadoop.fs.FSNamesystem: Finished
 loading FSImage in 69827 msecs
 2008-07-16 10:14:44,870 INFO org.apache.hadoop.dfs.StateChange: STATE*
 Leaving safe mode after 69 secs.
 2008-07-16 10:14:44,871 INFO org.apache.hadoop.dfs.StateChange: STATE*
 Network topology has 0 racks and 0 datanodes
 2008-07-16 10:14:44,871 INFO org.apache.hadoop.dfs.StateChange: STATE*
 UnderReplicatedBlocks has 0 blocks
 2008-07-16 10:14:44,877 INFO org.apache.hadoop.fs.FSNamesystem: Registered
 FSNamesystemStatusMBean
 2008-07-16 10:14:44,930 INFO org.mortbay.util.Credential: Checking Resource
 aliases
 2008-07-16 10:14:44,993 INFO org.mortbay.http.HttpServer: Version
 Jetty/5.1.4
 2008-07-16 10:14:44,993 INFO org.mortbay.util.Container: Started
 HttpContext[/static,/static]
 2008-07-16 10:14:44,993 INFO org.mortbay.util.Container: Started
 HttpContext[/logs,/logs]
 2008-07-16 10:14:45,190 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-07-16 10:14:45,219 INFO org.mortbay.util.Container: Started
 WebApplicationContext[/,/]
 2008-07-16 10:14:45,221 INFO org.mortbay.http.SocketListener: Started
 SocketListener on 0.0.0.0:50070
 2008-07-16 10:14:45,221 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-07-16 10:14:45,221 INFO org.apache.hadoop.fs.FSNamesystem: Web-server
 up at: 0.0.0.0:50070
 2008-07-16 10:14:45,307 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.174:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.178:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.176:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,308 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.177:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.172:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.175:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,309 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.179:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,311 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.173:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:45,341 INFO org.apache.hadoop.dfs.NameNode: Error report
 from 172.16.45.180:50010: Incompatible build versions: namenode BV = ;
 datanode BV = 656523
 2008-07-16 10:14:49,224 INFO org.apache.hadoop.fs.FSNamesystem: Number of
 transactions: 1 Total time for transactions(ms): 1 Number of syncs: 0
 SyncTimes(ms): 0
 2008-07-16 10:19:45,726 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit
 Log from 172.16.45.171
 2008-07-16 10:19:45,726 INFO org.apache.hadoop.fs.FSNamesystem: Number of
 transactions: 3 

Re: Why is the task run in a child JVM?

2008-07-14 Thread Shengkai Zhu
Well, I got it.


On 7/14/08, Jason Venner [EMAIL PROTECTED] wrote:

 One benefit is that if your map or reduce behaves badly it can't take down
 the task tracker.

 In our case we have some poorly behaved external native libraries we use,
 and we have to forcibly ensure that the child vms are killed when the child
 main finishes (often by kill -9), so the fact the child (task) is a separate
 jvm process is very helpful.

 The downside is the jvm start time. Has anyone experimented with the jar
 freezing for more than the standard boot class path jars to speed up
 startup?


 Shengkai Zhu wrote:

 What are the benefits of such a design compared to multi-threading?



 --
 Jason Venner
 Attributor - Program the Web http://www.attributor.com/
 Attributor is hiring Hadoop Wranglers and coding wizards, contact if
 interested




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: java.io.IOException: All datanodes are bad. Aborting...

2008-07-11 Thread Shengkai Zhu
Did you do the cleanup on all the datanodes?
rm -Rf /path/to/my/hadoop/dfs/data


On 6/20/08, novice user [EMAIL PROTECTED] wrote:


 Hi Mori Bellamy,
 I did this twice.  and still the same problem is persisting. I don't know
 how to solve this issue. If any one know the answer, please let me know.

 Thanks

 Mori Bellamy wrote:
 
  That's bizarre. I'm not sure why your DFS would have magically gotten
  full. Whenever hadoop gives me trouble, i try the following sequence
  of commands
 
  stop-all.sh
  rm -Rf /path/to/my/hadoop/dfs/data
  hadoop namenode -format
  start-all.sh
 
  maybe you would get some luck if you ran that on all of the machines?
  (of course, don't run it if you don't want to lose all of that data)
  On Jun 19, 2008, at 4:32 AM, novice user wrote:
 
 
  Hi Every one,
  I am running a simple map-red application similar to k-means. But,
  when I
  ran it in on single machine, it went fine with out any issues. But,
  when I
  ran the same on a hadoop cluster of 9 machines. It fails saying
  java.io.IOException: All datanodes are bad. Aborting...
 
  Here is more explanation about the problem:
  I tried to upgrade my hadoop cluster to hadoop-17. During this
  process, I
  made a mistake of not installing hadoop on all machines. So, the
  upgrade
  failed. Nor I was able to roll back.  So, I re-formatted the name node
  afresh. and then hadoop installation was successful.
 
  Later, when I ran my map-reduce job, it ran successfully,but  the
  same job
  with zero reduce tasks is failing with the error as:
  java.io.IOException: All datanodes  are bad. Aborting...
 
  When I looked into the data nodes, I figured out that file system is
  100%
  full with different directories of name subdir in
  hadoop-username/dfs/data/current directory. I am wondering where I
  went
  wrong.
  Can some one please help me on this?
 
  The same job went fine on a single machine with same amount of input
  data.
 
  Thanks
 
 
 
  --
  View this message in context:
 
 http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18006296.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18022330.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: dfs copyFromLocal/put fails

2008-07-11 Thread Shengkai Zhu
This looks like a firewall problem, or you may need hostname-to-IP entries for every node in /etc/hosts so the client can reach the datanodes.


On 6/18/08, Alexander Arimond [EMAIL PROTECTED] wrote:

 Hi,

 I'm new to Hadoop and I'm just testing it at the moment.
 I set up a cluster with 2 nodes and it seems like they are running normally;
 the log files of the namenode and the datanodes don't show errors.
 The firewall should be set up correctly.
 But when I try to upload a file to the DFS I get the following message:

 [EMAIL PROTECTED]:~/hadoop$ bin/hadoop dfs -put file.txt file.txt
 08/06/12 14:44:19 INFO dfs.DFSClient: Exception in
 createBlockOutputStream java.net.ConnectException: Connection refused
 08/06/12 14:44:19 INFO dfs.DFSClient: Abandoning block
 blk_5837981856060447217
 08/06/12 14:44:28 INFO dfs.DFSClient: Exception in
 createBlockOutputStream java.net.ConnectException: Connection refused
 08/06/12 14:44:28 INFO dfs.DFSClient: Abandoning block
 blk_2573458924311304120
 08/06/12 14:44:37 INFO dfs.DFSClient: Exception in
 createBlockOutputStream java.net.ConnectException: Connection refused
 08/06/12 14:44:37 INFO dfs.DFSClient: Abandoning block
 blk_1207459436305221119
 08/06/12 14:44:46 INFO dfs.DFSClient: Exception in
 createBlockOutputStream java.net.ConnectException: Connection refused
 08/06/12 14:44:46 INFO dfs.DFSClient: Abandoning block
 blk_-8263828216969765661
 08/06/12 14:44:52 WARN dfs.DFSClient: DataStreamer Exception:
 java.io.IOException: Unable to create new block.
 08/06/12 14:44:52 WARN dfs.DFSClient: Error Recovery for block
 blk_-8263828216969765661 bad datanode[0]


 I don't know what that means and didn't find anything about it.
 Hope somebody can help with that.

 Thank you!




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: JobClient question

2008-07-11 Thread Shengkai Zhu
You should provide JobTracker address and port through configuration.
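For example, with the old JobConf API you can point the client at the cluster before calling JobClient.runJob; this is only a sketch, and the host names and ports are placeholders for the values in your cluster's hadoop-site.xml:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitSketch.class);
    // Placeholders: use the values from your cluster's hadoop-site.xml.
    conf.set("mapred.job.tracker", "jobtracker.example.com:54311");
    conf.set("fs.default.name", "hdfs://namenode.example.com:54310");
    // ... mapper/reducer classes and input/output paths as in the excerpt below ...
    JobClient.runJob(conf);
  }
}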


On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote:

 I'm coming up to speed on the Hadoop APIs. I need to be able to invoke a
 job
 from within a Java application (as opposed to running from the command-line
 hadoop executable). The JobConf and JobClient appear to support this and
 I've written a test program to configure and run a job. However, the job
 doesn't appear to be submitted to the JobTracker. Here's a code excerpt
 from
 my client...

String rdfInputPath = args[0];
String outputPath = args[1];
String uriInputPath = args[2];
String jarPath = args[3];

JobConf conf = new JobConf(MaterializeMap.class);
 conf.setJobName("materialize");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);

conf.setMapperClass(MaterializeMapper.class);
conf.setCombinerClass(MaterializeReducer.class);
conf.setReducerClass(MaterializeReducer.class);
conf.setJar(jarPath);

DistributedCache.addCacheFile(new Path(uriInputPath).toUri(), conf);

FileInputFormat.setInputPaths(conf, new Path(rdfInputPath));
FileOutputFormat.setOutputPath(conf, new Path(outputPath));

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

JobClient.runJob(conf);

 It seems like I should be providing a URL to the JobTracker somewhere, but
 I
 can't figure out where to provide the information.

 --
 Larry Compton




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: JobClient question

2008-07-11 Thread Shengkai Zhu
Yes, you have already invoked the submitJob method through RPC.

But your client has no configuration describing the Hadoop system dir, which
defaults to /tmp/hadoop/mapred/system.
So your client program saved job.xml under that default dir.

On the JobTracker, however, your configuration sets the system dir
to /home/larry/pkg/hadoop/hdfs/mapred/system/.
So the JobTracker can't find the job.xml under it.
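One way to avoid this kind of mismatch (just a sketch; the path below is a placeholder) is to load the cluster's own hadoop-site.xml into the client's JobConf, so that values such as mapred.system.dir, mapred.job.tracker and fs.default.name agree with what the JobTracker itself uses:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ClusterConfSketch {
  public static JobConf clusterConf(Class jobClass) {
    JobConf conf = new JobConf(jobClass);
    // Placeholder path: point at the conf directory of the cluster install.
    conf.addResource(new Path("/home/larry/pkg/hadoop/conf/hadoop-site.xml"));
    return conf;
  }
}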


On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote:

 Thanks. Is this the correct syntax?

 conf.set("mapred.job.tracker", "localhost:54311");

 It does appear to be communicating with the JobTracker now, but I get the
 following stack trace. Is there anything else that needs to be done to
 configure the job?

 Exception in thread main org.apache.hadoop.ipc.RemoteException:
 java.io.IOException:
 /home/larry/pkg/hadoop/hdfs/mapred/system/job_20080714_0001/job.xml: No
 such file or directory
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
at org.apache.hadoop.mapred.JobInProgress.init(JobInProgress.java:175)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at $Proxy0.submitJob(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at

 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
at jobclient.MaterializeMain.main(MaterializeMain.java:44)


 On Fri, Jul 11, 2008 at 11:41 AM, Shengkai Zhu [EMAIL PROTECTED]
 wrote:

  You should provide JobTracker address and port through configuration.
 
 
  On 7/11/08, Larry Compton [EMAIL PROTECTED] wrote:
  
   I'm coming up to speed on the Hadoop APIs. I need to be able to invoke
 a
   job
   from within a Java application (as opposed to running from the
  command-line
   hadoop executable). The JobConf and JobClient appear to support this
  and
   I've written a test program to configure and run a job. However, the
 job
   doesn't appear to be submitted to the JobTracker. Here's a code excerpt
   from
   my client...
  
  String rdfInputPath = args[0];
  String outputPath = args[1];
  String uriInputPath = args[2];
  String jarPath = args[3];
  
  JobConf conf = new JobConf(MaterializeMap.class);
   conf.setJobName("materialize");
  
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(Text.class);
  
  conf.setMapperClass(MaterializeMapper.class);
  conf.setCombinerClass(MaterializeReducer.class);
  conf.setReducerClass(MaterializeReducer.class);
  conf.setJar(jarPath);
  
  DistributedCache.addCacheFile(new Path(uriInputPath).toUri(),
  conf);
  
  FileInputFormat.setInputPaths(conf, new Path(rdfInputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));
  
  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);
  
  JobClient.runJob(conf);
  
   It seems like I should be providing a URL to the JobTracker somewhere,
  but
   I
   can't figure out where to provide the information.
  
   --
   Larry Compton
  
 
 
 
  --
 
  朱盛凯
 
  Jash Zhu
 
  复旦大学软件学院
 
  Software School, Fudan University
 



 --
 Larry Compton




-- 

朱盛凯

Jash Zhu

复旦大学软件学院

Software School, Fudan University


Re: Cannot get passwordless ssh to work right

2008-07-10 Thread Shengkai Zhu
You should chmod the .ssh directory and authorized_keys on the
datanode/tasktracker (the slave you are ssh-ing into) instead of on the jobtracker.

On 7/11/08, Jim Lowell [EMAIL PROTECTED] wrote:

 I'm trying to get a 2-node Hadoop cluster up and running on Ubuntu. I've
 already gotten both nodes to run Hadoop as single-node following the
 excellent instructions at
 http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)
 .

 Now I'm trying to convert them to a 2-node cluster and am failing because I
 cannot get passwordless ssh to work. Long story short, the slave can
 currently ssh to the master without being prompted for a password, but no
 matter what I do the master cannot ssh to the slave without being prompted
 for a password. Here's everything I've tried so far:

 1.   Uninstalled & reinstalled openssh
 2.   Diffed sshd_config on both nodes; they have identical settings
 3.   Regenerated the RSA keys & repopulated them at least a dozen times
 4.   Compared ~/.ssh/authorized_keys on both systems and they match
 5.   Updated permissions with the following:
 . server$ chmod go-w ~/
 . server$ chmod 700 ~/.ssh
 . server$ chmod 600 ~/.ssh/authorized_keys
 6.   Ran ssh -vvv on both systems and diffed the output. The output
 from both matches to this line:

 debug2: we sent a publickey packet, wait for reply

 On the slave (the ssh that works without a password prompt), the next line
 is this:

 debug1: Server accepts key: pkalg ssh-rsa blen 277

 On the master (the ssh that always prompts for a password), the next line
 is this:

 debug1: Authentications that can continue: publickey,password

 I'm not sure what this output means other than the server didn't accept the
 key for some reason.

 No matter what I do, I am always faced with a prompt for a password when
 trying to ssh from the master to the slave node. I'd like to compare a log
 file for the openssh server but I haven't figured out yet where that is
 located or even if it exists (I'm pretty new to Linux / Ubuntu).

 Any help would be appreciated. At this point, I'm probably going to nuke
 both boxes and reinstall Ubuntu from scratch and hope for the best. I'd like
 to avoid that and (more importantly) learn something from this experience.

 - Jim



Re: Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-10 Thread Shengkai Zhu
I've check cod ed in DataNode.java, exactly where you get the error;

*...*
*DataInputStream in=null;*
*in = new DataInputStream(
new BufferedInputStream(s.getInputStream(), BUFFER_SIZE));
short version = in.readShort();
if ( version != DATA_TRANFER_VERSION ) {
 throw new IOException( Version Mismatch );
}*
*...*

May be useful for you.
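For what it's worth, the mismatch below most likely comes from pointing the client at port 50010, which is the datanode's data-transfer port, not the namenode's RPC port; the hdfs:// URI should use the namenode address from fs.default.name. A sketch of the client side (host and port are placeholders for your own fs.default.name value):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.DistributedFileSystem;
import org.apache.hadoop.fs.FileSystem;

public class HdfsClientSketch {
  public static void main(String[] args) throws Exception {
    // Use the namenode RPC address (the fs.default.name value),
    // not the datanode data-transfer port 50010.
    String hdfsurl = "hdfs://localhost:9000";      // placeholder host:port
    FileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(new URI(hdfsurl), new Configuration());
    System.out.println("connected to " + fileSystem.getUri());
    fileSystem.close();
  }
}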

On 7/11/08, Thibaut_ [EMAIL PROTECTED] wrote:


  Hi, I'm trying to access the HDFS of my Hadoop cluster from a non-Hadoop
  application. Hadoop 0.17.1 is running on standard ports.

 This is the code I use:

  FileSystem fileSystem = null;
  String hdfsurl = "hdfs://localhost:50010";
  fileSystem = new DistributedFileSystem();

  try {
      fileSystem.initialize(new URI(hdfsurl), new Configuration());
  } catch (Exception e) {
      e.printStackTrace();
      System.out.println("init error:");
      System.exit(1);
  }


 which fails with the exception:


 java.net.SocketTimeoutException: timed out waiting for rpc response
at org.apache.hadoop.ipc.Client.call(Client.java:559)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
at
 org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:102)
at org.apache.hadoop.dfs.DFSClient.init(DFSClient.java:178)
at

 org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:68)
at
 com.iterend.spider.conf.Config.getRemoteFileSystem(Config.java:72)
at tests.RemoteFileSystemTest.main(RemoteFileSystemTest.java:22)
 init error:


  The hadoop logfile contains the following error:

 2008-07-10 23:05:47,840 INFO org.apache.hadoop.dfs.Storage: Storage
 directory \hadoop\tmp\hadoop-sshd_server\dfs\data is not formatted.
 2008-07-10 23:05:47,840 INFO org.apache.hadoop.dfs.Storage: Formatting ...
 2008-07-10 23:05:47,928 INFO org.apache.hadoop.dfs.DataNode: Registered
 FSDatasetStatusMBean
 2008-07-10 23:05:47,929 INFO org.apache.hadoop.dfs.DataNode: Opened server
 at 50010
 2008-07-10 23:05:47,933 INFO org.apache.hadoop.dfs.DataNode: Balancing
 bandwith is 1048576 bytes/s
 2008-07-10 23:05:48,128 INFO org.mortbay.util.Credential: Checking Resource
 aliases
 2008-07-10 23:05:48,344 INFO org.mortbay.http.HttpServer: Version
 Jetty/5.1.4
 2008-07-10 23:05:48,346 INFO org.mortbay.util.Container: Started
 HttpContext[/static,/static]
 2008-07-10 23:05:48,346 INFO org.mortbay.util.Container: Started
 HttpContext[/logs,/logs]
 2008-07-10 23:05:49,047 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-07-10 23:05:49,244 INFO org.mortbay.util.Container: Started
 WebApplicationContext[/,/]
 2008-07-10 23:05:49,247 INFO org.mortbay.http.SocketListener: Started
 SocketListener on 0.0.0.0:50075
 2008-07-10 23:05:49,247 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-07-10 23:05:49,257 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=DataNode, sessionId=null
 2008-07-10 23:05:49,535 INFO org.apache.hadoop.dfs.DataNode: New storage id
 DS-2117780943-192.168.1.130-50010-1215723949510 is assigned to data-node
 127.0.0.1:50010
 2008-07-10 23:05:49,586 INFO org.apache.hadoop.dfs.DataNode:
 127.0.0.1:50010In DataNode.run, data =
 FSDataset{dirpath='c:\hadoop\tmp\hadoop-sshd_server\dfs\data\current'}
 2008-07-10 23:05:49,586 INFO org.apache.hadoop.dfs.DataNode: using
 BLOCKREPORT_INTERVAL of 360msec Initial delay: 6msec
 2008-07-10 23:06:04,636 INFO org.apache.hadoop.dfs.DataNode: BlockReport of
 0 blocks got processed in 11 msecs
 2008-07-10 23:19:54,512 ERROR org.apache.hadoop.dfs.DataNode:
 127.0.0.1:50010:DataXceiver: java.io.IOException: Version Mismatch
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:961)
at java.lang.Thread.run(Thread.java:619)


  Any ideas how I can fix this? The hadoop cluster and my application are
  both using the same hadoop jar!

 Thanks for your help,
 Thibaut
 --
 View this message in context:
 http://www.nabble.com/Version-Mismatch-when-accessing-hdfs-through-a-nonhadoop-java-application--tp18392343p18392343.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Failed to repeat the Quickstart guide for Pseudo-distributed operation

2008-07-08 Thread Shengkai Zhu
After you format the namenode a second time, your datanodes and namenode may
be left in an inconsistent state, namely with incompatible namespace IDs (the
datanodes still carry data from the old namespace).

On 7/2/08, Xuan Dzung Doan [EMAIL PROTECTED] wrote:

 I was exactly following the Hadoop 0.16.4 quickstart guide to run a
 Pseudo-distributed operation on my Fedora 8 machine. The first time I did
 it, everything ran successfully (formated a new hdfs, started hadoop
 daemons, then ran the grep example). A moment later, I decided to redo
 everything again. Reformating the hdfs and starting the daemons seemed to
 have no problem; but from the homepage of the namenode's web interface (
 http://localhost:50070/), when I clicked Browse the filesystem, it said
 the following:


 HTTP ERROR: 404
 /browseDirectory.jsp
 RequestURI=/browseDirectory.jsp
 Then when I tried to copy files to the hdfs to re-run the grep example, I
 couldn't with the following long list of exceptions (looks like some
 replication or block allocation issue):

 # bin/hadoop dfs -put conf input

 08/06/29 09:38:42 INFO dfs.DFSClient:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead
 of 1
at
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

at org.apache.hadoop.ipc.Client.call(Client.java:512)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)

 08/06/29 09:38:42 WARN dfs.DFSClient: NotReplicatedYetException sleeping
 /user/root/input/hadoop-env.sh retries left 4
 08/06/29 09:38:42 INFO dfs.DFSClient:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /user/root/input/hadoop-env.sh could only be replicated to 0 nodes, instead
 of 1
at
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

at org.apache.hadoop.ipc.Client.call(Client.java:512)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
at
 

Re: Re: modified word count example

2008-07-08 Thread Shengkai Zhu
It's an example M-R application in Phoenix, coded in C.
I have no idea whether there is a popular Hadoop version of it, so I ported it
into a Hadoop-style application.

FYI, source attached.

On 7/9/08, heyongqiang [EMAIL PROTECTED] wrote:

 Where can I find the Reverse-Index application?




 heyongqiang
 2008-07-09



 From: Shengkai Zhu
 Sent: 2008-07-09 09:06:38
 To: core-user@hadoop.apache.org
 Cc:
 Subject: Re: modified word count example

Another MapReduce application, Reverse-Index, behaves similarly to what you
describe.
You can refer to that.
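If all you need is the source file name, another option in the Java mapred API (a sketch, assuming a file-based InputFormat so that the framework sets the map.input.file property for each split) is to read the file name in configure() and emit it as the value:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: emit <word, source-file-name> pairs instead of <word, 1>.
public class WordFileMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final Text fileName = new Text();
  private final Text word = new Text();

  public void configure(JobConf job) {
    // Set by the framework for file-based splits.
    fileName.set(job.get("map.input.file", "unknown"));
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    for (String w : value.toString().split("\\s+")) {
      word.set(w);
      output.collect(word, fileName);
    }
  }
}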


 On 7/9/08, heyongqiang  [EMAIL PROTECTED]  wrote:
 
  InputFormat's method RecordReader<K, V> getRecordReader(InputSplit split,
  JobConf job, Reporter reporter) throws IOException returns a RecordReader.
  You can implement your own InputFormat and RecordReader:
  1) the RecordReader remembers the FileSplit (a subclass of InputSplit)
  in a field of its class;
  2) the RecordReader's createValue() method always returns the FileSplit's
  file field.

  hope this helps.
 
 
 
  heyongqiang
  2008-07-09
 
 
 
  From: Sandy
  Sent: 2008-07-09 01:45:15
  To: core-user@hadoop.apache.org
  Cc:
  Subject: modified word count example
 
  Hi,
 
  Let's say I want to run a map reduce job on a series of text files (let's
  say x.txt y.txt and z.txt)
 
  Given the following mapper function in python (from WordCount.py):
 
   class WordCountMap(Mapper, MapReduceBase):
       one = IntWritable(1)  # removed
       def map(self, key, value, output, reporter):
           for w in value.toString().split():
               output.collect(Text(w), self.one)  # how can I modify this line?
 
   Instead of creating pairs of each word found and the numeral one, as the
   example is doing, is there a function I can invoke to store the name of the
   file it came from instead?

   Thus, I'd have pairs like <water, x.txt>, <hadoop, y.txt>, <hadoop, z.txt>,
   etc.
 
   I took a look at the javadoc, but I'm not sure if I've checked in the right
   places. Could someone point me in the right direction?
 
  Thanks!
 
  -SM
 



rindex.tar.gz
Description: GNU Zip compressed data