Re: using Cascading for map-reduce
Hi! If you are interested in Cascading, I recommend asking on the Cascading mailing list or coming to ask in the IRC channel. The mailing list can be found at the bottom left corner of www.cascading.org . Regards Erik
Re: Problems getting Eclipse Hadoop plugin to work.
Hi guys! Thanks for your help, but still no luck. I did try to set it up on a different machine with Eclipse 3.2.2 and the IBM plugin instead of the Hadoop one; there I only needed to fill in the install directory and the host, and that worked just fine. I have filled in the ports correctly, and the cluster is up and running and works just fine. Regards Erik
Re: Problems getting Eclipse Hadoop plugin to work.
Thanks guys! Running Linux, and the remote cluster is also Linux. I already have the properties set up like that on my remote cluster, but I'm not sure where to input this info into Eclipse. And when changing the ports to 9000 and 9001 I get: Error: java.io.IOException: Unknown protocol to job tracker: org.apache.hadoop.dfs.ClientProtocol Regards Erik
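For reference, a minimal sketch of the two cluster-side properties the plugin fields correspond to (the hostname and ports below are placeholders, not values confirmed by this thread). The DFS master field in the plugin should match fs.default.name and the M/R master field should match mapred.job.tracker; an "Unknown protocol to job tracker: ...ClientProtocol" error typically means the DFS address has been pointed at the JobTracker port, i.e. the two ports are swapped in the plugin dialog.

<!-- hadoop-site.xml on the cluster, hypothetical host xx0 -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://xx0:9000</value>   <!-- DFS master host and port -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>xx0:9001</value>          <!-- M/R master host and port -->
</property>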
Re: Map/Reduce Job done locally?
Hey Philipp! MR jobs run locally if you just run the Java class directly; to get a job running in distributed mode you need to package it into a job jar and run that with ./bin/hadoop jar ... Regards Erik
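As a rough illustration (names are made up, and the old org.apache.hadoop.mapred API is assumed), a driver like the one below would be packaged into the job jar and launched with ./bin/hadoop jar myjob.jar MyJobDriver <in> <out>, so it picks up the cluster configuration instead of the local runner:

// MyJobDriver.java -- hypothetical minimal driver for a job jar (old mapred API)
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyJobDriver {
  public static void main(String[] args) throws Exception {
    // Passing the driver class lets Hadoop locate the containing jar on the cluster.
    JobConf conf = new JobConf(MyJobDriver.class);
    conf.setJobName("my-job");

    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Submits to the JobTracker named in the cluster config; running the class
    // directly (without the hadoop launcher) falls back to the local job runner.
    JobClient.runJob(conf);
  }
}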
Re: Map/Reduce Job done locally?
Hey Philipp! Not sure about your time tracking idea; it probably works. I've just used a bash script to start the jar, and then you can do the timing in the script. As for how to compile the jars, you need to include the dependencies too, but you will see what you are missing when you run the job. Regards Erik
Problems getting Eclipse Hadoop plugin to work.
I'm using Eclipse 3.3.2 and want to view my remote cluster using the Hadoop plugin. Everything shows up and I can see the map/reduce perspective, but when trying to connect to a location I get: Error: Call failed on local exception I've set the host to, for example, xx0, where xx0 is a remote machine accessible from the terminal, and the ports to 50020/50040 for the M/R master and the DFS master respectively. Is there anything I'm missing to set for remote access to the Hadoop cluster? Regards Erik
Redirecting the logs to a remote log server?
Hi! I have been trying to get the logs from Hadoop redirected to a remote log server. I tried adding a socket appender in the log4j.properties file in the conf directory, and also adding the commons-logging + log4j jars plus the same log4j.properties file into the WEB-INF of the master, but I still get nothing in the logs on the log server. What is it that I'm missing here? Regards Erik
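For context, a minimal sketch of the kind of SocketAppender entry meant here (the host, port, and reconnection delay are placeholder assumptions, not values from the original mail); the remote side would need a log4j socket server, e.g. org.apache.log4j.net.SimpleSocketServer, listening on the same port:

# conf/log4j.properties -- hypothetical additions for remote logging
# (append SOCKET to the existing rootLogger / hadoop.root.logger line)
log4j.appender.SOCKET=org.apache.log4j.net.SocketAppender
log4j.appender.SOCKET.RemoteHost=loghost.example.com
log4j.appender.SOCKET.Port=4560
log4j.appender.SOCKET.ReconnectionDelay=10000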
Re: Cleaning up files in HDFS?
Hi! I thought that the trash function only worked for files that have already been deleted, not for files that are about to be deleted, but it would be nice if you could set it up to work on a specific directory. Erik

On Fri, Nov 14, 2008 at 6:07 PM, lohit [EMAIL PROTECTED] wrote:
Have you tried the fs.trash.interval property?

<property>
  <name>fs.trash.interval</name>
  <value>0</value>
  <description>Number of minutes between trash checkpoints.
  If zero, the trash feature is disabled.
  </description>
</property>

More info about the trash feature here:
http://hadoop.apache.org/core/docs/current/hdfs_design.html
Thanks, Lohit

----- Original Message -----
From: Erik Holstad [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 5:08:03 PM
Subject: Cleaning up files in HDFS?
Hi! We would like to run a delete script that deletes all files older than x days that are stored in lib l in HDFS. What is the best way of doing that? Regards Erik
Cleaning up files in HDFS?
Hi! We would like to run a delete script that deletes all files older than x days that are stored in lib l in HDFS. What is the best way of doing that? Regards Erik
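One plausible approach (a minimal sketch, not from the original thread; the directory path and age cutoff are placeholder assumptions) is a small program that lists the directory through the FileSystem API and deletes anything whose modification time is older than the cutoff:

// PurgeOldFiles.java -- hypothetical age-based cleanup sketch
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PurgeOldFiles {
  public static void main(String[] args) throws Exception {
    int days = Integer.parseInt(args[1]);                     // e.g. 7
    long cutoff = System.currentTimeMillis() - days * 24L * 60 * 60 * 1000;

    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] entries = fs.listStatus(new Path(args[0]));  // e.g. the lib directory
    for (FileStatus entry : entries) {
      // Delete anything whose last modification is older than the cutoff.
      if (entry.getModificationTime() < cutoff) {
        fs.delete(entry.getPath(), true);                     // true = recursive
      }
    }
  }
}

Run against the cluster configuration (e.g. via ./bin/hadoop) it could be scheduled from a cron job or wrapped in a shell script.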
Re: Passing Constants from One Job to the Next
Hi! Is there a way of using the value read in configure() in the Map or Reduce phase? Erik

On Thu, Oct 23, 2008 at 2:40 AM, Aaron Kimball [EMAIL PROTECTED] wrote:
See Configuration.setInt() in the API. (JobConf inherits from Configuration.) You can read it back in the configure() method of your mappers/reducers. - Aaron

On Wed, Oct 22, 2008 at 3:03 PM, Yih Sun Khoo [EMAIL PROTECTED] wrote:
Are you saying that I can pass, say, a single integer constant with any of these three: JobConf? An HDFS file? DistributedCache? Or are you asking if I can pass one given the context of: JobConf? An HDFS file? DistributedCache? I'm thinking of how to pass a single int from one JobConf to the next.

On Wed, Oct 22, 2008 at 2:57 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote:
I'd like to hear some good ways of passing constants from one job to the next.
Unless I'm missing something: JobConf? An HDFS file? DistributedCache? Arun

These are some ways that I can think of:
1) The obvious solution is to carry the constant as part of your value from one job to the next, but that would mean every value would hold that constant.
2) Use the reporter as a hack, so that you can set the status message and then get the status message back when you need the constant.
Any other ideas? (Also please do not include code)
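To illustrate Aaron's suggestion (a minimal sketch with made-up property and class names; the old mapred API is assumed), the driver stores the constant in the JobConf and the mapper reads it back in configure(), after which it is available to every map() call:

// Hypothetical sketch: passing an int through the JobConf (old mapred API)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConstantMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private int myConstant;

  // In the driver: conf.setInt("myjob.constant", 42);
  public void configure(JobConf job) {
    // Read the value back; 0 is the default if the property was never set.
    myConstant = job.getInt("myjob.constant", 0);
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // myConstant is now usable inside the map phase.
    output.collect(new Text(String.valueOf(myConstant)), value);
  }
}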
Re: How to change number of mappers in Hadoop streaming?
Hi Steve! You can pass -jobconf mapred.map.tasks=$MAPPERS -jobconf mapred.reduce.tasks=$REDUCERS to the streaming job to set the number of mappers and reducers. Regards Erik

On Wed, Oct 15, 2008 at 4:25 PM, Steve Gao [EMAIL PROTECTED] wrote:
Is there a way to change the number of mappers on the Hadoop streaming command line? I know I can change hadoop-default.xml:

<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
  <description>The default number of map tasks per job. Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is local.
  </description>
</property>

But that's for all jobs. What if I just want each job to have its own NUM_OF_Mappers? Thanks
Failing MR jobs!
Hi! I'm trying to run a MR job, but it keeps on failing and I can't understand why. Sometimes it fails at 66% and sometimes at 98% or so. I had a couple of exceptions before that I didn't catch, which made the job fail. The log file from the task can be found at: http://pastebin.com/m4414d369 and the code looks like:

//Java
import java.io.*;
import java.util.*;
import java.net.*;
//Hadoop
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
//HBase
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.mapred.*;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.client.*; // org.apache.hadoop.hbase.client.HTable
//Extra
import org.apache.commons.cli.ParseException;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class SerpentMR1 extends TableMap implements Mapper, Tool {

  //Setting DebugLevel
  private static final int DL = 0;

  //Setting up the variables for the MR job
  private static final String NAME = "SerpentMR1";
  private static final String INPUTTABLE = "sources";
  private final String[] COLS = {"content:feedurl", "content:ttl", "content:updated"};

  private Configuration conf;

  public JobConf createSubmittableJob(String[] args) throws IOException {
    JobConf c = new JobConf(getConf(), SerpentMR1.class);
    String jar = "/home/hbase/SerpentMR/" + NAME + ".jar";
    c.setJar(jar);
    c.setJobName(NAME);
    int mapTasks = 4;
    int reduceTasks = 20;
    c.setNumMapTasks(mapTasks);
    c.setNumReduceTasks(reduceTasks);
    String inputCols = "";
    for (int i = 0; i < COLS.length; i++) { inputCols += COLS[i] + " "; }
    TableMap.initJob(INPUTTABLE, inputCols, this.getClass(), Text.class, BytesWritable.class, c);
    //Classes between:
    c.setOutputFormat(TextOutputFormat.class);
    Path path = new Path("users"); //inserting into a temp table
    FileOutputFormat.setOutputPath(c, path);
    c.setReducerClass(MyReducer.class);
    return c;
  }

  public void map(ImmutableBytesWritable key, RowResult res,
      OutputCollector output, Reporter reporter) throws IOException {
    Cell cellLast = res.get(COLS[2].getBytes()); //lastupdate
    long oldTime = cellLast.getTimestamp();
    Cell cell_ttl = res.get(COLS[1].getBytes()); //ttl
    long ttl = StreamyUtil.BytesToLong(cell_ttl.getValue());
    byte[] url = null;
    long currTime = time.GetTimeInMillis();
    if (currTime - oldTime > ttl) {
      url = res.get(COLS[0].getBytes()).getValue(); //url
      output.collect(new Text(Base64.encode_strip(res.getRow())), new BytesWritable(url));
    }
  }

  public static class MyReducer implements Reducer { //org.apache.hadoop.mapred.Reducer
    private int timeout = 1000; //Sets the connection timeout time, ms

    public void reduce(Object key, Iterator values, OutputCollector output, Reporter rep)
        throws IOException {
      HttpClient client = new HttpClient(); //new MultiThreadedHttpConnectionManager());
      client.getHttpConnectionManager().getParams().setConnectionTimeout(timeout);
      GetMethod method = null;
      int stat = 0;
      String content = "";
      byte[] colFam = "select".getBytes();
      byte[] column = "lastupdate".getBytes();
      byte[] currTime = null;
      HBaseRef hbref = new HBaseRef();
      JerlType sendjerl = null; //new JerlType();
      ArrayList jd = new ArrayList();
      InputStream is = null;
      while (values.hasNext()) {
        BytesWritable bw = (BytesWritable) values.next();
        String address = new String(bw.get());
        try {
          System.out.println(address);
          method = new GetMethod(address);
          method.setFollowRedirects(true);
        } catch (Exception e) {
          System.err.println("Invalid Address");
          e.printStackTrace();
        }
        if (method != null) {
          try {
            // Execute the method.
            stat = client.executeMethod(method);
            if (stat == 200) {
              content = "";
              is = (InputStream) (method.getResponseBodyAsStream());
              //Write to HBase new stamp select:lastupdate
              currTime = StreamyUtil.LongToBytes(time.GetTimeInMillis());
              jd.add(new
Trying to write to HDFS from mapreduce.
Hi! I'm writing a mapreduce job where I want the output from the mapper to go straight to HDFS without passing through the reduce method. I have been told that I can do: c.setOutputFormat(TextOutputFormat.class); and I also added: Path path = new Path("user"); FileOutputFormat.setOutputPath(c, path); But I still ended up with the result in the local filesystem instead. Regards Erik
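For comparison, a minimal sketch of a map-only job (class names are placeholders and the old mapred API is assumed): setting the number of reduce tasks to zero sends map output directly to the output path, and that path is resolved against the default filesystem, so results typically land on the local filesystem only when the job runs with the default local configuration instead of the cluster's.

// MapOnlyJob.java -- hypothetical map-only sketch (old mapred API); the output
// path "user" is taken from the mail above and resolves against fs.default.name.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf c = new JobConf(MapOnlyJob.class);
    c.setMapperClass(IdentityMapper.class);   // placeholder for the real mapper
    c.setNumReduceTasks(0);                   // zero reducers: map output goes straight to the output path
    c.setOutputKeyClass(LongWritable.class);
    c.setOutputValueClass(Text.class);
    c.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(c, new Path(args[0]));
    FileOutputFormat.setOutputPath(c, new Path("user"));
    // With fs.default.name pointing at HDFS the output lands in HDFS; with the
    // default local configuration (fs.default.name=file:///) it stays local.
    JobClient.runJob(c);
  }
}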