Re: using Cascading for map-reduce

2009-04-08 Thread Erik Holstad
Hi!
If you are interested in Cascading I recommend asking on the Cascading
mailing list or in the IRC channel.
The mailing list can be found in the bottom left corner of www.cascading.org.

Regards Erik


Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-20 Thread Erik Holstad
Hi guys!
Thanks for your help, but still no luck. I did try to set it up on a
different machine with Eclipse 3.2.2 and the IBM plugin instead of the
Hadoop one; there I only needed to fill out the install directory and the
host, and that worked just fine.
I have filled out the ports correctly, and the cluster is up and running and
working just fine.

Regards Erik


Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-19 Thread Erik Holstad
Thanks guys!
I'm running Linux, and the remote cluster is also Linux.
I already have the properties set up like that on my remote cluster, but I'm
not sure where to enter this information in Eclipse.
When I change the ports to 9000 and 9001 I get:

Error: java.io.IOException: Unknown protocol to job tracker:
org.apache.hadoop.dfs.ClientProtocol

Regards Erik


Re: Map/Reduce Job done locally?

2009-02-19 Thread Erik Holstad
Hey Philipp!
MR jobs run locally if you just run the Java class; to run in distributed
mode you need to create a job jar and run it with ./bin/hadoop jar ...
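For example, a minimal driver sketch (hypothetical class, jar, and path names,
and assuming the 0.18-style JobConf API); package it into a jar together with
your classes and launch it as above:

//Minimal sketch with made-up names.
//Build into a jar and launch with something like:
//  ./bin/hadoop jar myjob.jar MyJobDriver in out
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MyJobDriver.class);
    conf.setJobName("my-job");
    //Identity map/reduce by default; set your own mapper/reducer classes here.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    //Submits to whatever JobTracker is configured in hadoop-site.xml.
    JobClient.runJob(conf);
  }
}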

Regards Erik


Re: Map/Reduce Job done locally?

2009-02-19 Thread Erik Holstad
Hey Philipp!
Not sure about your time-tracking approach; it probably works. I've just used
a bash script to start the jar, and then you can do the timing in the script.
As for how to compile the jars: you need to include the dependencies too, but
you will see what you are missing when you run the job.
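If you'd rather do the timing from Java than from bash, a rough sketch
(assuming you already have a fully configured JobConf in your driver):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TimedRunner {
  //Runs the job and prints the wall-clock time it took.
  public static void runAndTime(JobConf conf) throws Exception {
    long start = System.currentTimeMillis();
    JobClient.runJob(conf); //blocks until the job completes or fails
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Job '" + conf.getJobName() + "' took " + elapsed + " ms");
  }
}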

Regards Erik


Problems getting Eclipse Hadoop plugin to work.

2009-02-18 Thread Erik Holstad
I'm using Eclipse 3.3.2 and want to view my remote cluster using the Hadoop
plugin.
Everything shows up and I can see the map/reduce perspective, but when trying
to connect to a location I get:
Error: Call failed on local exception

I've set the host to, for example, xx0, where xx0 is a remote machine
accessible from the terminal, and the ports to 50020/50040 for the M/R master
and DFS master respectively. Is there anything I'm missing to set for remote
access to the Hadoop cluster?

Regards Erik


Redirecting the logs to remote log server?

2008-11-21 Thread Erik Holstad
Hi!
I have been trying to redirect the Hadoop logs to a remote log server.
I tried adding the socket appender to the log4j.properties file in the conf
directory, and also adding the commons-logging and log4j jars plus the same
log4j.properties file into the WEB-INF of the master, but I still get nothing
in the logs on the log server. What is it that I'm missing here?
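For reference, what I'm trying to achieve through log4j.properties would look
roughly like this if done programmatically (hypothetical host name, and the
default log4j SocketNode port 4560):

import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class RemoteLogSetup {
  public static void attach() {
    //Ship all root-logger events to a remote log4j SocketNode (the log server).
    SocketAppender remote = new SocketAppender("loghost.example.com", 4560);
    Logger root = Logger.getRootLogger();
    root.setLevel(Level.INFO);
    root.addAppender(remote);
  }
}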

Regards Erik


Re: Cleaning up files in HDFS?

2008-11-17 Thread Erik Holstad
Hi!
I thought that the trash feature only works for files that have already been
deleted, not for files that are about to be deleted, but it would be nice if
you could set it up to work on a specific directory.
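What we had in mind is roughly the sketch below (hypothetical directory and
age, and assuming the FileStatus/listStatus API in our Hadoop version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OldFileCleaner {
  //Deletes everything directly under dir whose modification time
  //is older than maxAgeDays.
  public static void clean(Path dir, int maxAgeDays) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long cutoff = System.currentTimeMillis() - maxAgeDays * 24L * 60 * 60 * 1000;
    FileStatus[] files = fs.listStatus(dir);
    if (files == null) return;
    for (FileStatus f : files) {
      if (f.getModificationTime() < cutoff) {
        fs.delete(f.getPath(), true); //recursive delete
      }
    }
  }
}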

Erik

On Fri, Nov 14, 2008 at 6:07 PM, lohit [EMAIL PROTECTED] wrote:

 Have you tried fs.trash.interval

 <property>
  <name>fs.trash.interval</name>
  <value>0</value>
  <description>Number of minutes between trash checkpoints.
  If zero, the trash feature is disabled.
  </description>
 </property>

 more info about trash feature here.
 http://hadoop.apache.org/core/docs/current/hdfs_design.html


 Thanks,
 Lohit

 - Original Message 
 From: Erik Holstad [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, November 14, 2008 5:08:03 PM
 Subject: Cleaning up files in HDFS?

 Hi!
 We would like to run a delete script that deletes all files older than
 x days that are stored in lib l in hdfs, what is the best way of doing
 that?

 Regards Erik




Cleaning up files in HDFS?

2008-11-14 Thread Erik Holstad
Hi!
We would like to run a delete script that deletes all files older than
x days that are stored in lib l in HDFS; what is the best way of doing that?

Regards Erik


Re: Passing Constants from One Job to the Next

2008-10-30 Thread Erik Holstad
Hi!
Is there a way to use the value read in configure() in the map or
reduce phase?
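In other words, is the sketch below (with a made-up configuration key) the
intended pattern: stash the value in a field in configure() and then read
that field in map()?

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConstMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private int myConstant;

  public void configure(JobConf job) {
    //Set in the driver with conf.setInt("my.constant", 42);
    myConstant = job.getInt("my.constant", 0);
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    //The constant picked up in configure() is available here.
    output.collect(new Text(String.valueOf(myConstant)), value);
  }
}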

Erik

On Thu, Oct 23, 2008 at 2:40 AM, Aaron Kimball [EMAIL PROTECTED] wrote:

 See Configuration.setInt() in the API. (JobConf inherits from
 Configuration). You can read it back in the configure() method of your
 mappers/reducers
 - Aaron

 On Wed, Oct 22, 2008 at 3:03 PM, Yih Sun Khoo [EMAIL PROTECTED] wrote:

  Are you saying that I can pass, say, a single integer constant with
 either
  of these three: JobConf? A HDFS file? DistributedCache?
  Or are you asking if I can pass given the context of: JobConf? A HDFS
 file?
  DistributedCache?
  I'm thinking of how to pass a single int so from one Jobconf to the next
 
  On Wed, Oct 22, 2008 at 2:57 PM, Arun C Murthy [EMAIL PROTECTED]
 wrote:
 
  
   On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote:
  
I like to hear some good ways of passing constants from one job to the
   next.
  
  
   Unless I'm missing something: JobConf? A HDFS file? DistributedCache?
  
   Arun
  
  
  
   These are some ways that I can think of:
   1)  The obvious solution is to carry the constant as part of your
 value
   from
   one job to the next, but that would mean every value would hold that
   constant
   2)  Use the reporter as a hack so that you can set the status message
  and
   then get the status message back when u need the constant
  
   Any other ideas?  (Also please do not include code)
  
  
  
 



Re: How to change number of mappers in Hadoop streaming?

2008-10-17 Thread Erik Holstad
Hi Steve!
You can pass -jobconf mapred.map.tasks=$MAPPERS -jobconf
mapred.reduce.tasks=$REDUCERS
to the streaming job to set the number of mappers and reducers.
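For example (the path to the streaming jar may differ between Hadoop
versions, and the mapper/reducer commands here are just placeholders):

./bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar \
  -input /user/steve/input \
  -output /user/steve/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc \
  -jobconf mapred.map.tasks=10 \
  -jobconf mapred.reduce.tasks=4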

Regards Erik

On Wed, Oct 15, 2008 at 4:25 PM, Steve Gao [EMAIL PROTECTED] wrote:

 Is there a way to change number of mappers in Hadoop streaming command
 line?
 I know I can change hadoop-default.xml:

 <property>
   <name>mapred.map.tasks</name>
   <value>10</value>
   <description>The default number of map tasks per job.  Typically set
   to a prime several times greater than number of available hosts.
   Ignored when mapred.job.tracker is local.
   </description>
 </property>

 But that's for all jobs. What if I just want each job has different
 NUM_OF_Mappers themselves? Thanks







Failing MR jobs!

2008-09-07 Thread Erik Holstad
Hi!
I'm trying to run an MR job, but it keeps failing and I can't understand
why.
Sometimes it shows output at 66% and sometimes at 98% or so.
I had a couple of exceptions earlier that I didn't catch, which made the job
fail.


The log file from the task can be found at:
http://pastebin.com/m4414d369


and the code looks like:
//Java
import java.io.*;
import java.util.*;
import java.net.*;

//Hadoop
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

//HBase
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.mapred.*;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.client.*;
// org.apache.hadoop.hbase.client.HTable

//Extra
import org.apache.commons.cli.ParseException;

import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;


public class SerpentMR1 extends TableMap implements Mapper, Tool {

//Setting DebugLevel
private static final int DL = 0;

//Setting up the variables for the MR job
private static final String NAME = "SerpentMR1";
private static final String INPUTTABLE = "sources";
private final String[] COLS = {"content:feedurl", "content:ttl",
"content:updated"};


private Configuration conf;

public JobConf createSubmittableJob(String[] args) throws IOException{
JobConf c = new JobConf(getConf(), SerpentMR1.class);
String jar = "/home/hbase/SerpentMR/" + NAME + ".jar";
c.setJar(jar);
c.setJobName(NAME);

int mapTasks = 4;
int reduceTasks = 20;

c.setNumMapTasks(mapTasks);
c.setNumReduceTasks(reduceTasks);

String inputCols = "";
for (int i = 0; i < COLS.length; i++){ inputCols += COLS[i] + " "; }

TableMap.initJob(INPUTTABLE, inputCols, this.getClass(), Text.class,
BytesWritable.class, c);
//Classes between:

c.setOutputFormat(TextOutputFormat.class);
Path path = new Path("users"); //inserting into a temp table
FileOutputFormat.setOutputPath(c, path);

c.setReducerClass(MyReducer.class);
return c;
}

public void map(ImmutableBytesWritable key, RowResult res,
OutputCollector output, Reporter reporter)
throws IOException {
Cell cellLast= res.get(COLS[2].getBytes());//lastupdate

long oldTime = cellLast.getTimestamp();

Cell cell_ttl= res.get(COLS[1].getBytes());//ttl
long ttl = StreamyUtil.BytesToLong(cell_ttl.getValue() );
byte[] url = null;

long currTime = time.GetTimeInMillis();

if(currTime - oldTime > ttl){
url = res.get(COLS[0].getBytes()).getValue();//url
output.collect(new Text(Base64.encode_strip(res.getRow())), new
BytesWritable(url) );
}
}



public static class MyReducer implements Reducer{
//org.apache.hadoop.mapred.Reducer{


private int timeout = 1000; //Sets the connection timeout time ms;

public void reduce(Object key, Iterator values, OutputCollector
output, Reporter rep)
throws IOException {
HttpClient client = new HttpClient();//new MultiThreadedHttpConnectionManager());
client.getHttpConnectionManager().
getParams().setConnectionTimeout(timeout);

GetMethod method = null;

int stat = 0;
String content = "";
byte[] colFam = "select".getBytes();
byte[] column = "lastupdate".getBytes();
byte[] currTime = null;

HBaseRef hbref = new HBaseRef();
JerlType sendjerl = null; //new JerlType();
ArrayList jd = new ArrayList();

InputStream is = null;

while(values.hasNext()){
BytesWritable bw = (BytesWritable)values.next();

String address = new String(bw.get());
try{
System.out.println(address);

method = new GetMethod(address);
method.setFollowRedirects(true);

} catch (Exception e){
System.err.println("Invalid Address");
e.printStackTrace();
}

if (method != null){
try {
// Execute the method.
stat = client.executeMethod(method);

if(stat == 200){
content = "";
is =
(InputStream)(method.getResponseBodyAsStream());

//Write to HBase new stamp select:lastupdate
currTime =
StreamyUtil.LongToBytes(time.GetTimeInMillis() );
jd.add(new 

Trying to write to HDFS from mapreduce.

2008-07-24 Thread Erik Holstad
Hi!
I'm writing a MapReduce job where I want the output from the mapper to go
straight to HDFS without passing through the reduce method. I have been told
that I can do:
c.setOutputFormat(TextOutputFormat.class);
and I also added:
Path path = new Path("user");
FileOutputFormat.setOutputPath(c, path);

But I still ended up with the result in the local filesystem instead.
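For reference, roughly what I think the driver needs to look like (a sketch;
IdentityMapper is just a stand-in for my real mapper, and the zero-reduce
part is my understanding of how to skip the reduce phase entirely):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MapOnlyJob {
  public static void main(String[] args) throws Exception {
    JobConf c = new JobConf(MapOnlyJob.class);
    c.setJobName("map-only");
    c.setMapperClass(IdentityMapper.class); //stand-in for my real mapper
    c.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(c, new Path("input"));
    FileOutputFormat.setOutputPath(c, new Path("user"));
    //With zero reduce tasks, the map output should be written
    //directly to the output path on HDFS (no reduce phase at all).
    c.setNumReduceTasks(0);
    JobClient.runJob(c);
  }
}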

Regards Erik