Re: JNI and calling Hadoop jar files

2009-03-23 Thread Jeff Eastman
This looks somewhat similar to my Subtle Classloader Issue from yesterday. I'll be watching this thread too. Jeff Saptarshi Guha wrote: Hello, I'm using some JNI interfaces, via a R. My classpath contains all the jar files in $HADOOP_HOME and $HADOOP_HOME/lib My class is public SeqKeyList(

Subtle Classloader Issue

2009-03-22 Thread Jeff Eastman
I'm trying to run the Dirichlet clustering example from (http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html). The command line: $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.1.job org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job ... loads our ex

Re: RecordReader design heuristic

2009-03-18 Thread Jeff Eastman
Hi Josh, It seemed like you had a conceptual wire crossed and I'm glad to help out. The neat thing about Hadoop mappers is - since they are given a replicated HDFS block to munch on - the job scheduler has <replication factor> number of node choices where it can run each mapper. This means mappers are alway

Re: RecordReader design heuristic

2009-03-17 Thread Jeff Eastman
s for your feedback. Josh Patterson TVA -Original Message- From: Jeff Eastman [mailto:j...@windwardsolutions.com] Sent: Tuesday, March 17, 2009 5:11 PM To: core-user@hadoop.apache.org Subject: Re: RecordReader design heuristic If you send a single point to the mapper, your mapper logic will be clea

Re: RecordReader design heuristic

2009-03-17 Thread Jeff Eastman
If you send a single point to the mapper, your mapper logic will be clean and simple. Otherwise you will need to loop over your block of points in the mapper. In Mahout clustering, I send the mapper individual points because the input file is point-per-line. In either case, the record reader wi

Re: Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
nters to where I can find the code? Thanks! Tanton On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman <[EMAIL PROTECTED]> wrote: I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. B

Users Group Meeting Slides

2008-05-22 Thread Jeff Eastman
I uploaded the slides from my Mahout overview to our wiki (http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with another recent talk by Isabel Drost. Both are similar in content but their differences reflect the rapid evolution of the project in the month that separates them in time

Re: Hadoop 0.17 AMI?

2008-05-22 Thread Jeff Eastman
ote: Hi Jeff, 0.17.0 was released yesterday, from what I can tell. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message ---- From: Jeff Eastman <[EMAIL PROTECTED]> To: core-user@hadoop.apache.org Sent: Wednesday, May 21, 2008 11:18:56 AM Subjec

Re: Hadoop experts wanted

2008-05-21 Thread Jeff Eastman
Hi Edward, Check out this link (http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable) before you panic over the similar postings. Jim's a little vague about what he's actually going to do with this data or when, but I found it useful. Jeff Edward J. Yoon wrote: Hey Ak

Re: Hadoop 0.17 AMI?

2008-05-21 Thread Jeff Eastman
asn't been released yet. I (or Mukund) is hoping to call a vote this afternoon or tomorrow. Nige On May 14, 2008, at 12:36 PM, Jeff Eastman wrote: I'm trying to bring up a cluster on EC2 using (http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the version to use bec

Hadoop 0.17 AMI?

2008-05-14 Thread Jeff Eastman
I'm trying to bring up a cluster on EC2 using (http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the version to use because of the DNS improvements, etc. Unfortunately, I cannot find a public AMI with this build. Is there one that I'm not finding or do I need to create one? Jeff

RE: Hadoop input path - can it have subdirectories

2008-04-01 Thread Jeff Eastman
My experience running with the Java API is that subdirectories in the input path do cause an exception, so the streaming file input processing must be different. Jeff Eastman > -Original Message- > From: Norbert Burger [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 01, 200

RE: Hadoop summit video capture?

2008-03-25 Thread Jeff Eastman
I don't know if there was a live version, but the entire summit was recorded on video so it will be available. BTW, it was an overwhelming success and the speakers are all well worth waiting for. I personally got a lot of positive feedback and interest in Mahout, so expect your inbox to explode in

RE: Performance / cluster scaling question

2008-03-21 Thread Jeff Eastman
> > > >> -Original Message- > >> From: André Martin [mailto:[EMAIL PROTECTED] > >> Sent: Friday, March 21, 2008 2:36 PM > >> To: core-user@hadoop.apache.org > >> Subject: Re: Performance / cluster scaling questio

RE: Performance / cluster scaling question

2008-03-21 Thread Jeff Eastman
; Sent: Friday, March 21, 2008 2:36 PM > To: core-user@hadoop.apache.org > Subject: Re: Performance / cluster scaling question > > 3 - the default one... > > Jeff Eastman wrote: > > What's your replication factor? > > Jeff > > > > > >> -

RE: Performance / cluster scaling question

2008-03-21 Thread Jeff Eastman
What's your replication factor? Jeff > -Original Message- > From: André Martin [mailto:[EMAIL PROTECTED] > Sent: Friday, March 21, 2008 2:25 PM > To: core-user@hadoop.apache.org > Subject: Performance / cluster scaling question > > Hi everyone, > I ran a distributed system that consists

RE: Master as DataNode

2008-03-21 Thread Jeff Eastman
> pushed it out to 5 machines, things look good. appreciate the help. > > what is it that causes this? i know i formatted the dfs more than once. > is > that what does it? or just adding nodes, or... ? > > -colin > > > On Fri, Mar 21, 2008 at 2:30 PM, Jeff Eastman <

RE: Master as DataNode

2008-03-21 Thread Jeff Eastman
tastore/hadoop/dfs/data: namenode namespaceID = > 2121666262; datanode namespaceID = 2058961420 > > > looks like i'm hitting this "Incompatible namespaceID" bug: > http://issues.apache.org/jira/browse/HADOOP-1212 > > is there a work around for this? > > -co

RE: Master as DataNode

2008-03-21 Thread Jeff Eastman
Check your logs. That should work out of the box with the configuration steps you described. Jeff > -Original Message- > From: Colin Freas [mailto:[EMAIL PROTECTED] > Sent: Friday, March 21, 2008 10:40 AM > To: core-user@hadoop.apache.org > Subject: Master as DataNode > > setting up a s

RE: why the value of attribute in map function will change ?

2008-03-16 Thread Jeff Eastman
Consider that your mapper and driver execute in different JVMs and cannot share static values. Jeff > -Original Message- > From: ma qiang [mailto:[EMAIL PROTECTED] > Sent: Saturday, March 15, 2008 10:35 PM > To: core-user@hadoop.apache.org > Subject: why the value of attribute in map func

RE: Map/Reduce Type Mismatch error

2008-03-07 Thread Jeff Eastman
The key provided by the default FileInputFormat is not Text, but an integer offset into the split (which is not very useful IMHO). Try changing your mapper back to . If you are expecting the file name to be the key, you will (I think) need to write your own InputFormat. Jeff -Original Message--
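To make the offset-as-key idea concrete, here is a minimal sketch in plain Java with no Hadoop classes. LineOffsets is a hypothetical helper, not a Hadoop API. It assumes lines end in '\n' and counts offsets in bytes from the start of the buffer it is handed; how a real input format counts them is not something this sketch settles.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only: pairs each line with the byte offset where it starts. */
class LineOffsets {

    /** Returns (starting byte offset -> line text) in input order, splitting on '\n' only. */
    static Map<Long, String> read(byte[] data) {
        Map<Long, String> records = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                // The key is where the line starts; the value is the line without its newline.
                records.put((long) start, new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // Emit a final line that has no trailing newline.
        if (start < data.length) {
            records.put((long) start, new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

The keys say nothing about which file a line came from, which is why a mapper that wants the file name as its key needs something else to supply it.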

RE: Equivalent of cmdline head or tail?

2008-03-06 Thread Jeff Eastman
I think the accepted pattern for this is to accumulate your top N and bottom N values while you reduce and then output them in the close() call. The files from your config can be obtained during the configure() call. Jeff -Original Message- From: Jimmy Wan [mailto:[EMAIL PROTECTED] Sent:
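The accumulate-then-emit pattern described above can be sketched in plain Java. TopBottomN is a hypothetical helper, not a Hadoop class, and it assumes the values are longs. The idea is that reduce() would call add() once per value and close() would write out the two lists; the Hadoop wiring itself is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: remembers only the N largest and N smallest values seen so far. */
class TopBottomN {
    private final int n;
    // Min-heap of the N largest values: its head is the smallest of them, so it is evicted first.
    private final PriorityQueue<Long> top = new PriorityQueue<>();
    // Max-heap of the N smallest values: its head is the largest of them.
    private final PriorityQueue<Long> bottom = new PriorityQueue<>(Collections.reverseOrder());

    TopBottomN(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Call once per value while reducing. */
    void add(long value) {
        top.add(value);
        if (top.size() > n) {
            top.poll();
        }
        bottom.add(value);
        if (bottom.size() > n) {
            bottom.poll();
        }
    }

    /** The N largest values in descending order; call once all input has been seen. */
    List<Long> topValues() {
        List<Long> out = new ArrayList<>(top);
        out.sort(Collections.reverseOrder());
        return out;
    }

    /** The N smallest values in ascending order. */
    List<Long> bottomValues() {
        List<Long> out = new ArrayList<>(bottom);
        Collections.sort(out);
        return out;
    }
}
```

Memory stays bounded at 2N values however many records pass through, which is what makes it reasonable to hold the state until close().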

RE: Decompression Blues

2008-02-27 Thread Jeff Eastman
: Arun C Murthy [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 26, 2008 3:47 PM To: core-user@hadoop.apache.org Subject: Re: Decompression Blues Jeff, On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote: > I'm processing a number of .gz compressed Apache and other logs using > Hadoop

RE: Decompression Blues

2008-02-26 Thread Jeff Eastman
ginal Message- From: Arun C Murthy [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 26, 2008 3:47 PM To: core-user@hadoop.apache.org Subject: Re: Decompression Blues Jeff, On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote: > I'm processing a number of .gz compressed Apache and oth

Decompression Blues

2008-02-26 Thread Jeff Eastman
I'm processing a number of .gz compressed Apache and other logs using Hadoop 0.15.2 and encountering fatal decompression errors such as: 08/02/26 12:09:12 INFO mapred.JobClient: Task Id : task_200802171116_0001_m_05_0, Status : FAILED java.lang.InternalError at org.apache.hadoop.i

RE: newbie question... please help.

2008-02-23 Thread Jeff Eastman
If your main question is "can I host my mssql database on the Hadoop DFS?", then the answer is no. The DFS is designed for large files that are write once, read multiple and a database engine would want to update the files. If, OTOH, your question is "can I move (some of) my mssql database into H

RE: Best Practice?

2008-02-11 Thread Jeff Eastman
onday, February 11, 2008 12:40 PM To: core-user@hadoop.apache.org Subject: Re: Best Practice? Jeff, Doesn't the reducer see all of the data points for each cluster (canopy) in a single list? If so, why the need to output during close? If not, why not? On 2/11/08 12:24 PM, "Jeff E

RE: Best Practice?

2008-02-11 Thread Jeff Eastman
rried about this, but now I won't. Thanks, Jeff -Original Message- From: Owen O'Malley [mailto:[EMAIL PROTECTED] Sent: Monday, February 11, 2008 10:40 AM To: core-user@hadoop.apache.org Subject: Re: Best Practice? On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote: > I'

RE: Best Practice?

2008-02-10 Thread Jeff Eastman
sums in the reducer. On 2/9/08 4:21 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote: > Thanks Aaron, I missed that one. Now I have my configuration information > in my mapper. In the mapper, I'm computing cluster centroids by reading > all the input points and assign

RE: Best Practice?

2008-02-09 Thread Jeff Eastman
Well, I tried saving the OutputCollectors in an instance variable and writing to them during close and it seems to work. Jeff -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Saturday, February 09, 2008 4:21 PM To: core-user@hadoop.apache.org Subject: RE: Best

RE: Best Practice?

2008-02-09 Thread Jeff Eastman
Thanks Aaron, I missed that one. Now I have my configuration information in my mapper. In the mapper, I'm computing cluster centroids by reading all the input points and assigning them to clusters. I don't actually store the points in the mapper, just the evolving centroids. I'm trying to wait un
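Keeping only the evolving centroid, rather than the points themselves, might look roughly like the following. RunningCentroid is a hypothetical class written for illustration; it is not taken from Mahout or Hadoop, and it assumes points are fixed-length double arrays and that the centroid is the arithmetic mean.

```java
/** Hypothetical sketch: tracks a cluster centroid from a running sum, without storing points. */
class RunningCentroid {
    private final double[] sum;
    private long count;

    RunningCentroid(int dimensions) {
        this.sum = new double[dimensions];
    }

    /** Folds one point into the running sum; the point itself is not retained. */
    void add(double[] point) {
        if (point.length != sum.length) {
            throw new IllegalArgumentException("expected " + sum.length + " dimensions");
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] += point[i];
        }
        count++;
    }

    /** Number of points folded in so far. */
    long count() {
        return count;
    }

    /** Mean of all points added so far. */
    double[] centroid() {
        if (count == 0) {
            throw new IllegalStateException("no points observed");
        }
        double[] mean = new double[sum.length];
        for (int i = 0; i < sum.length; i++) {
            mean[i] = sum[i] / count;
        }
        return mean;
    }
}
```

Because the state per cluster is just a sum vector and a count, it can be held for the life of the task and emitted once at the end.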

Best Practice?

2008-02-09 Thread Jeff Eastman
What's the best way to get additional configuration arguments to my mappers and reducers? Jeff

RE: Starting up a larger cluster

2008-02-08 Thread Jeff Eastman
I noticed that phenomenon right off the bat. Is that a designed "feature" or just an unhappy consequence of how blocks are allocated? Ted compensates for this by aggressively rebalancing his cluster often by adjusting the replication up and down, but I wonder if an improvement in the allocation stra

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Oops, should be TaskTracker. -Original Message- From: Jeff Eastman [mailto:[EMAIL PROTECTED] Sent: Thursday, February 07, 2008 12:24 PM To: core-user@hadoop.apache.org Subject: RE: Starting up a larger cluster Hi Ben, I've been down this same path recently and I think I understand

RE: Starting up a larger cluster

2008-02-07 Thread Jeff Eastman
Hi Ben, I've been down this same path recently and I think I understand your issues: 1) Yes, you need the hadoop folder to be in the same location on each node. Only the master node actually uses the slaves file, to start up DataNode and JobTracker daemons on those nodes. 2) If you did not specif

RE: Platform reliability with Hadoop

2008-01-22 Thread Jeff Eastman
adoop/mapred/system - mapred.temp.dir = /hadoop/mapred/temp Each user gets their own /users/username directory in the DFS and jobs submitted by each user use their own user directories. Now to find a bigger problem to solve... Jeff -Original Message- From: Jeff Eastman [mailto:[EMAIL PROT