This looks somewhat similar to my Subtle Classloader Issue from
yesterday. I'll be watching this thread too.
Jeff
Saptarshi Guha wrote:
Hello,
I'm using some JNI interfaces via R. My classpath contains all the
jar files in $HADOOP_HOME and $HADOOP_HOME/lib
My class is
public SeqKeyList(
I'm trying to run the Dirichlet clustering example from
(http://cwiki.apache.org/MAHOUT/syntheticcontroldata.html). The command
line:
$HADOOP_HOME/bin/hadoop jar
$MAHOUT_HOME/examples/target/mahout-examples-0.1.job
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
... loads our ex
Hi Josh,
It seemed like you had a conceptual wire crossed and I'm glad to help
out. The neat thing about Hadoop mappers is - since they are given a
replicated HDFS block to munch on - the job scheduler has <replication factor> number of node choices where it can run each mapper. This means
mappers are alway
s for your
feedback.
Josh Patterson
TVA
-Original Message-
From: Jeff Eastman [mailto:j...@windwardsolutions.com]
Sent: Tuesday, March 17, 2009 5:11 PM
To: core-user@hadoop.apache.org
Subject: Re: RecordReader design heuristic
If you send a single point to the mapper, your mapper logic will be
clean and simple. Otherwise you will need to loop over your block of
points in the mapper. In Mahout clustering, I send the mapper individual
points because the input file is point-per-line. In either case, the
record reader wi
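The two mapper styles described above can be sketched in plain Java (no Hadoop classes; all names here are illustrative, not Mahout's actual code):

```java
import java.util.*;

// Sketch of the two styles: single point per map call vs. a block per call.
public class MapperStyles {
    // Style 1: the record reader hands the mapper one point per call,
    // so the map body stays clean and simple.
    public static double normSquared(double[] point) {
        double sum = 0;
        for (double x : point) sum += x * x;
        return sum;
    }

    // Style 2: the record reader hands over a whole block of points,
    // so the mapper must loop over them itself.
    public static List<Double> mapBlock(List<double[]> block) {
        List<Double> out = new ArrayList<>();
        for (double[] p : block) out.add(normSquared(p));
        return out;
    }
}
```

Either way the same work gets done; the block style just moves the loop into the mapper body.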
nters to where I can find the code?
Thanks!
Tanton
On Thu, May 22, 2008 at 11:36 AM, Jeff Eastman
<[EMAIL PROTECTED]> wrote:
I uploaded the slides from my Mahout overview to our wiki
(http://cwiki.apache.org/confluence/display/MAHOUT/FAQ) along with
another recent talk by Isabel Drost. Both are similar in content but
their differences reflect the rapid evolution of the project in the
month that separates them in time
Hi Jeff,
0.17.0 was released yesterday, from what I can tell.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message ----
From: Jeff Eastman <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, May 21, 2008 11:18:56 AM
Subjec
Hi Edward,
Check out this link
(http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable)
before you panic over the similar postings. Jim's a little vague about
what he's actually going to do with this data or when, but I found it
useful.
Jeff
Edward J. Yoon wrote:
Hey Ak
asn't been released yet. I (or Mukund) is hoping to call
a vote this afternoon or tomorrow.
Nige
On May 14, 2008, at 12:36 PM, Jeff Eastman wrote:
I'm trying to bring up a cluster on EC2 using
(http://wiki.apache.org/hadoop/AmazonEC2) and it seems that 0.17 is the
version to use because of the DNS improvements, etc. Unfortunately, I
cannot find a public AMI with this build. Is there one that I'm not
finding or do I need to create one?
Jeff
My experience running with the Java API is that subdirectories in the input
path do cause an exception, so the streaming file input processing must be
different.
Jeff Eastman
> -Original Message-
> From: Norbert Burger [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, April 01, 200
I don't know if there was a live version, but the entire summit was recorded
on video so it will be available. BTW, it was an overwhelming success and
the speakers are all well worth waiting for. I personally got a lot of
positive feedback and interest in Mahout, so expect your inbox to explode in
; Sent: Friday, March 21, 2008 2:36 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Performance / cluster scaling question
>
> 3 - the default one...
>
> Jeff Eastman wrote:
> > What's your replication factor?
> > Jeff
What's your replication factor?
Jeff
> -Original Message-
> From: André Martin [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 2:25 PM
> To: core-user@hadoop.apache.org
> Subject: Performance / cluster scaling question
>
> Hi everyone,
> I ran a distributed system that consists
> pushed it out to 5 machines, things look good. appreciate the help.
>
> what is it that causes this? i know i formatted the dfs more than once.
> is
> that what does it? or just adding nodes, or... ?
>
> -colin
>
>
> On Fri, Mar 21, 2008 at 2:30 PM, Jeff Eastman <
tastore/hadoop/dfs/data: namenode namespaceID =
> 2121666262; datanode namespaceID = 2058961420
>
>
> looks like i'm hitting this "Incompatible namespaceID" bug:
> http://issues.apache.org/jira/browse/HADOOP-1212
>
> is there a work around for this?
>
> -colin
Check your logs. That should work out of the box with the configuration
steps you described.
Jeff
> -Original Message-
> From: Colin Freas [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 21, 2008 10:40 AM
> To: core-user@hadoop.apache.org
> Subject: Master as DataNode
>
> setting up a s
Consider that your mapper and driver execute in different JVMs and cannot
share static values.
Jeff
> -Original Message-
> From: ma qiang [mailto:[EMAIL PROTECTED]
> Sent: Saturday, March 15, 2008 10:35 PM
> To: core-user@hadoop.apache.org
> Subject: why the value of attribute in map func
The key provided by the default FileInputFormat is not Text, but an
integer offset into the split (which is not very useful IMHO). Try
changing your mapper back to . If you are
expecting the file name to be the key, you will (I think) need to write
your own InputFormat.
Jeff
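The byte-offset keying described above can be simulated in plain Java (no Hadoop classes; assumes '\n' line endings and one byte per character):

```java
import java.util.*;

// Simulates how the default input format keys each line by its byte
// offset into the split, rather than by file name or line text.
public class OffsetKeys {
    public static LinkedHashMap<Long, String> keyByOffset(String data) {
        LinkedHashMap<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : data.split("\n")) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }
}
```

For input "ab\ncd\nef" the keys come out as 0, 3, and 6, which is why a mapper expecting Text keys (or file names) breaks against the default format.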
-Original Message--
I think the accepted pattern for this is to accumulate your top N and
bottom N values while you reduce and then output them in the close()
call. The files from your config can be obtained during the configure()
call.
Jeff
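A plain-Java sketch of that accumulate-then-emit-in-close() pattern (illustrative names, no Hadoop classes):

```java
import java.util.*;

// Illustrative sketch (not the poster's code): accumulate top-N and
// bottom-N across many reduce() calls, emitting only at close(), since
// neither set is known until every key has been seen.
public class TopNAccumulator {
    private final int n;
    private final PriorityQueue<Long> top =
        new PriorityQueue<>();                           // min-heap: evicts smallest
    private final PriorityQueue<Long> bottom =
        new PriorityQueue<>(Collections.reverseOrder()); // max-heap: evicts largest

    public TopNAccumulator(int n) { this.n = n; }

    // Called once per key with that key's aggregated value.
    public void reduce(long value) {
        top.add(value);
        if (top.size() > n) top.poll();
        bottom.add(value);
        if (bottom.size() > n) bottom.poll();
    }

    // What close() would finally write to the OutputCollector.
    public List<Long> topN()    { return sorted(top); }
    public List<Long> bottomN() { return sorted(bottom); }

    private static List<Long> sorted(Collection<Long> c) {
        List<Long> out = new ArrayList<>(c);
        Collections.sort(out);
        return out;
    }
}
```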
-Original Message-
From: Jimmy Wan [mailto:[EMAIL PROTECTED]
Sent:
From: Arun C Murthy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 26, 2008 3:47 PM
To: core-user@hadoop.apache.org
Subject: Re: Decompression Blues
Jeff,
On Feb 26, 2008, at 12:58 PM, Jeff Eastman wrote:
> I'm processing a number of .gz compressed Apache and other logs using
> Hadoop
I'm processing a number of .gz compressed Apache and other logs using
Hadoop 0.15.2 and encountering fatal decompression errors such as:
08/02/26 12:09:12 INFO mapred.JobClient: Task Id :
task_200802171116_0001_m_05_0, Status : FAILED
java.lang.InternalError
at
org.apache.hadoop.i
If your main question is "can I host my mssql database on the Hadoop
DFS?", then the answer is no. The DFS is designed for large files that
are write once, read multiple and a database engine would want to update
the files.
If, OTOH, your question is "can I move (some of) my mssql database into
H
Sent: Monday, February 11, 2008 12:40 PM
To: core-user@hadoop.apache.org
Subject: Re: Best Practice?
Jeff,
Doesn't the reducer see all of the data points for each cluster (canopy)
in
a single list?
If so, why the need to output during close?
If not, why not?
On 2/11/08 12:24 PM, "Jeff E
rried about this, but now I won't.
Thanks,
Jeff
-Original Message-
From: Owen O'Malley [mailto:[EMAIL PROTECTED]
Sent: Monday, February 11, 2008 10:40 AM
To: core-user@hadoop.apache.org
Subject: Re: Best Practice?
On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:
> I'
sums in the reducer.
On 2/9/08 4:21 PM, "Jeff Eastman" <[EMAIL PROTECTED]> wrote:
Well, I tried saving the OutputCollectors in an instance variable and
writing to them during close and it seems to work.
Jeff
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Saturday, February 09, 2008 4:21 PM
To: core-user@hadoop.apache.org
Subject: RE: Best
Thanks Aaron, I missed that one. Now I have my configuration information
in my mapper. In the mapper, I'm computing cluster centroids by reading
all the input points and assigning them to clusters. I don't actually
store the points in the mapper, just the evolving centroids.
I'm trying to wait un
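That evolving-centroid idea, keeping only running sums rather than the points themselves, can be sketched like this (illustrative, not Mahout's actual code):

```java
import java.util.*;

// Keeps a running sum vector and a count per cluster; the centroid is
// computed on demand, so no input points need to be stored.
public class RunningCentroid {
    private final double[] sum;
    private long count = 0;

    public RunningCentroid(int dims) { sum = new double[dims]; }

    public void add(double[] point) {
        for (int i = 0; i < sum.length; i++) sum[i] += point[i];
        count++;
    }

    public double[] centroid() {
        double[] c = new double[sum.length];
        for (int i = 0; i < sum.length; i++) c[i] = sum[i] / count;
        return c;
    }
}
```

Memory stays O(dimensions) per cluster no matter how many points the mapper reads.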
What's the best way to get additional configuration arguments to my
mappers and reducers?
Jeff
I noticed that phenomenon right off the bat. Is that a designed "feature"
or just an unhappy consequence of how blocks are allocated? Ted
compensates for this by aggressively rebalancing his cluster often by
adjusting the replication up and down, but I wonder if an improvement in
the allocation stra
Oops, should be TaskTracker.
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 07, 2008 12:24 PM
To: core-user@hadoop.apache.org
Subject: RE: Starting up a larger cluster
Hi Ben,
I've been down this same path recently and I think I understand your
issues:
1) Yes, you need the hadoop folder to be in the same location on each
node. Only the master node actually uses the slaves file, to start up
DataNode and JobTracker daemons on those nodes.
2) If you did not specif
- mapred.system.dir = /hadoop/mapred/system
- mapred.temp.dir = /hadoop/mapred/temp
Each user gets their own /users/username directory in the DFS and jobs
submitted by each user use their own user directories. Now to find a
bigger problem to solve...
Jeff
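For Hadoop of that era, those directory settings would live in a hadoop-site.xml fragment along the lines of the following (paths are examples from the message above, not defaults):

```xml
<!-- illustrative hadoop-site.xml fragment; paths match the settings above -->
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/hadoop/mapred/temp</value>
</property>
```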
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROT