Hi all,
I'm running with bundled (OSGi) versions of Hadoop 1.0.4 and HBase 0.94.12
that I built.
Most issues I encountered are related to class loaders.
One of the patterns I noticed in both projects is:
ClassLoader cl = Thread.currentThread().getContextClassLoader();
if(cl == null) {
cl
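For reference, the full pattern presumably continues along these lines (the
preview is cut off here; the concrete fallback class varies by call site, and
Configuration is only an illustration):

ClassLoader cl = Thread.currentThread().getContextClassLoader();
if (cl == null) {
    // Fall back to the loader that loaded the framework class itself; under
    // OSGi this is the bundle class loader rather than the thread context one.
    cl = Configuration.class.getClassLoader();
}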
a “mapper-side
pre-reducer” and operates on blocks of data that have already been sorted
by key, so mucking with the keys doesn’t **seem** like a good idea.
john
*From:* Amit Sela [mailto:am...@infolinks.com]
*Sent:* Sunday, January 12, 2014 9:26 AM
*To:* user@hadoop.apache.org
*Subject
Hi all,
I'm running a mapreduce job that has custom counters incremented in the
combiner's reduce function.
Looking at the mapreduce web UI I see that, like all counters, it has
three columns: Map, Reduce and Total.
From what I know, the combiner is executed on the map output, hence runs in
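For reference, a minimal sketch of a combiner that increments a custom counter;
the counter group/name and the key/value types here are illustrative
assumptions, not taken from the original job:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountingCombiner
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    // The custom counter incremented inside the combiner's reduce function.
    context.getCounter("MyCounters", "COMBINE_CALLS").increment(1);
    context.write(key, new IntWritable(sum));
  }
}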
Hi all,
I was wondering if it is possible to manipulate the key during combine:
Say I have a mapreduce job where the key has many qualifiers.
I would like to split the key into two (or more) keys if it has more
than, say, 100 qualifiers.
In the combiner class I would do something like:
int count
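The original code is cut off after "int count". Purely as an illustration of
the idea, a sketch might look like the following; note the reply above warns
against changing keys in a combiner, since the data is already sorted and
partitioned by key:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sketch only: once 100 values have been written under a key,
// switch to a suffixed key. The key/value types and suffix scheme are assumptions.
public class SplittingCombiner extends Reducer<Text, Text, Text, Text> {

  private static final int MAX_QUALIFIERS = 100;

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    int count = 0;
    int split = 0;
    for (Text value : values) {
      if (count == MAX_QUALIFIERS) {
        count = 0;
        split++;
      }
      // Hypothetical naming scheme: original key plus a running split suffix.
      context.write(new Text(key.toString() + "#" + split), value);
      count++;
    }
  }
}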
a look at
http://hbase.apache.org/book.html#snappy.compression
Cheers
On Wed, Jan 1, 2014 at 8:05 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
I'm running on Hadoop 1.0.4 and I'd like to use Snappy for map output
compression.
I'm adding the configurations:
configuration.setBoolean
Hi all,
I'm running on Hadoop 1.0.4 and I'd like to use Snappy for map output
compression.
I'm adding the configurations:
configuration.setBoolean("mapred.compress.map.output", true);
configuration.set("mapred.map.output.compression.codec",
"org.apache.hadoop.io.compress.SnappyCodec");
And I've added
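For completeness, the same setup can be expressed through the JobConf helpers,
assuming your 1.0.4 build ships SnappyCodec and the native libraries that the
hbase.apache.org link above describes are installed on every task node:

import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.JobConf;

// Equivalent to the property-based setup above; SnappyCodec still requires
// libsnappy and the Hadoop native library on every TaskTracker node.
JobConf jobConf = new JobConf();
jobConf.setCompressMapOutput(true);
jobConf.setMapOutputCompressorClass(SnappyCodec.class);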
I would like to add new machines to my existing cluster but they won't be
similar to the current nodes. There are two scenarios I'm thinking of:
1. What are the implications (besides initial load balancing) of adding a
new node to the cluster, if this node runs on a machine similar to all
other nodes
Hi all,
I'm using hadoop 1.0.4 and gzip to keep the logs processed by hadoop
(logs are gzipped into block-sized files).
I read that bzip2 is splittable. Is it so in hadoop 1.0.4 ? Does that mean
that any input file bigger than the block size will be split across maps ?
What are the tradeoffs
Hi all,
I was wondering if there is a way to let the fair scheduler ignore the user
and submit a job to a specific pool (see the sketch after the list below).
I would like to have 3/4 pools:
1. Very short (~1 min) routine jobs.
2. Normal processing time (1 hr) routine jobs.
3. Long (days) experimental jobs.
4. ? ad hoc immediate jobs ?
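One way to decouple the pool from the submitting user is the fair scheduler's
pool-name property; a minimal sketch with illustrative pool names, assuming the
JobTracker's mapred-site.xml sets mapred.fairscheduler.poolnameproperty to
pool.name (some 1.x builds also honor a mapred.fairscheduler.pool job property
directly):

import org.apache.hadoop.conf.Configuration;

// With mapred.fairscheduler.poolnameproperty = pool.name on the JobTracker,
// the scheduler reads the pool from this job property instead of user.name.
Configuration conf = new Configuration();
conf.set("pool.name", "short-routine");   // e.g. "normal-routine", "experimental"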
Hi everyone,
I'm running Hadoop 1.0.4 on a modest cluster (~20 machines) and I would
like to divide my cluster resources by the jobs' processing time.
The jobs running on the cluster can be divided as follows:
1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3.
Hi all,
I'm running Hadoop 1.0.4 on a modest cluster (~20 machines).
The jobs running on the cluster can be divided (resource wise) as follows:
Sorry, Gmail tab error, please disregard and I will re-send, Thanks.
On Sat, Jul 6, 2013 at 5:02 PM, Amit Sela am...@infolinks.com wrote:
Hi all,
I'm running Hadoop 1.0.4 on a modest cluster (~20 machines).
The jobs running on the cluster can be divided (resource wise) as follows:
Hi all,
I'm running Hadoop 1.0.4 on a modest cluster (~20 machines).
The jobs running on the cluster can be divided (resource wise) as follows:
1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3. Very long jobs: days of processing. (still not active and
Hi all,
I'm trying to run ant test on a clean Hadoop branch-1 checkout.
Running ant works fine, but when I run ant test I get a lot of failures:
Test org.apache.hadoop.cli.TestCLI FAILED
Test org.apache.hadoop.fs.TestFileUtil FAILED
Test org.apache.hadoop.fs.TestHarFileSystem FAILED
Test
://issues.apache.org/jira/browse/HADOOP-6103, although the fix
never made it into branch-1. Can you create a branch-1 patch for this
please?
Thanks,
Tom
On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
I was wondering if there is a good reason why public
Hi all,
I was wondering if there is a good reason why the public
Configuration(Configuration other) constructor in Hadoop 1.0.4 doesn't
clone the classloader of other into the new Configuration ?
Is this a bug ?
I'm asking because I'm trying to run a Hadoop client in an OSGi environment
and I need to
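As noted in the reply above, the fix never made it into branch-1; until it
does, a workaround sketch is to copy the class loader by hand after the copy
constructor (other is the source Configuration named in the question):

import org.apache.hadoop.conf.Configuration;

// The 1.0.4 copy constructor does not carry the class loader over,
// so copy it explicitly before handing the configuration to the client.
Configuration copy = new Configuration(other);
copy.setClassLoader(other.getClassLoader());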
, Amit Sela am...@infolinks.com wrote:
Hi all,
I'm trying to submit a mapreduce job remotely using job.submit()
I get the following:
[WARN ] org.apache.hadoop.mapred.JobClient » Use
GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
[INFO
Hi all,
I'm trying to set up a Hadoop client for job submissions (and more) as an
OSGi bundle.
I've overcome a lot of hardships, but I'm kind of stuck now.
When I create a new Job for submission I call setClassLoader() on the Job's
Configuration so that it uses the bundle's ClassLoader (Felix), but
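Roughly the setup being described, as a sketch; JobSubmitter is a hypothetical
class living inside the OSGi bundle, used here only to reach the bundle's
class loader, and the job name is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Point the configuration at the bundle's class loader (Felix) instead of the
// thread context class loader the framework would otherwise fall back to.
Configuration conf = new Configuration();
conf.setClassLoader(JobSubmitter.class.getClassLoader()); // JobSubmitter is hypothetical
Job job = new Job(conf, "remote-job");                    // illustrative job name
job.submit();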
Nothing on JT log, but as I mentioned I see this in the client log:
[WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
[INFO ] org.apache.hadoop.mapred.JobClient » Cleaning up the staging
Hi all,
I'm trying to submit a mapreduce job remotely using job.submit()
I get the following:
[WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
[INFO ] org.apache.hadoop.mapred.JobClient
, Amit Sela am...@infolinks.com wrote:
The client prints the two lines I posted and the cluster shows nothing.
Not
even incrementing the number of submitted jobs.
On Apr 15, 2013 4:10 PM, Harsh J ha...@cloudera.com wrote:
When you say nothing happens; where exactly do you mean? The client
Reading my own message, I realize it may not be clear, so just to
clarify - the previously mentioned JT ID is indeed the correct ID.
Thanks.
On Apr 15, 2013 4:35 PM, Amit Sela am...@infolinks.com wrote:
This is the JT ID and there is no problem running jobs from command line,
just remote
://issues.apache.org/jira/browse/MAPREDUCE-4857
Which is fixed in 1.0.4
*From:* Amit Sela [mailto:am...@infolinks.com]
*Sent:* Tuesday, March 12, 2013 5:08 AM
*To:* user@hadoop.apache.org
*Subject:* Re: Child error
Hi Jean-Marc
Hi all,
I have a weird failure occurring every now and then during a MapReduce job.
This is the error:
*java.lang.Throwable: Child Error*
* at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)*
*Caused by: java.io.IOException: Task process exit with nonzero status of
255.*
* at
from 1.0.3 that much no ?)
Thanks!
On Tue, Mar 12, 2013 at 1:40 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi Amit,
Which Hadoop version are you using?
I have been told it's because of
https://issues.apache.org/jira/browse/MAPREDUCE-2374
JM
2013/3/12 Amit Sela am
Hi all,
I'm implementing an API over the JobTracker client - JobClient.
My plan is to have a pool of JobClient objects that will expose the ability
to submit jobs, poll status etc.
My question is: Should I set a maximum pool size ? How many connections
are too many connections for the JobTracker
Hi all,
Has anyone ever used some kind of a generic output key for a mapreduce
job ?
I have a job running multiple tasks and I want them to be able to use both
Text and IntWritable as output key classes.
Any suggestions ?
Thanks,
Amit.
{
integer.writeFields(out);
}
}
[... readFields method that works in a similar way]
}
-Sandy
On Sun, Feb 10, 2013 at 4:00 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
Has anyone ever used some kind of a generic output key for a mapreduce
job ?
I have a job running multiple
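Sandy's full wrapper class isn't visible in the preview; a minimal sketch of
the same idea using Hadoop's built-in GenericWritable, with Text and
IntWritable as the two key types from the question:

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// A key wrapper that can hold either a Text or an IntWritable; GenericWritable
// serializes a type index followed by the wrapped value's own fields.
public class GenericKey extends GenericWritable {

  @SuppressWarnings("unchecked")
  private static final Class<? extends Writable>[] TYPES =
      (Class<? extends Writable>[]) new Class[] { Text.class, IntWritable.class };

  @Override
  protected Class<? extends Writable>[] getTypes() {
    return TYPES;
  }
}

At write time the mapper would call set(new Text(...)) or set(new
IntWritable(...)) on the wrapper before emitting it. Note that GenericWritable
is not WritableComparable, so if the wrapper is used as a map output key a
sort comparator still has to be provided.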
Hi all,
I was wondering if anyone here has tried using the GPU of a Hadoop node to
enhance MapReduce processing ?
I read about it, but it always comes down to heavy computations such as
matrix multiplications and Monte Carlo algorithms.
Did anyone try it with MapReduce jobs that analyze logs or any
Hi Jon,
I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4
and I haven't noticed any performance issues. By multiple assignment
feature do you mean speculative execution
(mapred.map.tasks.speculative.execution
and mapred.reduce.tasks.speculative.execution) ?
On Mon, Nov
resolved for 1.2.0.
On Tue, Nov 27, 2012 at 3:20 PM, Amit Sela am...@infolinks.com wrote:
Hi Jon,
I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4
and I haven't noticed any performance issues. By multiple assignment
feature do you mean speculative execution
Hi everyone,
Does anyone know if the new Corona tools (which Facebook just released as
open source) are compatible with hadoop 1.0.x, or just 0.20.x ?
Thanks.
Hi all,
I want to upgrade a 1TB cluster from hadoop 0.20.3 to hadoop 1.0.3.
I am interested to know how long the HDFS upgrade takes, and in general how
long it takes from deploying the new version until the cluster is back to
running heavy MapReduce jobs ?
I'd also appreciate it if someone could