Class loading in Hadoop and HBase

2014-03-19 Thread Amit Sela
Hi all, I'm running bundled (OSGi) versions of Hadoop 1.0.4 and HBase 0.94.12 that I built. Most issues I've encountered are related to class loaders. One of the patterns I noticed in both projects is: ClassLoader cl = Thread.currentThread().getContextClassLoader(); if (cl == null) { cl
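
For reference, the full fallback idiom the message cuts off mid-line looks roughly like the sketch below (the wrapper class, method name, and the class being looked up are illustrative, not Hadoop source):

```java
import org.apache.hadoop.conf.Configuration;

public class ClassLoaderPattern {
    // Resolve a class the way Hadoop/HBase internals typically do:
    // prefer the thread context classloader, and fall back to the
    // classloader that loaded a known framework class.
    static Class<?> load(String name) throws ClassNotFoundException {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = Configuration.class.getClassLoader(); // fallback
        }
        return Class.forName(name, true, cl);
    }
}
```

This pattern is exactly what makes OSGi deployments tricky: the thread context classloader set by the container is often not the bundle classloader that can actually see the application's classes.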

Re: manipulating key in combine phase

2014-01-13 Thread Amit Sela
a “mapper-side pre-reducer” and operates on blocks of data that have already been sorted by key, so mucking with the keys doesn’t **seem** like a good idea. john *From:* Amit Sela [mailto:am...@infolinks.com] *Sent:* Sunday, January 12, 2014 9:26 AM *To:* user@hadoop.apache.org *Subject

Custom counters in combiner

2014-01-13 Thread Amit Sela
Hi all, I'm running a mapreduce job that has custom counters incremented in the combiner's reduce function. Looking at the mapreduce web UI I see that, like all counters, it has three columns: Map, Reduce and Total. From what I know, the combiner is executed on the map output, hence runs in
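
A minimal sketch of what incrementing a custom counter from a combiner's reduce() looks like (the counter enum, class name, and key/value types here are hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SummingCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Hypothetical counter group/name for illustration.
    public enum CombinerCounters { RECORDS_COMBINED }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
            // Increment the custom counter once per combined record.
            context.getCounter(CombinerCounters.RECORDS_COMBINED).increment(1);
        }
        context.write(key, new IntWritable(sum));
    }
}
```

Since the combiner in Hadoop 1.x normally runs inside map tasks (during spill), such counters usually land in the Map column; combiners can also run during the reduce-side merge, in which case increments are attributed to the Reduce column.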

manipulating key in combine phase

2014-01-12 Thread Amit Sela
Hi all, I was wondering if it is possible to manipulate the key during the combine phase: Say I have a mapreduce job where the key has many qualifiers. I would like to split the key into two (or more) keys if it has more than, say, 100 qualifiers. In the combiner class I would do something like: int count
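
To make the idea concrete, a sketch of what such key splitting could look like inside a combiner (types, the threshold, and the key-suffix scheme are all illustrative). Note the caveat from the reply earlier in this list: the combiner operates on blocks of map output already sorted by key and may run zero or more times, so emitting rewritten keys can break the sorted-spill invariant and is generally discouraged:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative only: derives a new key every MAX_QUALIFIERS values.
// No actual combining happens here; it just shows the mechanics.
public class SplittingCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    private static final int MAX_QUALIFIERS = 100; // threshold from the question

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        int part = 0;
        for (IntWritable v : values) {
            if (count > 0 && count % MAX_QUALIFIERS == 0) {
                part++; // start a new derived key
            }
            context.write(new Text(key.toString() + "#" + part), v);
            count++;
        }
    }
}
```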

Re: Setting up Snappy compression in Hadoop

2014-01-02 Thread Amit Sela
a look at http://hbase.apache.org/book.html#snappy.compression Cheers On Wed, Jan 1, 2014 at 8:05 AM, Amit Sela am...@infolinks.com wrote: Hi all, I'm running on Hadoop 1.0.4 and I'd like to use Snappy for map output compression. I'm adding the configurations: configuration.setBoolean

Setting up Snappy compression in Hadoop

2014-01-01 Thread Amit Sela
Hi all, I'm running on Hadoop 1.0.4 and I'd like to use Snappy for map output compression. I'm adding the configurations: configuration.setBoolean("mapred.compress.map.output", true); configuration.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec"); And I've added
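
Assembled into a self-contained form, the configuration described in the message looks like this (Hadoop 1.x property names; the wrapper class is illustrative):

```java
import org.apache.hadoop.conf.Configuration;

public class SnappyMapOutput {
    public static Configuration configure() {
        Configuration conf = new Configuration();
        // Hadoop 1.x property names for compressing intermediate map output.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                 "org.apache.hadoop.io.compress.SnappyCodec");
        return conf;
    }
}
```

Note that SnappyCodec also requires the Hadoop native libraries, built with Snappy support, on every node; without them the job typically fails complaining that the native snappy library is not available.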

Add machine with bigger storage to cluster

2013-09-30 Thread Amit Sela
I would like to add new machines to my existing cluster but they won't be similar to the current nodes. I have two scenarios I'm thinking of: 1. What are the implications (besides initial load balancing) of adding a new node to the cluster, if this node runs on a machine similar to all other nodes

Bzip2 vs Gzip

2013-09-17 Thread Amit Sela
Hi all, I'm using Hadoop 1.0.4 and using gzip to keep the logs processed by Hadoop (logs are gzipped into block-size files). I read that bzip2 is splittable. Is it so in Hadoop 1.0.4? Does that mean that any input file bigger than block size will be split between maps? What are the tradeoffs
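
Whether 1.0.4 actually splits bzip2 inputs is exactly the question asked here, so no claim on that; the general tradeoff is that bzip2 compresses tighter and is splittable where supported, but is considerably more CPU-intensive than gzip for both compression and decompression. Switching job output to bzip2 would look roughly like this sketch (the wrapper class is illustrative):

```java
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Bzip2Output {
    // Compress job output with bzip2 instead of gzip; on the input side
    // the codec is chosen automatically from the .bz2 file extension.
    public static void configure(Job job) {
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
    }
}
```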

Fair Scheduler pools regardless of users

2013-07-08 Thread Amit Sela
Hi all, I was wondering if there is a way to have the fair scheduler ignore the user and submit a job to a specific pool. I would like to have 3-4 pools: 1. Very short (~1 min) routine jobs. 2. Normal processing time (~1 hr) routine jobs. 3. Long (days) experimental jobs. 4. Possibly ad hoc immediate jobs.
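
One way to do this with the Hadoop 1.x Fair Scheduler: point mapred.fairscheduler.poolnameproperty (which defaults to user.name) at a custom job property in the JobTracker's mapred-site.xml, then have each job set that property. A sketch, where the property name pool.name and the pool names are assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PoolSubmission {
    // Assumes the JobTracker's mapred-site.xml sets
    // mapred.fairscheduler.poolnameproperty to "pool.name", so the
    // scheduler reads the pool from this job property instead of user.name.
    public static Job jobForPool(Configuration conf, String pool) throws Exception {
        conf.set("pool.name", pool); // e.g. "short", "normal", "long", "adhoc"
        return new Job(conf);
    }
}
```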

Capacity scheduler for dividing cluster resources

2013-07-07 Thread Amit Sela
Hi everyone, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines) and I would like to divide my cluster resources by job processing time. The jobs running on the cluster can be divided as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3.
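
With the CapacityScheduler the equivalent knob is the submit-time queue name; a sketch (the queue names are hypothetical, and must be declared in mapred.queue.names with capacities defined in capacity-scheduler.xml):

```java
import org.apache.hadoop.conf.Configuration;

public class QueueSubmission {
    // Route a job to a CapacityScheduler queue. Queue names here are
    // illustrative: they must be listed in mapred.queue.names and given
    // capacities via mapred.capacity-scheduler.queue.<name>.capacity.
    public static Configuration forQueue(Configuration conf, String queue) {
        conf.set("mapred.job.queue.name", queue); // e.g. "short", "normal", "long"
        return conf;
    }
}
```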

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows:

Re: Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Sorry, Gmail tab error, please disregard and I will re-send, Thanks. On Sat, Jul 6, 2013 at 5:02 PM, Amit Sela am...@infolinks.com wrote: Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource wise) as follows:

Using CapacityScheduler to divide resources between jobs (not users)

2013-07-06 Thread Amit Sela
Hi all, I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource-wise) as follows: 1. Very short jobs: less than 1 minute. 2. Normal jobs: 2-3 minutes up to an hour or two. 3. Very long jobs: days of processing. (still not active and

Failing to run ant test on clean Hadoop branch-1 checkout

2013-04-27 Thread Amit Sela
Hi all, I'm trying to run ant test on a clean Hadoop branch-1 checkout. ant works fine, but when I run ant test I get a lot of failures: Test org.apache.hadoop.cli.TestCLI FAILED Test org.apache.hadoop.fs.TestFileUtil FAILED Test org.apache.hadoop.fs.TestHarFileSystem FAILED Test

Re: Configuration clone constructor not cloning classloader

2013-04-21 Thread Amit Sela
://issues.apache.org/jira/browse/HADOOP-6103, although the fix never made it into branch-1. Can you create a branch-1 patch for this please? Thanks, Tom On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela am...@infolinks.com wrote: Hi all, I was wondering if there is a good reason why public

Configuration clone constructor not cloning classloader

2013-04-18 Thread Amit Sela
Hi all, I was wondering if there is a good reason why the public Configuration(Configuration other) constructor in Hadoop 1.0.4 doesn't copy the classloader from other to the new Configuration? Is this a bug? I'm asking because I'm trying to run a Hadoop client in an OSGi environment and I need to
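
Until a branch-1 fix lands, a workaround sketch is to restore the classloader explicitly after cloning, since Configuration exposes getClassLoader()/setClassLoader(); the helper class is illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class ConfUtil {
    // Workaround sketch: the copy constructor in branch-1 does not carry
    // over the classloader, so copy it across explicitly after cloning.
    public static Configuration cloneWithClassLoader(Configuration other) {
        Configuration copy = new Configuration(other);
        copy.setClassLoader(other.getClassLoader());
        return copy;
    }
}
```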

Re: Submitting mapreduce and nothing happens

2013-04-17 Thread Amit Sela
, Amit Sela am...@infolinks.com wrote: Hi all, I'm trying to submit a mapreduce job remotely using job.submit() I get the following: [WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [INFO

Setting up a Hadoop client in OSGI bundle

2013-04-17 Thread Amit Sela
Hi all, I'm trying to set up a Hadoop client for job submissions (and more) as an OSGi bundle. I've gotten past a lot of hurdles but I'm kind of stuck now. When I create a new Job for submission I setClassLoader() for the Job Configuration so that it would use the bundle's ClassLoader (Felix), but
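
A sketch of the setup being described, assuming the bundle's classloader is obtained elsewhere (the factory class and job name are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OsgiJobFactory {
    // Make the job's Configuration resolve classes through the bundle's
    // classloader instead of the thread context classloader.
    public static Job newJob(ClassLoader bundleClassLoader) throws Exception {
        Configuration conf = new Configuration();
        conf.setClassLoader(bundleClassLoader); // e.g. getClass().getClassLoader() inside the bundle
        return new Job(conf, "osgi-submitted-job"); // job name is illustrative
    }
}
```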

Re: Submitting mapreduce and nothing happens

2013-04-16 Thread Amit Sela
Nothing in the JT log, but as I mentioned I see this in the client log: [WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [INFO ] org.apache.hadoop.mapred.JobClient » Cleaning up the staging

Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
Hi all, I'm trying to submit a mapreduce job remotely using job.submit(). I get the following: [WARN ] org.apache.hadoop.mapred.JobClient » Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [INFO ] org.apache.hadoop.mapred.JobClient
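
The JobClient warning in the log suggests wrapping the submission in a Tool; a minimal skeleton of that pattern (class and job names are illustrative, and the actual mapper/reducer/path setup is omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SubmitJobTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "remote-submit"); // job name is illustrative
        // ... set mapper/reducer classes and input/output paths here ...
        job.submit(); // non-blocking; use job.waitForCompletion(true) to block
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser to the arguments,
        // which is what the logged warning is asking for.
        System.exit(ToolRunner.run(new Configuration(), new SubmitJobTool(), args));
    }
}
```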

Re: Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
, Amit Sela am...@infolinks.com wrote: The client prints the two lines I posted and the cluster shows nothing. Not even incrementing the number of submitted jobs. On Apr 15, 2013 4:10 PM, Harsh J ha...@cloudera.com wrote: When you say nothing happens; where exactly do you mean? The client

Re: Submitting mapreduce and nothing happens

2013-04-15 Thread Amit Sela
Reading my own message I understand that maybe it's not clear so just to clarify - the previously mentioned JT ID is indeed the correct ID. Thanks. On Apr 15, 2013 4:35 PM, Amit Sela am...@infolinks.com wrote: This is the JT ID and there is no problem running jobs from command line, just remote

Re: Child error

2013-03-13 Thread Amit Sela
://issues.apache.org/jira/browse/MAPREDUCE-4857 Which is fixed in 1.0.4 *From:* Amit Sela [mailto:am...@infolinks.com] *Sent:* Tuesday, March 12, 2013 5:08 AM *To:* user@hadoop.apache.org *Subject:* Re: Child error Hi Jean-Marc

Child error

2013-03-12 Thread Amit Sela
Hi all, I have a weird failure occurring every now and then during a MapReduce job. This is the error: java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 255. at

Re: Child error

2013-03-12 Thread Amit Sela
from 1.0.3 that much no ?) Thanks! On Tue, Mar 12, 2013 at 1:40 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Amit, Which Hadoop version are you using? I have been told it's because of https://issues.apache.org/jira/browse/MAPREDUCE-2374 JM 2013/3/12 Amit Sela am

JobTracker client - max connections

2013-03-05 Thread Amit Sela
Hi all, I'm implementing an API over the JobTracker client - JobClient. My plan is to have a pool of JobClient objects that will expose the ability to submit jobs, poll status etc. My question is: Should I set a maximum pool size? How many connections are too many for the JobTracker
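
A bounded pool along the lines described could be sketched like this (the cap is arbitrary and would need tuning against the JobTracker's RPC handler count, mapred.job.tracker.handler.count):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Sketch of a fixed-size JobClient pool: the queue bounds how many
// concurrent JobTracker connections the API layer can hold open.
public class JobClientPool {
    private final BlockingQueue<JobClient> pool;

    public JobClientPool(JobConf conf, int size) throws Exception {
        pool = new ArrayBlockingQueue<JobClient>(size);
        for (int i = 0; i < size; i++) {
            pool.put(new JobClient(conf));
        }
    }

    public JobClient borrow() throws InterruptedException { return pool.take(); }
    public void release(JobClient client) throws InterruptedException { pool.put(client); }
}
```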

Generic output key class

2013-02-10 Thread Amit Sela
Hi all, has anyone ever used some kind of generic output key for a mapreduce job? I have a job running multiple tasks and I want them to be able to use both Text and IntWritable as output key classes. Any suggestions? Thanks, Amit.

Re: Generic output key class

2013-02-10 Thread Amit Sela
{ integer.writeFields(out); } } [... readFields method that works in a similar way] } -Sandy On Sun, Feb 10, 2013 at 4:00 AM, Amit Sela am...@infolinks.com wrote: Hi all, Has anyone ever used some kind of a generic output key for a mapreduce job ? I have a job running multiple
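
Hadoop also ships org.apache.hadoop.io.GenericWritable, which wraps a closed set of Writable types and covers the Text-or-IntWritable case in this thread; a sketch:

```java
import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Wraps either a Text or an IntWritable; GenericWritable serializes a
// type index followed by the wrapped instance.
public class TextOrIntWritable extends GenericWritable {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES =
            (Class<? extends Writable>[]) new Class[] { Text.class, IntWritable.class };

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return TYPES;
    }
}
```

Caveat: GenericWritable is a Writable but not a WritableComparable, so if the generic key is a map output key (and therefore sorted), a custom comparable wrapper like the one quoted above is still needed.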

Using JCUDA with MapReduce

2013-01-20 Thread Amit Sela
Hi all, I was wondering if anyone here has tried using the GPU of a Hadoop node to enhance MapReduce processing? I've read about it but it always comes down to heavy computations such as matrix multiplications and Monte Carlo algorithms. Did anyone try it with MapReduce jobs that analyze logs or any

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
Hi Jon, I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4 and I haven't noticed any performance issues. By multiple assignment feature do you mean speculative execution (mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution) ? On Mon, Nov

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
resolved for 1.2.0. On Tue, Nov 27, 2012 at 3:20 PM, Amit Sela am...@infolinks.com wrote: Hi Jon, I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4 and I haven't noticed any performance issues. By multiple assignment feature do you mean speculative execution

Facebook corona compatibility

2012-11-12 Thread Amit Sela
Hi everyone, does anyone know if the new Corona tools (which Facebook just released as open source) are compatible with Hadoop 1.0.x, or just 0.20.x? Thanks.

HDFS upgrade

2012-10-17 Thread Amit Sela
Hi all, I want to upgrade a 1TB cluster from Hadoop 0.20.3 to Hadoop 1.0.3. I'm interested to know how long the HDFS upgrade takes, and in general how long it takes from deploying the new version until the cluster is back to running heavy MapReduce jobs. I'd also appreciate it if someone could