custom partitioner

2012-03-26 Thread Harun Raşit ER
My custom partitioner is:
public class PopulationPartitioner extends Partitioner<IntWritable, Chromosome> implements Configurable
{
    @Override
    public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) {
        int partition = key.get();
        if (partition < 0 || partition >= numOfPartitions)
        {
            partition = numOfPartitions - 1;
        }
        System.out.println("partition " + partition);
        return partition;
    }

    @Override
    public Configuration getConf() {
        // TODO Auto-generated method stub
        return conf;
    }

    @Override
    public void setConf(Configuration arg0) {
        // TODO Auto-generated method stub
        conf = arg0;
    }

    private Configuration conf;
}

And my mapred configuration file is:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
 </property>
 <property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
 </property>
</configuration>

Thanks again.


This shouldn't be the case at all. Can you share your Partitioner code
and the job.xml of the job that showed this behavior?

In any case: How do you set the numberOfReducer to 4?
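
Worth noting: mapred.tasktracker.reduce.tasks.maximum only caps the number of reduce slots per TaskTracker; the per-job reducer count is normally requested with mapred.reduce.tasks or from the driver. A minimal sketch, assuming the new org.apache.hadoop.mapreduce API (the driver class and job name are illustrative, not from the original post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: driver class and job name are illustrative.
public class PopulationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "population");           // Job.getInstance(conf, ...) on newer releases
        job.setNumReduceTasks(4);                        // per-job reducer count -> numOfPartitions seen by the partitioner
        job.setPartitionerClass(PopulationPartitioner.class);
        // ... set mapper, reducer, input and output paths as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}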

2012/3/23 Harun Raşit ER harunrasi...@gmail.com:
 I wrote a custom partitioner. But when I run in standalone or
 pseudo-distributed mode, the number of partitions is always 1. I set the
 numberOfReducer to 4, but the numOfPartitions parameter of the custom
 partitioner is still 1 and all my four mappers' results are going to 1
 reducer. The other reducers yield empty files.

 How can I set the number of partitions in standalone or pseudo-distributed
 mode?

 Thanks for your help.



-- 
Harsh J


Re: custom partitioner

2012-03-26 Thread Harun Raşit ER
Thanks for your help.

I assigned the key values from a static variable. When I ran in the Eclipse
platform I saw the right key values, but after debugging in distributed mode
I have seen that all my key values are 0.
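
For what it's worth: in (pseudo-)distributed mode each map task runs in its own JVM, so a static field set in the driver or in another task is not visible there; only the field's default value (0 for an int) is seen. A minimal sketch of passing such a value through the job Configuration instead (the property name is made up for illustration):

import org.apache.hadoop.conf.Configuration;

// Sketch only: "population.start.key" is an illustrative property name, not from the original code.
public class ConfPassingSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("population.start.key", 42);             // driver side: set before job submission

        // Task side (e.g. inside PopulationPartitioner.setConf() or Mapper.setup()):
        int startKey = conf.getInt("population.start.key", 0);
        System.out.println("startKey = " + startKey);
    }
}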


On 3/25/12, Harsh J ha...@cloudera.com wrote:
 Harun,

 Do your map tasks' stdout logs show varying values for partition?
 Seems to me like all your keys are somehow outside of [0,
 numOfPartitions), and hence go to the last partition, per your logic.
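
As an aside, a bounds-safe alternative (a sketch, not the poster's code) would fold every key into [0, numOfPartitions) with a non-negative modulo rather than clamping out-of-range keys to the last partition:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Alternative sketch: map any integer key into [0, numOfPartitions) instead of clamping.
public class ModuloPartitioner<V> extends Partitioner<IntWritable, V> {
    @Override
    public int getPartition(IntWritable key, V value, int numOfPartitions) {
        return (key.get() % numOfPartitions + numOfPartitions) % numOfPartitions; // always non-negative
    }
}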

 2012/3/25 Harun Raşit ER harunrasi...@gmail.com:
 public int getPartition(IntWritable key, Chromosome value, int
 numOfPartitions)
  {
                int partition = key.get();
                if (partition < 0 || partition >= numOfPartitions)
                {
                        partition = numOfPartitions - 1;
                }
                System.out.println("partition " + partition);
                return partition;
 }

 I wrote the custom partitioner above. But the problem is about the third
 parameter, numOfPartitions.

 It is always 1 in pseudo-distributed mode. I have 4 mappers and 4
 reducers, but only one of the reducers uses the real values. The others
 yield nothing, just empty files.

 When I remove the if statement, Hadoop complains about the partition
 number with an "Illegal partition for ..." error.

 How can I set the number of partitions in pseudo-distributed mode?

 Thanks.



 --
 Harsh J



Configurations for Multiple Clusters

2012-03-26 Thread Eli Finkelshteyn

Hi,
I've recently run into a situation where I'll have access to my local 
single-node cluster, a development cluster, and a production cluster. 
What's the best way to work with all three of these, easily switching 
between whichever one I want to use? I found this project, 
http://code.google.com/p/hadoopenv/wiki/README, which seems to be 
trying to address my issue, but it hasn't been updated since 2010. Is 
this still the preferred way of managing multiple clusters, or is there 
something better (preferably something easy to install, as I'll need to 
convince others at my company to use whatever solution I find)?


Thanks!
Eli


Re: Configurations for Multiple Clusters

2012-03-26 Thread Harsh J
Eli,

Couple of things I've done (Not necessarily ideal):

* Use a git repo for configs with branches. Switch when needed.
* Point HADOOP_CONF_DIR to the right conf-dir location (or have
triggers that do this for you). Similar to the above, I guess.
* Alias hadoop multiple times to use different --config conf dir
opts, and use convenient aliased names.

Taking a brief look at hadoopenv sources (Last update seems to be mid
2011), I think it would work with the stable versions of Hadoop even
today. You may give it a whirl too if its approach attracts you.
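
If jobs and HDFS access are driven from Java code rather than the CLI, one more option (a sketch; the per-cluster directory paths below are made up for illustration) is to load the chosen cluster's config files into the Configuration explicitly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: directory names like /etc/hadoop-confs/dev are illustrative, not a convention.
public class ClusterSwitchSketch {
    static FileSystem fsFor(String clusterConfDir) throws Exception {
        Configuration conf = new Configuration();                    // loads the *-default.xml files from the classpath
        conf.addResource(new Path(clusterConfDir, "core-site.xml")); // then overlay the chosen cluster's site files
        conf.addResource(new Path(clusterConfDir, "hdfs-site.xml"));
        conf.addResource(new Path(clusterConfDir, "mapred-site.xml"));
        return FileSystem.get(conf);                                 // talks to whichever cluster those files point at
    }

    public static void main(String[] args) throws Exception {
        FileSystem dev = fsFor("/etc/hadoop-confs/dev");
        System.out.println("Default FS: " + dev.getUri());
    }
}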

On Mon, Mar 26, 2012 at 9:30 PM, Eli Finkelshteyn iefin...@gmail.com wrote:
 Hi,
 I've recently run into a situation where I'll have access to my local single
 node cluster, and development cluster, and a production cluster. What's the
 best way to work with all three of these, easily switching between which I
 want to use? I found this project
 http://code.google.com/p/hadoopenv/wiki/README that seems like it's trying
 to address my issue, but it hasn't been updated since 2010. Is this still
 the preferred way of managing multiple clusters, or is there something
 better (preferably something easy to install, as I'll need to convince
 others at my company to use whatever solution I find).

 Thanks!
 Eli



-- 
Harsh J


Hadoop Map task - No Task Attempts found

2012-03-26 Thread V_sriram

Hi,

I am running a MapReduce program which scans HBase and gets me the required
data. The map phase runs to 99.47% and after that it simply waits for the
remaining 5 map tasks to complete. But those remaining 5 map tasks stay at
0.00% without any task attempt. Any ideas, please?
-- 
View this message in context: 
http://old.nabble.com/Hadoop-Map-task---No-Task-Attempts-found-tp33544760p33544760.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Hadoop Map task - No Task Attempts found

2012-03-26 Thread T Vinod Gupta
Did you check the Hadoop job logs to see what could be going on?
Thanks

On Mon, Mar 26, 2012 at 12:14 PM, V_sriram vsrira...@gmail.com wrote:


 Hi,

 I am running a Map reduce program which would scan the Hbase and will get
 me
 the required data. The Map process runs till 99.47% and after that it
 simply
 waits for the remaining 5 Map tasks to complete. But those remaining 5 Map
 tasks remain in 0.00% without any task attempt. Any ideas plz..
 --
 View this message in context:
 http://old.nabble.com/Hadoop-Map-task---No-Task-Attempts-found-tp33544760p33544760.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: Job tracker service start issue.

2012-03-26 Thread Marcos Ortiz



On 03/23/2012 06:57 AM, kasi subrahmanyam wrote:

Hi Oliver,

I am not sure whether my suggestion will solve your problem, or whether it is
already solved on your side.
It seems the task tracker is having a problem accessing the tmp directory.
Try going to core-site.xml and mapred-site.xml and changing the tmp directory to a
new one.
If that is still not working, then manually change the permissions of that
directory using:
chmod -R 777 tmp
Please don't do chmod -R 777 on the tmp directory. It's not recommended 
for production servers.

The first option is wiser:
1- change the tmp directory in the core and mapred config files
2- chown this new directory to the hadoop group, which contains the mapred and 
hdfs users


On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallouolivier.sal...@irisa.frwrote:



On 3/23/12 8:50 AM, Manish Bhoge wrote:

I have Hadoop running on a standalone box. When I start the daemons for the
namenode, secondarynamenode, job tracker, task tracker and data node, they
all start gracefully. But soon after it starts, the job tracker service no
longer shows up: when I run 'jps' it shows me all the services, including
the task tracker, except the Job Tracker.

Is there any time limit that needs to be set, or is it going into safe
mode? Because when I saw the job tracker log this is what it is showing; it
looks like it is starting the namenode but soon after it shuts down:
2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
/
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = manish/10.131.18.119
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 10:22:53 PST 2012
/
2012-03-22 23:26:04,140 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred
2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 54311
2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311
2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1
2012-03-22 23:26:09,517 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311
2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions.
2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)'
2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: The systemdir


Zero Byte file in HDFS

2012-03-26 Thread Abhishek Pratap Singh
Hi All,

I was just looking into how to avoid or delete zero-byte files in HDFS. I'm
using a Hive partitioned table where the data in each partition comes from an
INSERT OVERWRITE command using a SELECT from a few other tables.
Sometimes 0-byte files are generated in those partitions, and over time the
number of these files in HDFS will grow enormously, degrading the performance
of Hadoop jobs on that table / folder. I'm looking for the best way to avoid
generating, or to delete, these zero-byte files.

I can think of a few ways to implement this:

1) Programmatically, using the FileSystem object to clean up the zero-byte
files.
2) Using a combination of hadoop fs and Linux commands to identify the
zero-byte files and delete them.
3) LazyOutputFormat (applicable to Hadoop-based custom jobs).

Kindly advise on efficient ways to achieve this.
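
Regarding option 1, a minimal sketch of that cleanup with the FileSystem API (the partition path and the non-recursive listing are assumptions for illustration, not part of the original question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: walks a single partition directory (illustrative path) and deletes 0-byte files.
public class ZeroByteCleaner {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path partitionDir = new Path(args.length > 0 ? args[0]
                : "/user/hive/warehouse/mytable/dt=2012-03-26");     // hypothetical partition path

        for (FileStatus status : fs.listStatus(partitionDir)) {
            if (!status.isDir() && status.getLen() == 0) {           // zero-byte data file
                System.out.println("Deleting " + status.getPath());
                fs.delete(status.getPath(), false);                  // non-recursive delete
            }
        }
        fs.close();
    }
}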

Regards,
Abhishek


Re: Zero Byte file in HDFS

2012-03-26 Thread Bejoy KS
Hi Abhishek
   I can propose a better solution: enable merge in Hive, so that the 
smaller files are merged up to at least the HDFS block size (your choice) and 
subsequent Hive jobs on the same data benefit.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Abhishek Pratap Singh manu.i...@gmail.com
Date: Mon, 26 Mar 2012 14:20:18 
To: common-user@hadoop.apache.org; u...@hive.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Zero Byte file in HDFS

Hi All,

I was just going through the implementation scenario of avoiding or
deleting Zero byte file in HDFS. I m using Hive partition table where the
data in partition come from INSERT OVERWRITE command using the SELECT from
few other tables.
Sometimes 0 Byte files are being generated in those partitions and during
the course of time the amount of these files in the HDFS will increase
enormously, decreasing the performance of hadoop job on that table /
folder. I m looking for best way to avoid generation or deleting the zero
byte file.

I can think of few ways to implement this

1) Programmatically using the Filesystem object and cleaning the zero byte
file.
2) Using Hadoop fs and Linux command combination to identify the zero byte
file and delete it.
3) LazyOutputFormat (Applicable in Hadoop based custom jobs).

Kindly guide on efficient ways to achieve the same.

Regards,
Abhishek



Re: Zero Byte file in HDFS

2012-03-26 Thread Abhishek Pratap Singh
Thanks Bejoy for this input. Will this merging combine all the small files up
to at least the block size for the very first mappers of the Hive job?
Well, I'll explore this. My interest in deleting the zero-byte files from
HDFS comes from reducing the cost of bookkeeping these files in the system: the
metadata for a file in HDFS occupies roughly 150 bytes of namenode memory, so
thousands or millions of zero-byte files in HDFS will cost a lot.

Regards,
Abhishek


On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote:

 Hi Abshikek
   I can propose a better solution. Enable merge in hive. So that the
 smaller files would be merged to at lest the hdfs block size(your choice)
 and would benefit subsequent hive jobs on the same.

 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.

 -Original Message-
 From: Abhishek Pratap Singh manu.i...@gmail.com
 Date: Mon, 26 Mar 2012 14:20:18
 To: common-user@hadoop.apache.org; u...@hive.apache.org
 Reply-To: common-user@hadoop.apache.org
 Subject: Zero Byte file in HDFS

 Hi All,

 I was just going through the implementation scenario of avoiding or
 deleting Zero byte file in HDFS. I m using Hive partition table where the
 data in partition come from INSERT OVERWRITE command using the SELECT from
 few other tables.
 Sometimes 0 Byte files are being generated in those partitions and during
 the course of time the amount of these files in the HDFS will increase
 enormously, decreasing the performance of hadoop job on that table /
 folder. I m looking for best way to avoid generation or deleting the zero
 byte file.

 I can think of few ways to implement this

 1) Programmatically using the Filesystem object and cleaning the zero byte
 file.
 2) Using Hadoop fs and Linux command combination to identify the zero byte
 file and delete it.
 3) LazyOutputFormat (Applicable in Hadoop based custom jobs).

 Kindly guide on efficient ways to achieve the same.

 Regards,
 Abhishek




Re: Zero Byte file in HDFS

2012-03-26 Thread Bejoy KS
Hi Abhishek
Merging happens as the last stage of Hive jobs. Say your Hive query is 
translated to n MR jobs; when you enable merge you can set the size that triggers 
a merge (usually the block size). After the n MR jobs there is a map-only job, 
automatically triggered by Hive, which merges the smaller files into larger 
ones. The intermediate output files are not retained by Hive, and hence only 
the final, large-enough files remain in HDFS.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: Abhishek Pratap Singh manu.i...@gmail.com
Date: Mon, 26 Mar 2012 14:41:53 
To: common-user@hadoop.apache.org; bejoy.had...@gmail.com
Cc: u...@hive.apache.org
Subject: Re: Zero Byte file in HDFS

Thanks Bejoy for this input, Does this merging will combine all the small
files to least block size for the very first mappers of the hive job?
Well i ll explore on this, my interest on deleting the zero byte files from
HDFS comes from reducing the cost of Bookkeeping these files in system. The
metadata of any files in HDFS occupy approx 150kb space, considering
thousand or millions of zero byte file in HDFS will cost a lot.

Regards,
Abhishek


On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote:

 Hi Abshikek
   I can propose a better solution. Enable merge in hive. So that the
 smaller files would be merged to at lest the hdfs block size(your choice)
 and would benefit subsequent hive jobs on the same.

 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.

 -Original Message-
 From: Abhishek Pratap Singh manu.i...@gmail.com
 Date: Mon, 26 Mar 2012 14:20:18
 To: common-user@hadoop.apache.org; u...@hive.apache.org
 Reply-To: common-user@hadoop.apache.org
 Subject: Zero Byte file in HDFS

 Hi All,

 I was just going through the implementation scenario of avoiding or
 deleting Zero byte file in HDFS. I m using Hive partition table where the
 data in partition come from INSERT OVERWRITE command using the SELECT from
 few other tables.
 Sometimes 0 Byte files are being generated in those partitions and during
 the course of time the amount of these files in the HDFS will increase
 enormously, decreasing the performance of hadoop job on that table /
 folder. I m looking for best way to avoid generation or deleting the zero
 byte file.

 I can think of few ways to implement this

 1) Programmatically using the Filesystem object and cleaning the zero byte
 file.
 2) Using Hadoop fs and Linux command combination to identify the zero byte
 file and delete it.
 3) LazyOutputFormat (Applicable in Hadoop based custom jobs).

 Kindly guide on efficient ways to achieve the same.

 Regards,
 Abhishek





Re: Zero Byte file in HDFS

2012-03-26 Thread Abhishek Pratap Singh
This sounds good as long as the output of the select query has at least one
row. But in my case it can be zero rows.

Thanks,
Abhishek

On Mon, Mar 26, 2012 at 2:48 PM, Bejoy KS bejoy.had...@gmail.com wrote:

 **
 Hi Abshiek
 Merging happens as a last stage of hive jobs. Say your hive query is
 translated to n MR jobs when you enable merge you can set a size that is
 needed to merge (usually block size). So after n MR jobs there would be a
 map only job that is automatically triggered from hive which merges the
 smaller files into larger ones. The intermediate output files are not
 retained in hive and hence the final large enough files remains in hdfs.

 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.
 --
 *From: * Abhishek Pratap Singh manu.i...@gmail.com
 *Date: *Mon, 26 Mar 2012 14:41:53 -0700
 *To: *common-user@hadoop.apache.org; bejoy.had...@gmail.com
 *Cc: *u...@hive.apache.org
 *Subject: *Re: Zero Byte file in HDFS

 Thanks Bejoy for this input, Does this merging will combine all the small
 files to least block size for the very first mappers of the hive job?
 Well i ll explore on this, my interest on deleting the zero byte files
 from HDFS comes from reducing the cost of Bookkeeping these files in
 system. The metadata of any files in HDFS occupy approx 150kb space,
 considering thousand or millions of zero byte file in HDFS will cost a lot.

 Regards,
 Abhishek


 On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote:

 Hi Abshikek
   I can propose a better solution. Enable merge in hive. So that the
 smaller files would be merged to at lest the hdfs block size(your choice)
 and would benefit subsequent hive jobs on the same.

 Regards
 Bejoy KS

 Sent from handheld, please excuse typos.

 -Original Message-
 From: Abhishek Pratap Singh manu.i...@gmail.com
 Date: Mon, 26 Mar 2012 14:20:18
 To: common-user@hadoop.apache.org; u...@hive.apache.org
 Reply-To: common-user@hadoop.apache.org
 Subject: Zero Byte file in HDFS

 Hi All,

 I was just going through the implementation scenario of avoiding or
 deleting Zero byte file in HDFS. I m using Hive partition table where the
 data in partition come from INSERT OVERWRITE command using the SELECT from
 few other tables.
 Sometimes 0 Byte files are being generated in those partitions and during
 the course of time the amount of these files in the HDFS will increase
 enormously, decreasing the performance of hadoop job on that table /
 folder. I m looking for best way to avoid generation or deleting the zero
 byte file.

 I can think of few ways to implement this

 1) Programmatically using the Filesystem object and cleaning the zero byte
 file.
 2) Using Hadoop fs and Linux command combination to identify the zero byte
 file and delete it.
 3) LazyOutputFormat (Applicable in Hadoop based custom jobs).

 Kindly guide on efficient ways to achieve the same.

 Regards,
 Abhishek





Separating mapper intermediate files

2012-03-26 Thread aayush
I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop
setup. I have created 2 partitions on my HDD. I want the mapper intermediate
files (i.e. the spill files and the mapper output) to be sent to a file
system on partition 1, whereas everything else, including HDFS, should live on
partition 2. I am struggling to find the appropriate parameters in the conf
files. I understand that there are hadoop.tmp.dir and mapred.local.dir, but I am
not sure how to use which. I would really appreciate it if someone could tell me
exactly which parameters to modify to achieve this.

Furthermore, can someone also tell me how to save the intermediate mapper
files (spill outputs), and where they are saved?

Thanks in advance for any help.

--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3859787.html
Sent from the Users mailing list archive at Nabble.com.


Re: Avro, Hadoop0.20.2, Jackson Error

2012-03-26 Thread Scott Carey
Does it still happen if you configure avro-tools to use

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
  <classifier>nodeps</classifier>
</dependency>

?

You have two Hadoops, two Jacksons, and even two avro:avro artifacts in
your classpath if you use the Avro bundle jar with the default classifier.

The avro-tools jar is not intended for inclusion in a project, as it is a jar
with its dependencies bundled inside.
https://cwiki.apache.org/confluence/display/AVRO/Build+Documentation#BuildDocumentation-ProjectStructure

On 3/26/12 7:52 PM, Deepak Nettem deepaknet...@gmail.com wrote:

When I include some Avro code in my Mapper, I get this error:

Error:
org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

Particularly, just these two lines of code:

InputStream in = getClass().getResourceAsStream("schema.avsc");
Schema schema = Schema.parse(in);

This code works perfectly when run as a stand-alone application outside of
Hadoop. Why do I get this error, and what's the best way to get rid of it?

I am using Hadoop 0.20.2, and writing code in the new API.

I found that the Hadoop lib directory contains jackson-core-asl-1.0.1.jar
and jackson-mapper-asl-1.0.1.jar.

I removed these, but got this error when running hadoop:
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

I am using Maven as a build tool, and my pom.xml has this dependency:

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>




I added the dependency:


<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>

But that still gives me this error:

Error: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

-

I also tried replacing the earlier dependencies with these:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
</dependency>

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.6.3</version>
</dependency>

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>

And this is my app dependency tree:

[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ AvroTest ---
[INFO] org.avrotest:AvroTest:jar:1.0-SNAPSHOT
[INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
[INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] +- net.sf.json-lib:json-lib:jar:jdk15:2.3:compile
[INFO] |  +- commons-beanutils:commons-beanutils:jar:1.8.0:compile
[INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  \- net.sf.ezmorph:ezmorph:jar:1.0.6:compile
[INFO] +- org.apache.avro:avro-tools:jar:1.6.3:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.6.4:compile
[INFO] +- org.apache.avro:avro:jar:1.6.3:compile
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
[INFO] \- org.apache.hadoop:hadoop-core:jar:0.20.2:compile
[INFO]    +- commons-cli:commons-cli:jar:1.2:compile
[INFO]    +- xmlenc:xmlenc:jar:0.52:compile
[INFO]    +- commons-httpclient:commons-httpclient:jar:3.0.1:compile
[INFO]    +- commons-codec:commons-codec:jar:1.3:compile
[INFO]    +- commons-net:commons-net:jar:1.4.1:compile
[INFO]    +- org.mortbay.jetty:jetty:jar:6.1.14:compile
[INFO]    +- org.mortbay.jetty:jetty-util:jar:6.1.14:compile
[INFO]    +- tomcat:jasper-runtime:jar:5.5.12:compile
[INFO]    +- tomcat:jasper-compiler:jar:5.5.12:compile
[INFO]    +- org.mortbay.jetty:jsp-api-2.1:jar:6.1.14:compile
[INFO]    +- org.mortbay.jetty:jsp-2.1:jar:6.1.14:compile
[INFO]    |  \- ant:ant:jar:1.6.5:compile
[INFO]    +- commons-el:commons-el:jar:1.0:compile
[INFO]    +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
[INFO]    +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
[INFO]    +- net.sf.kosmosfs:kfs:jar:0.3:compile
[INFO]    +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO]    +- oro:oro:jar:2.0.8:compile
[INFO]    \- org.eclipse.jdt:core:jar:3.1.1:compile

I still get the same error.

Somebody please please help me with this. I need to resolve this asap!!

Best,
Deepak



Re: Separating mapper intermediate files

2012-03-26 Thread Harsh J
Hello Aayush,

Two things that'd help clear your confusion:
1. dfs.data.dir controls where HDFS blocks are stored. For your layout, set
this to a partition 2 path.
2. mapred.local.dir controls where intermediate task data goes. For your
layout, set this to a partition 1 path.

 Furthermore, can someone also tell me how to save intermediate mapper
 files(spill outputs) and where are they saved.

Intermediate outputs are handled by the framework itself (There is no
user/manual work involved), and are saved inside attempt directories
under mapred.local.dir.
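
A tiny sketch for sanity-checking which values are actually in effect after editing the conf files (it assumes the Hadoop conf directory is on the classpath; a bare Configuration does not load the hdfs/mapred site files by itself):

import org.apache.hadoop.conf.Configuration;

// Sketch: print the directory settings in effect, to verify conf edits were picked up.
public class ShowLocalDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();   // core-default.xml + core-site.xml
        conf.addResource("hdfs-site.xml");          // not loaded by default by a bare Configuration
        conf.addResource("mapred-site.xml");
        System.out.println("dfs.data.dir     = " + conf.get("dfs.data.dir"));
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
    }
}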

On Tue, Mar 27, 2012 at 4:46 AM, aayush aayushgupta...@gmail.com wrote:
 I am a newbie to Hadoop and map reduce. I am running a single node hadoop
 setup. I have created 2 partitions on my HDD. I want the mapper intermediate
 files (i.e. the spill files and the mapper output) to be sent to a file
 system on Partition1 whereas everything else including HDFS should be run on
 partition2. I am struggling to find the appropriate parametes in the conf
 files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am
 not sure how to use what. I would really appreciate if someone could tell me
 exactly which parameters to modify to achieve the goal.

-- 
Harsh J