custom partitioner
My custom partitioner is:

public class PopulationPartitioner extends Partitioner<IntWritable, Chromosome> implements Configurable {

  private Configuration conf;

  @Override
  public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) {
    int partition = key.get();
    if (partition < 0 || partition >= numOfPartitions) {
      partition = numOfPartitions - 1;
    }
    System.out.println("partition " + partition);
    return partition;
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public void setConf(Configuration arg0) {
    conf = arg0;
  }
}

And my mapred configuration file is:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>

Thanks again.

This shouldn't be the case at all. Can you share your Partitioner code and the job.xml of the job that showed this behavior? In any case: how do you set the number of reducers to 4?

2012/3/23 Harun Raşit ER harunrasi...@gmail.com: I wrote a custom partitioner. But when I run in standalone or pseudo-distributed mode, the number of partitions is always 1. I set the number of reducers to 4, but the numOfPartitions parameter of the custom partitioner is still 1, and all four of my mappers' results go to one reducer. The other reducers yield empty files. How can I set the number of partitions in standalone or pseudo-distributed mode? Thanks for your help. -- Harsh J
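A note on the configuration above: mapred.tasktracker.reduce.tasks.maximum only caps how many reduce tasks a single TaskTracker runs concurrently; it does not set a job's reducer count. The number of reducers (and hence the numOfPartitions value passed to a Partitioner) comes from mapred.reduce.tasks or Job.setNumReduceTasks(). Also note that in standalone (local) mode the 0.20.x LocalJobRunner runs at most one reducer, so numOfPartitions will be 1 there regardless of configuration. The following is only a minimal driver sketch for the 0.20.x new API, not the poster's actual job: the driver class name and the omitted input/output setup are assumptions, while Chromosome and PopulationPartitioner are the classes from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;

public class PopulationJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "population-ga");
    job.setJarByClass(PopulationJobDriver.class);

    // This is what determines numOfPartitions in the partitioner.
    job.setNumReduceTasks(4);

    job.setPartitionerClass(PopulationPartitioner.class);
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Chromosome.class);

    // Mapper/reducer classes and input/output formats and paths omitted here.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}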
Re: custom partitioner
Thanks for your help. I assigned key values from a static variable, and when I ran it in Eclipse I saw the right key values, but when I debugged in distributed mode I saw that all my key values were 0.

On 3/25/12, Harsh J ha...@cloudera.com wrote: Harun, Do your map task stdout logs show varying values for partition? Seems to me like all your keys are somehow outside of [0, numOfPartitions), and hence go to the last partition, per your logic.

2012/3/25 Harun Raşit ER harunrasi...@gmail.com:

public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) {
  int partition = key.get();
  if (partition < 0 || partition >= numOfPartitions) {
    partition = numOfPartitions - 1;
  }
  System.out.println("partition " + partition);
  return partition;
}

I wrote the custom partitioner above. But the problem is with the third parameter, numOfPartitions. It is always 1 in pseudo-distributed mode. I have 4 mappers and 4 reducers, but only one of the reducers gets the real values. The others yield nothing, just empty files. When I remove the if statement, Hadoop complains about the partition number with an "illegal partition for" error. How can I set the number of partitions in pseudo-distributed mode? Thanks. -- Harsh J
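Since each map task runs in its own JVM in (pseudo-)distributed mode, a static counter in the mapper starts over in every task, which would explain why the keys that looked right in Eclipse all came out as 0 on the cluster. If the goal is simply to spread keys evenly across reducers, a common alternative is to map the key into range rather than clamping out-of-range keys into the last partition. A minimal sketch of that idea (an alternative, not the poster's code; Chromosome is the poster's own type):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class ModuloPartitioner extends Partitioner<IntWritable, Chromosome> {
  @Override
  public int getPartition(IntWritable key, Chromosome value, int numOfPartitions) {
    // Mask off the sign bit (the same trick HashPartitioner uses) so that even
    // negative keys map into [0, numOfPartitions).
    return (key.get() & Integer.MAX_VALUE) % numOfPartitions;
  }
}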
Configurations for Multiple Clusters
Hi, I've recently run into a situation where I'll have access to my local single-node cluster, a development cluster, and a production cluster. What's the best way to work with all three of these, easily switching between whichever I want to use? I found this project http://code.google.com/p/hadoopenv/wiki/README that seems like it's trying to address my issue, but it hasn't been updated since 2010. Is this still the preferred way of managing multiple clusters, or is there something better (preferably something easy to install, as I'll need to convince others at my company to use whatever solution I find)? Thanks! Eli
Re: Configurations for Multiple Clusters
Eli, A couple of things I've done (not necessarily ideal): * Use a git repo for configs, with branches. Switch when needed. * Point HADOOP_CONF_DIR to the right conf-dir location (or have triggers that do this for you). Similar to the above, I guess. * Alias hadoop multiple times to use different --config conf dir options, and use convenient aliased names. Taking a brief look at the hadoopenv sources (the last update seems to be mid-2011), I think it would work with the stable versions of Hadoop even today. You may give it a whirl too if its approach attracts you. On Mon, Mar 26, 2012 at 9:30 PM, Eli Finkelshteyn iefin...@gmail.com wrote: Hi, I've recently run into a situation where I'll have access to my local single node cluster, and development cluster, and a production cluster. What's the best way to work with all three of these, easily switching between which I want to use? I found this project http://code.google.com/p/hadoopenv/wiki/README that seems like it's trying to address my issue, but it hasn't been updated since 2010. Is this still the preferred way of managing multiple clusters, or is there something better (preferably something easy to install, as I'll need to convince others at my company to use whatever solution I find). Thanks! Eli -- Harsh J
Hadoop Map task - No Task Attempts found
Hi, I am running a MapReduce program that scans HBase and gets me the required data. The map phase runs up to 99.47% and then simply waits for the remaining 5 map tasks to complete. But those remaining 5 map tasks stay at 0.00% without any task attempt. Any ideas, please? -- View this message in context: http://old.nabble.com/Hadoop-Map-task---No-Task-Attempts-found-tp33544760p33544760.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Hadoop Map task - No Task Attempts found
did you check the hadoop job logs to see what could be going on? thanks On Mon, Mar 26, 2012 at 12:14 PM, V_sriram vsrira...@gmail.com wrote: Hi, I am running a Map reduce program which would scan the Hbase and will get me the required data. The Map process runs till 99.47% and after that it simply waits for the remaining 5 Map tasks to complete. But those remaining 5 Map tasks remain in 0.00% without any task attempt. Any ideas plz.. -- View this message in context: http://old.nabble.com/Hadoop-Map-task---No-Task-Attempts-found-tp33544760p33544760.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Job tracker service start issue.
On 03/23/2012 06:57 AM, kasi subrahmanyam wrote: Hi Oliver, I am not sure whether my suggestion will solve your problem, or whether it is already solved on your side. It seems the task tracker is having a problem accessing the tmp directory. Try going to the core and mapred site XML files and changing the tmp directory to a new one. If that still doesn't work, then manually change the permissions of that directory using: chmod -R 777 tmp

Please don't do chmod -R 777 on the tmp directory; it's not recommendable for production servers. The first option is wiser: 1 - change the tmp directory in the core and mapreduce files; 2 - chown this new directory to the hadoop group, which contains the mapred and hdfs users.

On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallou olivier.sal...@irisa.fr wrote: On 3/23/12 8:50 AM, Manish Bhoge wrote: I have Hadoop running on a standalone box. When I start the daemons for namenode, secondarynamenode, job tracker, task tracker and data node, they all start gracefully. But soon after it starts the job tracker, the job tracker service doesn't show up. When I run 'jps' it shows me all the services, including the task tracker, except the Job Tracker. Is there any time limit that needs to be set up, or is it going into safe mode? Because this is what the job tracker log shows; it looks like it is starting the namenode but soon after it shuts down: 2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: / STARTUP_MSG: Starting JobTracker STARTUP_MSG: host = manish/10.131.18.119 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2-cdh3u3 STARTUP_MSG: build = file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 10:22:53 PST 2012 / 2012-03-22 23:26:04,140 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred 2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 54311 2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is 
-1. Opening the listener on 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030 2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1 2012-03-22 23:26:09,517 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030 2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions. 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)' 2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ... org.apache.hadoop.security.AccessControlException: The systemdir
Zero Byte file in HDFS
Hi All, I was just going through an implementation scenario for avoiding or deleting zero-byte files in HDFS. I'm using a Hive partitioned table where the data in a partition comes from an INSERT OVERWRITE command using a SELECT from a few other tables. Sometimes 0-byte files are generated in those partitions, and over time the number of these files in HDFS will increase enormously, decreasing the performance of Hadoop jobs on that table / folder. I'm looking for the best way to avoid generating, or to delete, the zero-byte files. I can think of a few ways to implement this: 1) Programmatically, using the FileSystem object and cleaning up the zero-byte files. 2) Using a combination of hadoop fs and Linux commands to identify the zero-byte files and delete them. 3) LazyOutputFormat (applicable in Hadoop-based custom jobs). Kindly guide on efficient ways to achieve the same. Regards, Abhishek
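Option 1 above (programmatic cleanup with the FileSystem API) can be sketched roughly as below. This is only a minimal illustration, not production code: the class name and the default partition path are made up, and it scans a single directory level non-recursively.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ZeroByteFileCleaner {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical partition directory; pass the real one as an argument in practice.
    Path dir = new Path(args.length > 0 ? args[0] : "/user/hive/warehouse/mytable/dt=2012-03-26");

    for (FileStatus status : fs.listStatus(dir)) {
      // Delete plain files whose length is zero; leave directories alone.
      if (!status.isDir() && status.getLen() == 0) {
        fs.delete(status.getPath(), false);
        System.out.println("Deleted zero-byte file: " + status.getPath());
      }
    }
    fs.close();
  }
}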
Re: Zero Byte file in HDFS
Hi Abhishek, I can propose a better solution: enable merge in Hive, so that the smaller files are merged up to at least the HDFS block size (or a size of your choice), which will benefit subsequent Hive jobs on the same data. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:20:18 To: common-user@hadoop.apache.org; u...@hive.apache.org Reply-To: common-user@hadoop.apache.org Subject: Zero Byte file in HDFS Hi All, I was just going through the implementation scenario of avoiding or deleting Zero byte file in HDFS. I m using Hive partition table where the data in partition come from INSERT OVERWRITE command using the SELECT from few other tables. Sometimes 0 Byte files are being generated in those partitions and during the course of time the amount of these files in the HDFS will increase enormously, decreasing the performance of hadoop job on that table / folder. I m looking for best way to avoid generation or deleting the zero byte file. I can think of few ways to implement this 1) Programmatically using the Filesystem object and cleaning the zero byte file. 2) Using Hadoop fs and Linux command combination to identify the zero byte file and delete it. 3) LazyOutputFormat (Applicable in Hadoop based custom jobs). Kindly guide on efficient ways to achieve the same. Regards, Abhishek
Re: Zero Byte file in HDFS
Thanks Bejoy for this input. Will this merging combine all the small files up to at least the block size for the very first mappers of the Hive job? Well, I'll explore this. My interest in deleting the zero-byte files from HDFS comes from reducing the cost of bookkeeping these files in the system. The metadata for each file in HDFS occupies roughly 150 bytes of NameNode memory, so thousands or millions of zero-byte files in HDFS will cost a lot. Regards, Abhishek On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Abshikek I can propose a better solution. Enable merge in hive. So that the smaller files would be merged to at lest the hdfs block size(your choice) and would benefit subsequent hive jobs on the same. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:20:18 To: common-user@hadoop.apache.org; u...@hive.apache.org Reply-To: common-user@hadoop.apache.org Subject: Zero Byte file in HDFS Hi All, I was just going through the implementation scenario of avoiding or deleting Zero byte file in HDFS. I m using Hive partition table where the data in partition come from INSERT OVERWRITE command using the SELECT from few other tables. Sometimes 0 Byte files are being generated in those partitions and during the course of time the amount of these files in the HDFS will increase enormously, decreasing the performance of hadoop job on that table / folder. I m looking for best way to avoid generation or deleting the zero byte file. I can think of few ways to implement this 1) Programmatically using the Filesystem object and cleaning the zero byte file. 2) Using Hadoop fs and Linux command combination to identify the zero byte file and delete it. 3) LazyOutputFormat (Applicable in Hadoop based custom jobs). Kindly guide on efficient ways to achieve the same. Regards, Abhishek
Re: Zero Byte file in HDFS
Hi Abhishek, Merging happens as the last stage of Hive jobs. Say your Hive query is translated into n MR jobs; when you enable merge, you can set the size that files should be merged up to (usually the block size). After those n MR jobs, a map-only job that is automatically triggered by Hive merges the smaller files into larger ones. The intermediate output files are not retained by Hive, and hence only the final, large enough files remain in HDFS. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:41:53 To: common-user@hadoop.apache.org; bejoy.had...@gmail.com Cc: u...@hive.apache.org Subject: Re: Zero Byte file in HDFS Thanks Bejoy for this input, Does this merging will combine all the small files to least block size for the very first mappers of the hive job? Well i ll explore on this, my interest on deleting the zero byte files from HDFS comes from reducing the cost of Bookkeeping these files in system. The metadata of any files in HDFS occupy approx 150kb space, considering thousand or millions of zero byte file in HDFS will cost a lot. Regards, Abhishek On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Abshikek I can propose a better solution. Enable merge in hive. So that the smaller files would be merged to at lest the hdfs block size(your choice) and would benefit subsequent hive jobs on the same. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:20:18 To: common-user@hadoop.apache.org; u...@hive.apache.org Reply-To: common-user@hadoop.apache.org Subject: Zero Byte file in HDFS Hi All, I was just going through the implementation scenario of avoiding or deleting Zero byte file in HDFS. I m using Hive partition table where the data in partition come from INSERT OVERWRITE command using the SELECT from few other tables. Sometimes 0 Byte files are being generated in those partitions and during the course of time the amount of these files in the HDFS will increase enormously, decreasing the performance of hadoop job on that table / folder. I m looking for best way to avoid generation or deleting the zero byte file. I can think of few ways to implement this 1) Programmatically using the Filesystem object and cleaning the zero byte file. 2) Using Hadoop fs and Linux command combination to identify the zero byte file and delete it. 3) LazyOutputFormat (Applicable in Hadoop based custom jobs). Kindly guide on efficient ways to achieve the same. Regards, Abhishek
Re: Zero Byte file in HDFS
This sounds good as long as the output of the select query has at least one row. But in my case it can be zero rows. Thanks, Abhishek On Mon, Mar 26, 2012 at 2:48 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Abshiek Merging happens as a last stage of hive jobs. Say your hive query is translated to n MR jobs when you enable merge you can set a size that is needed to merge (usually block size). So after n MR jobs there would be a map only job that is automatically triggered from hive which merges the smaller files into larger ones. The intermediate output files are not retained in hive and hence the final large enough files remains in hdfs. Regards Bejoy KS Sent from handheld, please excuse typos. From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:41:53 -0700 To: common-user@hadoop.apache.org; bejoy.had...@gmail.com Cc: u...@hive.apache.org Subject: Re: Zero Byte file in HDFS Thanks Bejoy for this input, Does this merging will combine all the small files to least block size for the very first mappers of the hive job? Well i ll explore on this, my interest on deleting the zero byte files from HDFS comes from reducing the cost of Bookkeeping these files in system. The metadata of any files in HDFS occupy approx 150kb space, considering thousand or millions of zero byte file in HDFS will cost a lot. Regards, Abhishek On Mon, Mar 26, 2012 at 2:27 PM, Bejoy KS bejoy.had...@gmail.com wrote: Hi Abshikek I can propose a better solution. Enable merge in hive. So that the smaller files would be merged to at lest the hdfs block size(your choice) and would benefit subsequent hive jobs on the same. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Abhishek Pratap Singh manu.i...@gmail.com Date: Mon, 26 Mar 2012 14:20:18 To: common-user@hadoop.apache.org; u...@hive.apache.org Reply-To: common-user@hadoop.apache.org Subject: Zero Byte file in HDFS Hi All, I was just going through the implementation scenario of avoiding or deleting Zero byte file in HDFS. I m using Hive partition table where the data in partition come from INSERT OVERWRITE command using the SELECT from few other tables. Sometimes 0 Byte files are being generated in those partitions and during the course of time the amount of these files in the HDFS will increase enormously, decreasing the performance of hadoop job on that table / folder. I m looking for best way to avoid generation or deleting the zero byte file. I can think of few ways to implement this 1) Programmatically using the Filesystem object and cleaning the zero byte file. 2) Using Hadoop fs and Linux command combination to identify the zero byte file and delete it. 3) LazyOutputFormat (Applicable in Hadoop based custom jobs). Kindly guide on efficient ways to achieve the same. Regards, Abhishek
Separating mapper intermediate files
I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop setup. I have created 2 partitions on my HDD. I want the mapper intermediate files (i.e. the spill files and the mapper output) to go to a file system on Partition1, whereas everything else, including HDFS, should run on Partition2. I am struggling to find the appropriate parameters in the conf files. I understand that there are hadoop.tmp.dir and mapred.local.dir, but I am not sure how to use which. I would really appreciate it if someone could tell me exactly which parameters to modify to achieve this. Furthermore, can someone also tell me how the intermediate mapper files (spill outputs) are saved and where they are saved? Thanks in advance for any help. -- View this message in context: http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3859787.html Sent from the Users mailing list archive at Nabble.com.
Re: Avro, Hadoop 0.20.2, Jackson Error
Does it still happen if you configure avro-tools to use this dependency?

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
  <classifier>nodeps</classifier>
</dependency>

You have two Hadoops, two Jacksons, and even two avro:avro artifacts in your classpath if you use the Avro bundle jar with the default classifier. The avro-tools jar is not intended for inclusion in a project, as it is a jar with its dependencies inside. https://cwiki.apache.org/confluence/display/AVRO/Build+Documentation#BuildDocumentation-ProjectStructure

On 3/26/12 7:52 PM, Deepak Nettem deepaknet...@gmail.com wrote: When I include some Avro code in my Mapper, I get this error:

Error: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

Particularly, with just these two lines of code:

InputStream in = getClass().getResourceAsStream("schema.avsc");
Schema schema = Schema.parse(in);

This code works perfectly when run as a standalone application outside of Hadoop. Why do I get this error, and what's the best way to get rid of it? I am using Hadoop 0.20.2, and writing code in the new API. I found that the Hadoop lib directory contains jackson-core-asl-1.0.1.jar and jackson-mapper-asl-1.0.1.jar. I removed these, but then got this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

I am using Maven as a build tool, and my pom.xml has this dependency:

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>

I added the dependency:

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>

But that still gives me this error:

Error: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

I also tried replacing the earlier dependencies with these:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.6.3</version>
</dependency>
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>

And this is my app dependency tree:

[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ AvroTest --- [INFO] org.avrotest:AvroTest:jar:1.0-SNAPSHOT [INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile) [INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile [INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile [INFO] +- net.sf.json-lib:json-lib:jar:jdk15:2.3:compile [INFO] | +- commons-beanutils:commons-beanutils:jar:1.8.0:compile [INFO] | +- commons-collections:commons-collections:jar:3.2.1:compile [INFO] | +- commons-lang:commons-lang:jar:2.4:compile [INFO] | +- commons-logging:commons-logging:jar:1.1.1:compile [INFO] | \- net.sf.ezmorph:ezmorph:jar:1.0.6:compile [INFO] +- org.apache.avro:avro-tools:jar:1.6.3:compile [INFO] | \- org.slf4j:slf4j-api:jar:1.6.4:compile [INFO] +- org.apache.avro:avro:jar:1.6.3:compile [INFO] | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile [INFO] | \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile [INFO] \- org.apache.hadoop:hadoop-core:jar:0.20.2:compile [INFO] +- commons-cli:commons-cli:jar:1.2:compile [INFO] +- xmlenc:xmlenc:jar:0.52:compile [INFO] +- commons-httpclient:commons-httpclient:jar:3.0.1:compile [INFO] +- commons-codec:commons-codec:jar:1.3:compile [INFO] +- commons-net:commons-net:jar:1.4.1:compile [INFO] +- org.mortbay.jetty:jetty:jar:6.1.14:compile [INFO] +- org.mortbay.jetty:jetty-util:jar:6.1.14:compile [INFO] +- tomcat:jasper-runtime:jar:5.5.12:compile [INFO] +- tomcat:jasper-compiler:jar:5.5.12:compile [INFO] +- org.mortbay.jetty:jsp-api-2.1:jar:6.1.14:compile [INFO] +- org.mortbay.jetty:jsp-2.1:jar:6.1.14:compile [INFO] | \- ant:ant:jar:1.6.5:compile [INFO] +- commons-el:commons-el:jar:1.0:compile [INFO] +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile [INFO] +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile [INFO] +- net.sf.kosmosfs:kfs:jar:0.3:compile [INFO] +- hsqldb:hsqldb:jar:1.8.0.10:compile [INFO] +- oro:oro:jar:2.0.8:compile [INFO] \- org.eclipse.jdt:core:jar:3.1.1:compile

I still get the same error. Somebody please help me with this. I need to resolve this asap!! Best, Deepak
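The "Error: ... JsonFactory.enable(...)" message is the signature of a NoSuchMethodError, which is what you see when an older Jackson (most likely the 1.0.1 jars shipped in Hadoop's lib directory, which appear not to have that method) is found on the classpath before the newer one pulled in by Maven. One quick way to check which jar a class is actually loaded from is the small diagnostic below; this is only a hedged sketch, not part of Avro or Hadoop, and the class name is made up. Running it via the hadoop command (or calling the same line from inside the mapper) shows which copy wins on that classpath.

import org.codehaus.jackson.JsonFactory;

public class JacksonWhich {
  public static void main(String[] args) {
    // Prints the jar (or directory) that JsonFactory was loaded from, which tells
    // you whether Hadoop's bundled Jackson or the Maven-provided one is being used.
    System.out.println(JsonFactory.class.getProtectionDomain()
        .getCodeSource().getLocation());
  }
}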
Re: Separating mapper intermediate files
Hello Aayush, Three things that should clear your confusion:

1. dfs.data.dir controls where HDFS blocks are stored. Set this to a Partition2 path.

2. mapred.local.dir controls where intermediate task data goes. Set this to a Partition1 path.

3. Regarding "can someone also tell me how to save intermediate mapper files (spill outputs) and where are they saved": intermediate outputs are handled by the framework itself (there is no user/manual work involved), and they are saved inside attempt directories under mapred.local.dir.

On Tue, Mar 27, 2012 at 4:46 AM, aayush aayushgupta...@gmail.com wrote: I am a newbie to Hadoop and map reduce. I am running a single node hadoop setup. I have created 2 partitions on my HDD. I want the mapper intermediate files (i.e. the spill files and the mapper output) to be sent to a file system on Partition1 whereas everything else including HDFS should be run on partition2. I am struggling to find the appropriate parametes in the conf files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am not sure how to use what. I would really appreciate if someone could tell me exactly which parameters to modify to achieve the goal. -- Harsh J
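In terms of the actual configuration files, dfs.data.dir goes in hdfs-site.xml and mapred.local.dir goes in mapred-site.xml; if either is left unset, it defaults to a subdirectory of hadoop.tmp.dir, which is why that property looked relevant. The sketch below uses made-up mount points for the two partitions; adjust the paths to your own layout and restart the daemons after the change.

hdfs-site.xml:

<property>
  <name>dfs.data.dir</name>
  <value>/partition2/hadoop/dfs/data</value>
</property>

mapred-site.xml:

<property>
  <name>mapred.local.dir</name>
  <value>/partition1/hadoop/mapred/local</value>
</property>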