Re: is there more smarter way to execute a hadoop cluster?
Hello, Harsh. To use the MultipleOutputs class, I need a Job instance to pass as the first argument when configuring the named outputs of my Hadoop job:

    addNamedOutput(Job job, String namedOutput,
        Class<? extends OutputFormat> outputFormatClass,
        Class<?> keyClass, Class<?> valueClass)

"Adds a named output for the job."
(http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html)

As you know, the Job class is deprecated in 0.21.0. So how am I supposed to submit my job to the cluster, the way runJob() does? Junyoung Kim (juneng...@gmail.com) On 02/24/2011 04:12 PM, Harsh J wrote: Hello, On Thu, Feb 24, 2011 at 12:25 PM, Jun Young Kim juneng...@gmail.com wrote: Hi, I executed my cluster this way: calling a command in the shell directly. What are you doing within your testCluster.jar? If you are simply submitting a job, you can use a Driver method and get rid of all these hassles. The JobClient and Job classes both support submitting jobs from the Java API itself. Please read the tutorial on submitting application code via code itself: http://developer.yahoo.com/hadoop/tutorial/module4.html#driver Notice the last line in the code presented there, which submits the job itself. Using runJob() also prints your progress/counters etc. The way you've implemented this looks unnecessary when your jar itself can be made runnable with a Driver!
Re: is there more smarter way to execute a hadoop cluster?
Hey, On Thu, Feb 24, 2011 at 2:36 PM, Jun Young Kim juneng...@gmail.com wrote: How am I supposed to submit my job to the cluster? In the new API, the 'Job' class also has Job.submit() and Job.waitForCompletion(boolean) methods. Please see the API here: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html -- Harsh J www.harshj.com
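[Editor's note: a minimal sketch of what this thread arrives at, configuring a named output on a new-API Job and submitting it with waitForCompletion(). The job name, named-output name, and mapper/reducer setup are placeholders; the Job constructor shown is the one 0.21 deprecates in favor of Job.getInstance(conf, ...).]

    // Sketch only; class names and the "text" named output are hypothetical.
    Configuration conf = new Configuration();
    Job job = new Job(conf, "named-output-example"); // deprecated in 0.21; Job.getInstance(conf, ...) replaces it
    job.setJarByClass(MyDriver.class);
    // ... set mapper, reducer, key/value classes, input and output paths as usual ...

    // Register a named output; a MultipleOutputs instance in the reducer writes to it.
    MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
        Text.class, IntWritable.class);

    // Submit and wait; like the old runJob(), this prints progress and counters.
    System.exit(job.waitForCompletion(true) ? 0 : 1);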
Re: is there more smarter way to execute a hadoop cluster?
Now I am using the Job.waitForCompletion(boolean) method to submit my job, but my jar cannot open HDFS files. Also, after submitting my job I cannot see it in the job history on the admin pages (jobtracker.jsp), even when the job succeeds. For example, I set the input path to hdfs:/user/juneng/1.input, but I get this error:

    Wrong FS: hdfs:/user/juneng/1.input, expected: file:///

Junyoung Kim (juneng...@gmail.com) On 02/24/2011 06:41 PM, Harsh J wrote: In the new API, the 'Job' class also has Job.submit() and Job.waitForCompletion(boolean) methods. Please see the API here: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html
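[Editor's note: the error suggests the driver's Configuration is falling back to the local-filesystem defaults. A small diagnostic sketch, not from the thread, to see which filesystem the client actually resolves:]

    // Prints which default filesystem the client-side Configuration resolves.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FsCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // With the cluster's core-site.xml on the classpath this prints hdfs://...;
        // seeing file:/// here matches the "Wrong FS ... expected: file:///" error above.
        System.out.println(conf.get("fs.default.name"));
        System.out.println(FileSystem.get(conf).getUri());
      }
    }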
java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/
Hi all, This issue could very well be related to the Cloudera distribution (CDH3b4) I use, but maybe someone knows the solution. I configured a Job, something like this:

    Configuration conf = getConf();
    // ... set configuration: tracker, zookeeper, hbase etc.
    conf.set("mapred.jar", localJarFile.toString());
    Job job = new Job(conf);
    // map:
    job.setMapperClass(DataImportMap.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Put.class);
    // reduce:
    TableMapReduceUtil.initTableReducerJob("MyTable", DataImportReduce.class, job);
    FileInputFormat.addInputPath(job, new Path(inputData));
    // execute:
    job.waitForCompletion(true);

Now the server throws the strange exception below, see the stack trace. When I take a look at the HDFS file system (through hdfs fuse), the file is there; it really is the jar that contains my mapred classes. Any clue what goes wrong here? Thanks, Job

    java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/.staging/job_201102241026_0002/job.jar does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
        at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1303)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
        at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:198)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1154)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1129)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1055)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2212)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2176)

-- Drs. Job Tiel Groenestege GridLine - Intranet en Zoeken GridLine Keizersgracht 520 1017 EK Amsterdam www: http://www.gridline.nl mail: j...@gridline.nl tel: +31 20 616 2050 fax: +31 20 616 2051
Re: Benchmarking pipelined MapReduce jobs
Thanks for your help! I had a look at the gridmix_config.xml file in the gridmix2 directory. However, I'm having difficulty mapping the descriptions of the simulated jobs from the README file:

1) Three stage map/reduce job
2) Large sort of variable key/value size
3) Reference select
4) API text sort (java, streaming)
5) Jobs with combiner (word count jobs)

to the job names in gridmix_config.xml:

- streamSort
- javaSort
- combiner
- monsterQuery
- webdataScan
- webdataSort

I would really appreciate any help getting the right configuration! Which job do I have to enable to simulate a pipelined execution as described in 1) Three stage map/reduce job? Thanks, David

On 23.02.2011 at 04:01, Shrinivas Joshi wrote: I am not sure about this, but you might want to take a look at the GridMix config file. FWIU, it lets you define the number of jobs for the different workloads and categories. HTH, -Shrinivas On Tue, Feb 22, 2011 at 10:46 AM, David Saile da...@uni-koblenz.de wrote: Hello everybody, I am trying to benchmark a Hadoop cluster with regard to the throughput of pipelined MapReduce jobs. Looking for benchmarks, I found the Gridmix benchmark that is supplied with Hadoop. Its README file says that part of this benchmark is a "Three stage map/reduce job". As this seems to match my needs, I was wondering if it is possible to configure Gridmix to run only this job (without the rest of the Gridmix benchmark)? Or do I have to build my own benchmark? If that is the case, which classes are used by this "Three stage map/reduce job"? Thanks for any help! David
About MapTask.java
Hi, I want to know how MapTask.java is implemented, especially the MapOutputBuffer class defined in MapTask.java. I've been trying to read MapTask.java after reading some references such as the Hadoop definitive guide and http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html, but it's quite tough to read the code directly without detailed comments. As I understand it, when each intermediate (key, value) pair is generated by the user-defined map function, the pair is written by the MapOutputBuffer class defined in MapTask.java when MapOutputBuffer.collect() is invoked. However, I can't understand what each variable defined in MapOutputBuffer means. What I've understood is as follows (please correct any misunderstanding):

- The byte buffer kvbuffer is where each actual (partition, key, value) triple is written.
- The integer array kvindices is called the accounting buffer; every three elements of it save the indices of the corresponding triple in kvbuffer.
- Another integer array, kvoffsets, contains the indices of triples in kvindices.
- kvstart, kvend, kvindex are used as pointers into the accounting buffers.
- bufstart, bufend, bufvoid, bufindex, bufmark are used as pointers into kvbuffer.

What I can't understand is the comments beside the variable definitions:

    private volatile int kvstart = 0;  // marks beginning of spill
    private volatile int kvend = 0;    // marks beginning of collectable
    private int kvindex = 0;           // marks end of collected
    private final int[] kvoffsets;     // indices into kvindices
    private final int[] kvindices;     // partition, k/v offsets into kvbuffer
    private volatile int bufstart = 0; // marks beginning of spill
    private volatile int bufend = 0;   // marks beginning of collectable
    private volatile int bufvoid = 0;  // marks the point where we should stop
                                       // reading at the end of the buffer
    private int bufindex = 0;          // marks end of collected
    private int bufmark = 0;           // marks end of record
    private byte[] kvbuffer;           // main output buffer

Q1) What do the terms spill, collectable, and collected mean? I guess that because map outputs continue to be written to the buffer while a spill takes place, there must be at least two pointers: one for where to write map outputs and one for where to spill data from; but I don't know exactly what spill, collectable, and collected mean.
Q2) Is it efficient to partition the data first and then sort the records inside each partition? Does this happen to avoid expensive pair-wise key comparisons?
Q3) Are there any documents explaining how such internal classes are implemented?
Thanks, eastcirclek
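[Editor's note: a toy illustration of the layout the poster describes, not Hadoop's actual MapOutputBuffer code; names and sizes are made up.]

    // Toy illustration: a byte buffer of serialized records, an "accounting" array with
    // three ints per record (partition, key start, value start), and an offsets array.
    byte[] kvbuffer  = new byte[64 * 1024]; // raw serialized key/value bytes
    int[]  kvindices = new int[3 * 1024];   // 3 ints per record: partition, keystart, valstart
    int[]  kvoffsets = new int[1024];       // one int per record, pointing at its slot in kvindices

    // Recording record number r with partition p, key starting at keyOff, value at valOff:
    int r = 0, p = 2, keyOff = 100, valOff = 120;
    kvindices[3 * r]     = p;
    kvindices[3 * r + 1] = keyOff;
    kvindices[3 * r + 2] = valOff;
    kvoffsets[r] = 3 * r;
    // Sorting kvoffsets (by partition, then key) reorders small ints instead of moving
    // the serialized bytes in kvbuffer; that indirection is the point of the design.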
Re: About MapTask.java
Hey, On Thu, Feb 24, 2011 at 6:26 PM, Dongwon Kim eastcirc...@postech.ac.kr wrote: I've been trying to read MapTask.java after reading some references such as Hadoop definitive guide and http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html;, but it's quite tough to directly read the code without detailed comments. Perhaps you can add some after getting things cleared ;-) Q2) Is it efficient to partition data first and then sort records inside each partition? Does it happen to avoid comparing expensive pair-wise key comparisons? Typically you would only want sorting done inside a partitioned set, since all of the different partitions are sent off to different reducers. Total-order partitioning may be an exception here, perhaps. Q3) Are there any documents containing explanations about how such internal classes are implemented? There's a very good presentation you may want to see, on the spill/shuffle/sort framework portions your doubts are about: http://www.slideshare.net/hadoopusergroup/ordered-record-collection HTH :) -- Harsh J www.harshj.com
Trouble in installing Hbase
Hi All, I was trying to install CDH3 HBase on Fedora 14. It gives the following error. Any solution to resolve this?

    Transaction Test Succeeded
    Running Transaction
    Error in PREIN scriptlet in rpm package hadoop-hbase-0.90.1+8-1.noarch
    /usr/bin/install: invalid user `hbase'
    /usr/bin/install: invalid user `hbase'
    error: %pre(hadoop-hbase-0.90.1+8-1.noarch) scriptlet failed, exit status 1
    error: install: %pre scriptlet failed (2), skipping hadoop-hbase-0.90.1+8-1
    Failed: hadoop-hbase.noarch 0:0.90.1+8-1
    Complete!
    [root@linguist hexp]#

-- ** JAGANADH G http://jaganadhg.freeflux.net/blog *ILUGCBE* http://ilugcbe.techstud.org
Re: Trouble in installing Hbase
You probably should ask on the Cloudera support forums, as Cloudera has for some reason changed the users that things run under. James Sent from my mobile. Please excuse the typos. On 2011-02-24, at 8:00 AM, JAGANADH G jagana...@gmail.com wrote: Hi All, I was trying to install CDH3 HBase on Fedora 14. It gives the following error. Any solution to resolve this?

    Transaction Test Succeeded
    Running Transaction
    Error in PREIN scriptlet in rpm package hadoop-hbase-0.90.1+8-1.noarch
    /usr/bin/install: invalid user `hbase'
    /usr/bin/install: invalid user `hbase'
    error: %pre(hadoop-hbase-0.90.1+8-1.noarch) scriptlet failed, exit status 1
    error: install: %pre scriptlet failed (2), skipping hadoop-hbase-0.90.1+8-1
    Failed: hadoop-hbase.noarch 0:0.90.1+8-1
    Complete!
    [root@linguist hexp]#

-- ** JAGANADH G http://jaganadhg.freeflux.net/blog *ILUGCBE* http://ilugcbe.techstud.org
Check lzo is working on intermediate data
Hey there, I am using hadoop 0.20.2. I've successfully installed LZO compression following these steps: https://github.com/kevinweil/hadoop-lzo I have some MR jobs written with the new API and I want to compress intermediate data. I am not sure whether my mapred-site.xml should have the properties:

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

or:

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

How can I check that the compression is being applied? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Check-lzo-is-working-on-intermediate-data-tp2567704p2567704.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
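[Editor's note: a hedged sketch of setting the same thing programmatically from a driver, assuming the hadoop-lzo jar and native libraries are installed on all nodes; it uses the old "mapred.*" property names, which the 0.20.2 runtime reads (the reply further down confirms the first set of names works).]

    // Sketch: enable LZO compression of intermediate (map output) data from the driver.
    Configuration conf = new Configuration();
    conf.setBoolean("mapred.compress.map.output", true);
    conf.set("mapred.map.output.compression.codec",
             "com.hadoop.compression.lzo.LzoCodec");
    Job job = new Job(conf, "lzo-intermediate-test"); // then set mapper/reducer/paths as usual
    // ... job.waitForCompletion(true);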
Re: Check lzo is working on intermediate data
Run a standard job before. Look at the summary data. Run the job again after the changes and look at the summary. You should see fewer file system bytes written from the map stage. Sorry, it might be most obvious in the shuffle bytes; I don't have a terminal in front of me right now. James Sent from my mobile. Please excuse the typos. On 2011-02-24, at 8:22 AM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I am using hadoop 0.20.2. I've successfully installed LZO compression following these steps: https://github.com/kevinweil/hadoop-lzo I have some MR jobs written with the new API and I want to compress intermediate data. I am not sure whether my mapred-site.xml should have the properties:

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

or:

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

How can I check that the compression is being applied? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/Check-lzo-is-working-on-intermediate-data-tp2567704p2567704.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
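[Editor's note: one way to make the before/after comparison James suggests without hunting through the web UI is to dump all counters from the driver after the job completes. A sketch against the new API; exact counter names vary between versions, so compare whatever your job actually reports.]

    // Print every counter so two runs (with and without map output compression) can be
    // diffed; watch the file bytes written by maps and the reduce shuffle bytes.
    org.apache.hadoop.mapreduce.Counters counters = job.getCounters(); // job has completed
    for (org.apache.hadoop.mapreduce.CounterGroup group : counters) {
      for (org.apache.hadoop.mapreduce.Counter counter : group) {
        System.out.println(group.getDisplayName() + " \t "
            + counter.getDisplayName() + " = " + counter.getValue());
      }
    }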
Re: Check lzo is working on intermediate data
I use the first one, and it seems to work because I see the size of the data output from the mappers is much smaller. Da On 2/24/11 10:12 AM, Marc Sturlese wrote: Hey there, I am using hadoop 0.20.2. I've successfully installed LZO compression following these steps: https://github.com/kevinweil/hadoop-lzo I have some MR jobs written with the new API and I want to compress intermediate data. I am not sure whether my mapred-site.xml should have the properties:

    <property>
      <name>mapred.compress.map.output</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.map.output.compression.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

or:

    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>

How can I check that the compression is being applied? Thanks in advance
Re: java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/
Hi Job, This seems CDH-specific, so I've moved the thread over to the cdh-users mailing list (BCC common-user). Thanks -Todd On Thu, Feb 24, 2011 at 2:52 AM, Job j...@gridline.nl wrote: Hi all, This issue could very well be related to the Cloudera distribution (CDH3b4) I use, but maybe someone knows the solution. I configured a Job, something like this:

    Configuration conf = getConf();
    // ... set configuration: tracker, zookeeper, hbase etc.
    conf.set("mapred.jar", localJarFile.toString());
    Job job = new Job(conf);
    // map:
    job.setMapperClass(DataImportMap.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Put.class);
    // reduce:
    TableMapReduceUtil.initTableReducerJob("MyTable", DataImportReduce.class, job);
    FileInputFormat.addInputPath(job, new Path(inputData));
    // execute:
    job.waitForCompletion(true);

Now the server throws the strange exception below, see the stack trace. When I take a look at the HDFS file system (through hdfs fuse), the file is there; it really is the jar that contains my mapred classes. Any clue what goes wrong here? Thanks, Job

    java.io.FileNotFoundException: File /var/lib/hadoop-0.20/cache/mapred/mapred/staging/job/.staging/job_201102241026_0002/job.jar does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:207)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
        at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:61)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1303)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobJarFile(JobLocalizer.java:273)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:381)
        at org.apache.hadoop.mapred.JobLocalizer.localizeJobFiles(JobLocalizer.java:371)
        at org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:198)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1154)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1129)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1055)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2212)
        at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2176)

-- Drs. Job Tiel Groenestege GridLine - Intranet en Zoeken GridLine Keizersgracht 520 1017 EK Amsterdam www: http://www.gridline.nl mail: j...@gridline.nl tel: +31 20 616 2050 fax: +31 20 616 2051 -- Todd Lipcon Software Engineer, Cloudera
Re: Current available Memory
Hi Yang, The problem can be solved using the approach at the following link: http://www.roseindia.net/java/java-get-example/get-memory-usage.shtml You need to use other memory-management facilities, like the garbage collector and its finalize method, to measure memory accurately. Good luck, Maha On Feb 23, 2011, at 10:11 PM, Yang Xiaoliang wrote: I had also encountered the same problem a few days ago. Does anyone have another method? 2011/2/24 maha m...@umail.ucsb.edu Based on the Java function documentation, it gives approximately the available memory, so I need to tweak it with other functions. So it's a Java issue, not Hadoop. Thanks anyway, Maha On Feb 23, 2011, at 6:31 PM, maha wrote: Hello Everyone, I'm using Runtime.getRuntime().freeMemory() to see the current memory available before and after creation of an object, but this doesn't seem to work well with Hadoop. Why? And is there another alternative? Thank you, Maha
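[Editor's note: a common workaround, added here as an assumption rather than something from the linked page, is to request a garbage collection and sample totalMemory() minus freeMemory() around the allocation; the result is still only an estimate.]

    // Rough estimate of the heap consumed by an object; freeMemory() alone is unreliable
    // because the heap can grow and other allocations/collections happen concurrently.
    Runtime rt = Runtime.getRuntime();
    System.gc();                                    // a hint, not a guarantee
    long before = rt.totalMemory() - rt.freeMemory();

    byte[] payload = new byte[1 << 20];             // placeholder for the object being measured

    System.gc();
    long after = rt.totalMemory() - rt.freeMemory();
    System.out.println("Approximate bytes used: " + (after - before));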
File size shown in HDFS using -lsr
Silly question..

    bin/hadoop dfs -lsr /
    -rw-r--r--   1 Hadoop supergroup   83 2011-02-24 10:52 /tmp/File-Size-4k

Why do I see my 4KB file has a size of 83 bytes?? Thanks, Maha
Slides and videos from Feb 2011 Bay Area HUG posted
The February 2011 Bay Area HUG had a record turnout, with 336 people signed up. We had two great talks:

* The next generation of Hadoop MapReduce, by Arun Murthy
* The next generation of Hadoop Operations at Facebook, by Andrew Ryan

The videos and slides are posted on Yahoo's blog: http://developer.yahoo.com/blogs/hadoop/posts/2011/02/hug-feb-2011-recap/ -- Owen
Re: File size shown in HDFS using -lsr
It's because of HDFS_BYTES_READ. So my question now is: what, other than compression, can make HDFS_BYTES_READ differ from Map input bytes? In my case, the input file is 67K but is stored in HDFS as 83K, and this doesn't happen all the time; sometimes they're the same and other times they're different (nothing else was changed). Any explanation is appreciated! Thank you, Maha On Feb 24, 2011, at 11:00 AM, maha wrote: Silly question..

    bin/hadoop dfs -lsr /
    -rw-r--r--   1 Hadoop supergroup   83 2011-02-24 10:52 /tmp/File-Size-4k

Why do I see my 4KB file has a size of 83 bytes?? Thanks, Maha
hadoop file format query
hi, I have a use case to upload gzipped text files, ranging from 10-30 GB in size, to HDFS. We have decided on the sequence file format as the format on HDFS. I have some doubts/questions regarding it:

i) What is the optimal size for a sequence file, considering the input text files range from 10-30 GB in size? Can a sequence file be the same size as the text file?

ii) Is there a tool that can be used to convert a gzipped text file to a sequence file?

iii) What would be good metadata management for the files? Currently we have about 30-40 different types of schema for these text files. We thought of 2 options:
- uploading the metadata as a text file on HDFS alongside the data, so users can view it using hadoop fs -cat;
- adding the metadata to the sequence file header. In this case, we could not find out how to fetch the metadata from the sequence file, and we need to give our downstream users a way to see the metadata of the data they are reading.

thanks a lot! -JJ
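[Editor's note: on point iii), the SequenceFile API does support key/value metadata in the file header, written at creation time and readable without scanning the records. A hedged sketch against the 0.20-era API; the path, the "schema" key, and the values are assumptions for illustration, and compression is left off to keep it short.]

    // Sketch: attach metadata to a SequenceFile header and read it back later.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/jj/example.seq");           // hypothetical path

    SequenceFile.Metadata meta = new SequenceFile.Metadata();
    meta.set(new Text("schema"), new Text("customer_v1"));  // hypothetical schema tag
    meta.set(new Text("source"), new Text("gzipped text upload"));

    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path,
        LongWritable.class, Text.class,
        SequenceFile.CompressionType.NONE, null, null, meta); // codec and progress omitted
    writer.append(new LongWritable(1L), new Text("first record"));
    writer.close();

    // Downstream readers can inspect the header without scanning the data:
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    System.out.println("schema = " + reader.getMetadata().get(new Text("schema")));
    reader.close();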
setJarByClass question
Hi, this call, job.setJarByClass tells Hadoop which jar to use. But we also tell Hadoop which jar to use on the command line, hadoop jar your-jar parameters Why do we need this in both places? Thank you, Mark
Re: setJarByClass question
The jar on the command line might only be the jar used to submit the map-reduce job, rather than the jar that contains the Mapper and Reducer, which is what gets shipped to the different nodes. What hadoop jar your-jar really does is set up the classpath and related environment, then run the main method in your-jar. You might have a different map-reduce jar on the classpath which contains the real mapper and reducer used to do the job. Best wishes, Stanley Xu On Fri, Feb 25, 2011 at 7:23 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, this call, job.setJarByClass, tells Hadoop which jar to use. But we also tell Hadoop which jar to use on the command line: hadoop jar your-jar parameters. Why do we need this in both places? Thank you, Mark
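[Editor's note: a minimal fragment showing the call in question; the class names are placeholders. setJarByClass asks Hadoop to locate whichever jar on the classpath contains the given class and ship that jar to the task nodes, which is why it can differ from the jar named on the command line.]

    // Hadoop ships the jar containing MyDriver (and usually the Mapper/Reducer) to the nodes.
    Job job = new Job(conf, "example");
    job.setJarByClass(MyDriver.class);
    job.setMapperClass(MyMapper.class);   // placeholder classes living in that same jar
    job.setReducerClass(MyReducer.class);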
Re: Current available Memory
Thanks a lot! Yang Xiaoliang 2011/2/25 maha m...@umail.ucsb.edu Hi Yang, The problem can be solved using the approach at the following link: http://www.roseindia.net/java/java-get-example/get-memory-usage.shtml You need to use other memory-management facilities, like the garbage collector and its finalize method, to measure memory accurately. Good luck, Maha On Feb 23, 2011, at 10:11 PM, Yang Xiaoliang wrote: I had also encountered the same problem a few days ago. Does anyone have another method? 2011/2/24 maha m...@umail.ucsb.edu Based on the Java function documentation, it gives approximately the available memory, so I need to tweak it with other functions. So it's a Java issue, not Hadoop. Thanks anyway, Maha On Feb 23, 2011, at 6:31 PM, maha wrote: Hello Everyone, I'm using Runtime.getRuntime().freeMemory() to see the current memory available before and after creation of an object, but this doesn't seem to work well with Hadoop. Why? And is there another alternative? Thank you, Maha
Re: is there more smarter way to execute a hadoop cluster?
hi, I found the reason for my problem. When submitting a job from the shell, conf.get("fs.default.name") is hdfs://localhost. When submitting a job from a Java application directly, conf.get("fs.default.name") is file://localhost, so I couldn't read any files from HDFS. I think my Java application couldn't read the *-site.xml configurations properly. Junyoung Kim (juneng...@gmail.com) On 02/24/2011 06:41 PM, Harsh J wrote: Hey, On Thu, Feb 24, 2011 at 2:36 PM, Jun Young Kim juneng...@gmail.com wrote: How am I supposed to submit my job to the cluster? In the new API, the 'Job' class also has Job.submit() and Job.waitForCompletion(boolean) methods. Please see the API here: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html
Re: is there more smarter way to execute a hadoop cluster?
Hi, On Fri, Feb 25, 2011 at 10:17 AM, Jun Young Kim juneng...@gmail.com wrote: hi, I found the reason for my problem. When submitting a job from the shell, conf.get("fs.default.name") is hdfs://localhost. When submitting a job from a Java application directly, conf.get("fs.default.name") is file://localhost, so I couldn't read any files from HDFS. I think my Java application couldn't read the *-site.xml configurations properly. Have a look at this Q: http://wiki.apache.org/hadoop/FAQ#How_do_I_get_my_MapReduce_Java_Program_to_read_the_Cluster.27s_set_configuration_and_not_just_defaults.3F -- Harsh J www.harshj.com
Re: is there more smarter way to execute a hadoop cluster?
Hi, Harsh. I've already tried using the final tag to make it unmodifiable, but the result is no different.

core-site.xml:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost</value>
        <final>true</final>
      </property>
    </configuration>

The other *-site.xml files are also modified following this rule. Thanks. Junyoung Kim (juneng...@gmail.com) On 02/25/2011 02:50 PM, Harsh J wrote: Hi, On Fri, Feb 25, 2011 at 10:17 AM, Jun Young Kim juneng...@gmail.com wrote: hi, I found the reason for my problem. When submitting a job from the shell, conf.get("fs.default.name") is hdfs://localhost. When submitting a job from a Java application directly, conf.get("fs.default.name") is file://localhost, so I couldn't read any files from HDFS. I think my Java application couldn't read the *-site.xml configurations properly. Have a look at this Q: http://wiki.apache.org/hadoop/FAQ#How_do_I_get_my_MapReduce_Java_Program_to_read_the_Cluster.27s_set_configuration_and_not_just_defaults.3F
Re: is there more smarter way to execute a hadoop cluster?
Hello again, Finals won't help here; all the logic you require is performed in the front-end/Driver code. If you're using fs.default.name inside a Task somehow, final will help there. It is best if your application gets the right configuration files on its classpath itself, so that the right values are read (how else would it know your values!). Alternatively, you can use GenericOptionsParser to parse -fs and -jt arguments when the Driver is launched from the command line. On Fri, Feb 25, 2011 at 11:46 AM, Jun Young Kim juneng...@gmail.com wrote: Hi, Harsh. I've already tried using the final tag to make it unmodifiable, but the result is no different.

core-site.xml:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost</value>
        <final>true</final>
      </property>
    </configuration>

The other *-site.xml files are also modified following this rule. Thanks. Junyoung Kim (juneng...@gmail.com) On 02/25/2011 02:50 PM, Harsh J wrote: Hi, On Fri, Feb 25, 2011 at 10:17 AM, Jun Young Kim juneng...@gmail.com wrote: hi, I found the reason for my problem. When submitting a job from the shell, conf.get("fs.default.name") is hdfs://localhost. When submitting a job from a Java application directly, conf.get("fs.default.name") is file://localhost, so I couldn't read any files from HDFS. I think my Java application couldn't read the *-site.xml configurations properly. Have a look at this Q: http://wiki.apache.org/hadoop/FAQ#How_do_I_get_my_MapReduce_Java_Program_to_read_the_Cluster.27s_set_configuration_and_not_just_defaults.3F -- Harsh J www.harshj.com
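[Editor's note: a sketch of the kind of driver Harsh describes. ToolRunner wires in GenericOptionsParser, so -fs, -jt, and -conf arguments end up in getConf(); the class name, job name, and paths are placeholders, and no Mapper/Reducer is set so it runs as an identity job.]

    // Sketch: a driver that picks up -fs/-jt/-conf via GenericOptionsParser (ToolRunner).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "my-job");  // getConf() already reflects -fs/-jt/-conf
        job.setJarByClass(MyDriver.class);
        // set your Mapper/Reducer and key/value classes here; defaults give an identity job
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // Example: hadoop jar myjob.jar MyDriver -fs hdfs://localhost:9000 -jt localhost:9001 in out
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
      }
    }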
Re: is there more smarter way to execute a hadoop cluster?
Hello, Harsh. Do you mean I need to read the XML files and then parse them to set the values in my app? Junyoung Kim (juneng...@gmail.com) On 02/25/2011 03:32 PM, Harsh J wrote: It is best if your application gets the right configuration files on its classpath itself, so that the right values are read (how else would it know your values!).