input files
Hadoop usually takes either a single file or a folder as an input parameter. Is it possible to make it take a list of files (not a folder) as the input parameter? -- Deepak Diwakar
Re: input files
You can add more paths to the input using FileInputFormat.addInputPath(JobConf, Path). You can also specify comma-separated filenames as the input path using FileInputFormat.setInputPaths(JobConf, String commaSeparatedPaths). More details at http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html. You can also use a glob path to specify multiple paths in a single path. Thanks, Amareshwari
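A minimal driver-side sketch of these calls against the 0.17-era mapred API; the MyJob class and the paths below are illustrative placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);  // MyJob is a placeholder driver class

// Add individual files one at a time...
FileInputFormat.addInputPath(conf, new Path("/data/part-00000"));
FileInputFormat.addInputPath(conf, new Path("/data/part-00001"));

// ...or pass several files as one comma-separated string...
FileInputFormat.setInputPaths(conf, "/data/a.txt,/data/b.txt,/data/c.txt");

// ...or let a glob expand to multiple files.
FileInputFormat.setInputPaths(conf, new Path("/data/2008-08-*/part-*"));

Note that setInputPaths replaces any previously set input paths, while addInputPath appends to them.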
Re: Missing lib/native/Linux-amd64-64 on hadoop-0.17.2.tar.gz
Hi, could anyone help repack 0.17.2 with the missing lib/native/Linux-amd64-64? Thanks. On Wed, Aug 20, 2008 at 9:31 AM, Yi-Kai Tsai [EMAIL PROTECTED] wrote: But we do have lib/native/Linux-amd64-64 in hadoop-0.17.1.tar.gz and hadoop-0.18.0.tar.gz? At least for 0.17.1, yes there is. Regards, Leon Mergen -- Yi-Kai Tsai (cuma) [EMAIL PROTECTED], Asia Regional Search Engineering.
Hadoop 0.17.2 released
Hadoop Core 0.17.2 has been released and the website updated. It fixes a couple of critical bugs in the 0.17 branch. It can be downloaded from: http://www.apache.org/dyn/closer.cgi/hadoop/core/ -- Owen
Re: Cannot read reducer values into a list
On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote: Thanks for the clarification on this. So, it seems like cloning the object before adding to the list is the only solution for this problem. Is that right? Yes. You can use WritableUtils.clone to do the job. -- Owen
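A minimal sketch of that cloning pattern in an old-API reducer, assuming the values are instances of some custom Writable (MyWritable below is a placeholder); the JobConf is captured in configure() so it can be passed to WritableUtils.clone:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public static class Reduce extends MapReduceBase
    implements Reducer<Text, MyWritable, Text, MyWritable> {

  private JobConf conf;

  public void configure(JobConf job) {
    conf = job;
  }

  public void reduce(Text key, Iterator<MyWritable> values,
                     OutputCollector<Text, MyWritable> output, Reporter reporter)
      throws IOException {
    List<MyWritable> buffered = new ArrayList<MyWritable>();
    while (values.hasNext()) {
      // The framework reuses one value object across the iterator,
      // so clone each value before holding on to it.
      buffered.add(WritableUtils.clone(values.next(), conf));
    }
    // The buffered copies are now safe to use, e.g. re-emit them all.
    for (MyWritable v : buffered) {
      output.collect(key, v);
    }
  }
}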
Re: pseudo-global variable construction
Thank you very much, Paco and Jason. It works! For any users who may be curious what this looks like in code, here is a small snippet of mine:

file: myLittleMRProgram.java

package org.apache.hadoop.examples;

public static class Reduce extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {
  private int nTax = 0;

  public void configure(JobConf job) {
    super.configure(job);
    String Tax = job.get("nTax");
    nTax = Integer.parseInt(Tax);
  }

  public void reduce(...) throws IOException {
    System.out.println("nTax is: " + nTax);
  }
}

main() {
  conf.set("nTax", other_args.get(2));
  JobClient.runJob(conf);
  return 0;
}

-SM

On Tue, Aug 19, 2008 at 5:02 PM, Jason Venner [EMAIL PROTECTED] wrote: Since the map and reduce tasks generally run in separate Java virtual machines, and on machines distinct from the one running your main task, there is no sharing of variables between the main task and the map or reduce tasks. The standard way is to store the variable in the Configuration (or JobConf) object in your main task, then extract the value from the JobConf object in the configure method of your map and reduce task classes. You will need to override the configure method in your map and reduce classes. This also requires that the variable value be serializable; for lots of large variables this can be expensive.

Sandy wrote: Hello, My M/R program is going smoothly, except for one small problem. I have a global variable that is set by the user (and thus in the main function) that I want one of my reduce functions to access. This is a read-only variable. After some reading in the forums, I tried something like this:

file: MyGlobalVars.java

package org.apache.hadoop.examples;

public class MyGlobalVars {
  static public int nTax;
}

file: myLittleMRProgram.java

package org.apache.hadoop.examples;

map function() {
  System.out.println("in map function, nTax is: " + MyGlobalVars.nTax);
}

main() {
  MyGlobalVars.nTax = other_args.get(2);
  System.out.println("in main function, nTax is: " + MyGlobalVars.nTax);
  JobClient.runJob(conf);
  return 0;
}

When I run it, I get: "in main function, nTax is 20" (which is what I want) and "in map function, nTax is 0" (which is not right). I am a little confused about how to resolve this. I apologize in advance if this is a blatant Java error; I only began programming in the language a few weeks ago.

Since MapReduce tries to avoid the whole shared-memory scene, I am more than willing to have each reduce function receive a local copy of this user-defined value. However, I am a little confused about the best way to do this. As I see it, my options are:

1) Write the user-defined value to HDFS in the main function and read it from HDFS in the reduce function. I can't quite figure out the code for this, though. I know how to specify an input file for the map reduce task, but if I did it this way, wouldn't I need to specify two separate input files?

2) Put it in the construction of the reduce object (I saw this mentioned in the archives). How would I accomplish this exactly when the value is user defined? Parameter passing? If so, won't this require me to change the underlying MapReduceBase (which makes me a touch nervous, since I'm still very new to Hadoop)?

What would be the easiest way to do this? Thanks in advance for the help. I appreciate your time. -SM

-- Jason Venner, Attributor - Program the Web, http://www.attributor.com/ Attributor is hiring Hadoop Wranglers and coding wizards, contact if interested
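For readers who want a complete, compilable version of the pattern Jason describes, here is a minimal sketch against the 0.17-era mapred API; the NTaxExample class name, the "nTax" key, and the word-count-style map and reduce bodies are illustrative placeholders, not the actual program from this thread:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class NTaxExample {

  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      // Emit each input line with a count of 1.
      output.collect(line, new LongWritable(1));
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    private int nTax = 0;

    public void configure(JobConf job) {
      // Pull the value the driver stored in the job configuration.
      nTax = job.getInt("nTax", 0);
    }

    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      // nTax is available in every reduce call, on every node.
      output.collect(key, new LongWritable(sum * nTax));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(NTaxExample.class);
    conf.setJobName("ntax-example");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setInt("nTax", Integer.parseInt(args[2]));  // the "pseudo-global" value
    JobClient.runJob(conf);
  }
}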
Reminder: Monthly Hadoop User Group Meeting (Bay Area) today
Reminder: The next Hadoop User Group (Bay Area) meeting is scheduled for today, Wednesday, Aug 20th, from 6 to 7:30 pm at Yahoo! Mission College, Santa Clara, CA, Building 1, Training Rooms 34. Agenda: Pig Update - Olga Natkovich; Hadoop 0.18 and post 0.18 - Sameer Paranjpye. Registration and directions: http://upcoming.yahoo.com/event/1011188 Look forward to seeing you there! Ajay
Know how many records remain?
Hi all, Is there any way to know whether the mapper is processing the last record assigned to this node, or how many records remain to be processed on this node? Qin
Re: Why is scaling HBase much simpler than scaling a relational db?
On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 [EMAIL PROTECTED] wrote: Can you please explain why someone should use HBase for horizontal scaling instead of a relational database? One reason for me would be that I don't have to implement the sharding logic myself. Are there others? A slight tangent -- there are various tools that implement sharding over relational databases like MySQL. Two that I know of are DBSlayer, http://code.nytimes.com/projects/dbslayer, and MySQL Proxy, http://forge.mysql.com/wiki/MySQL_Proxy. I don't know of any formal comparisons between sharding traditional database servers and distributed databases like HBase. -Stuart
RE: Why is scaling HBase much simpler than scaling a relational db?
Stuart, In general you will get a quicker response to HBase questions by posting them to the HBase mailing list ([EMAIL PROTECTED]); see http://hadoop.apache.org/hbase/mailing_lists.html for how to subscribe. Perhaps the best document on scaling HBase is actually the Bigtable paper: http://labs.google.com/papers/bigtable.html --- Jim Kellerman, Senior Engineer; Powerset (a Microsoft Company)
RE: Cannot read reducer values into a list
Thanks... this works beautifully :)! Deepika
hadoop 0.18.0 ec2 images?
Are there any publicly available EC2 images for Hadoop 0.18.0 yet? There don't seem to be any in the hadoop-ec2-images bucket.