Re: how to output to stdout
2009/11/8 Gang Luo <lgpub...@yahoo.com.cn>:
> Hi everyone. To check whether my Hadoop program runs as I expect, I added
> some println statements in my program, but their output doesn't seem to
> appear anywhere. Can somebody suggest how to get output to stdout? Thanks.
> --Gang

Look out for a folder called logs/userlogs//attempt_200911082348_0002_m_00_0/stdout

--
Regards,
~Sid~
I have never met a man so ignorant that I couldn't learn something from him.
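The layout Sid points at — one directory per task attempt under logs/userlogs, each with its own stdout file capturing that task's println output — can be illustrated with a small, Hadoop-free Java sketch that scans such a directory for the per-attempt stdout files. The class and method names here are illustrative, not part of Hadoop:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: walks a Hadoop-style logs/userlogs directory and
// collects each task attempt's stdout file, which is where println output
// from map and reduce tasks ends up. Directory layout assumed from the thread.
public class TaskStdoutFinder {
    public static List<File> findStdoutFiles(File userlogsDir) {
        List<File> result = new ArrayList<>();
        // One subdirectory per task attempt, e.g. attempt_..._m_..._0
        File[] attempts = userlogsDir.listFiles(File::isDirectory);
        if (attempts == null) {
            return result;
        }
        for (File attempt : attempts) {
            File stdout = new File(attempt, "stdout");
            if (stdout.isFile()) {
                result.add(stdout);
            }
        }
        return result;
    }
}
```

Pointing this at the tasktracker's log directory lists every stdout file to inspect, one per attempt.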
Re: how to output to stdout
Thanks, Sid. I got it in the jobtracker.

--Gang

----- Original Message -----
From: Siddu <siddu.s...@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/11/8 (Sunday) 2:42:18 PM
Subject: Re: how to output to stdout

> Look out for a folder called logs/userlogs//attempt_200911082348_0002_m_00_0/stdout
Re: Confused by new API MultipleOutputFormats using Hadoop 0.20.1
Multiple outputs has been ported to the new API in 0.21. See https://issues.apache.org/jira/browse/MAPREDUCE-370.

Cheers,
Tom

On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI (司宪策) <adam...@gmail.com> wrote:

I just fell back to the old mapred.* APIs; it seems MultipleOutputs only works with the old API.

wishes,
Xiance

On Mon, Nov 2, 2009 at 9:12 AM, Paul Smith <psm...@aconex.com> wrote:

Totally stuck here. I can't seem to find a way to resolve this: I can't use the new API _and_ use the MultipleOutputFormats class. I found this thread, which is related but doesn't seem to help me (or I missed something completely, certainly possible):
http://markmail.org/message/u4wz5nbcn5rawydq#query:hadoop%20MultipleTextOutputFormat%20OutputFormat%20Job%20JobConf+page:1+mid:5wy63oqa2vs6bj7b+state:results

My controller Job class is simple, but I get a compile error trying to add the new MultipleOutputs:

  public class ControllerMetricGrinder {

    public static class MetricNameMultipleTextOutputFormat
        extends MultipleTextOutputFormat<String, ControllerMetric> {
      @Override
      protected String generateFileNameForKeyValue(String key,
          ControllerMetric value, String name) {
        return key;
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = new Job();
      job.setJarByClass(ControllerMetricGrinder.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(ControllerMetric.class);
      job.setMapperClass(ControllerMetricMapper.class);
      job.setCombinerClass(ControllerMetricReducer.class);
      job.setReducerClass(ControllerMetricReducer.class);

      // COMPILE ERROR HERE
      MultipleOutputs.addMultiNamedOutput(job, "metrics",
          MetricNameMultipleTextOutputFormat.class, Text.class,
          ControllerMetric.class);

      job.setNumReduceTasks(5);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

(The mappers and reducers use the new API and are in separate classes.)
MultipleOutputs doesn't take a Job; it only takes a JobConf. Any ideas? I'd prefer to use the new API (because I've already written everything that way), but I'm guessing I'll now have to rework everything against the old API to get this working. I'm trying to create one output file per metric name (there are only 5).

Thoughts?

Paul
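The file-per-key behaviour Paul is after — generateFileNameForKeyValue returning the key, so each metric name gets its own output file — can be sketched without any Hadoop dependency. A minimal plain-Java sketch of that routing, with in-memory writers standing in for HDFS files (all names here are illustrative, not Hadoop API):

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the idea behind
// MultipleTextOutputFormat#generateFileNameForKeyValue: each record is
// routed to an output named after its key, yielding one output per key.
public class PerKeyWriterSketch {
    private final Map<String, StringWriter> outputs = new HashMap<>();

    public void write(String key, String value) {
        // In the thread's code, generateFileNameForKeyValue(...) returns the
        // key, so the key doubles as the output file name.
        outputs.computeIfAbsent(key, k -> new StringWriter())
               .write(value + "\n");
    }

    public Map<String, StringWriter> outputs() {
        return outputs;
    }
}
```

With five distinct metric-name keys, this accumulates exactly five outputs, which is the effect Paul wants from the real OutputFormat.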
Re: How to build and deploy Hadoop 0.21 ?
On Thu, Nov 5, 2009 at 2:34 AM, Andrei Dragomir <adrag...@adobe.com> wrote:
> Hello everyone. We ran into a bunch of issues with building and deploying
> Hadoop 0.21. It would be great to get some answers about how things should
> work, so we can try to fix them.
>
> 1. When checking out the repositories, each of them can be built by itself
> perfectly. But if you look in hdfs it has mapreduce libraries, and in
> mapreduce it has hdfs libraries. That's a cross-reference between projects.
>
> Q: Is this dependence necessary? Can we get rid of it?

Those are build-time dependencies. Ideally you'll ignore them post-build.

> Q: If it's necessary, how does one build the jars with the latest version
> of the source code? How are the jars in the scm repository created
> (hadoop-hdfs/lib/hadoop-mapred-0.21-dev.jar) as long as there is a
> cross-reference?
>
> 2. There are issues with the jar files and the webapps (dfshealth.jsp,
> etc.). Right now, the only way to have a functioning Hadoop system is to
> build hdfs and mapreduce, then copy everything from hdfs/build and
> mapreduce/build to common/build.

Yup.

> Q: Is there a better way of doing this? What needs to be fixed to have the
> webapps in the jar files (like on 0.20)? Are there JIRA issues logged on
> this?

I have created a Makefile and some associated scripts that will build everything and squash it together for you; see https://issues.apache.org/jira/browse/HADOOP-6342

There is also a longer-term effort to use Maven to coordinate the three subprojects, and use a local repository for inter-project development on a single machine; see https://issues.apache.org/jira/browse/HADOOP-5107 for progress there.

> We would really appreciate some answers, at least about where Hadoop is
> going with this build step, so we can help with patches / fixes.
>
> Thank you, Andrei Dragomir
Re: Multiple Input Paths
MultipleInputs is available from Hadoop 0.19 onwards (in org.apache.hadoop.mapred.lib, or org.apache.hadoop.mapreduce.lib.input for the new API in later versions).

Tom

On Wed, Nov 4, 2009 at 8:07 AM, Mark Vigeant <mark.vige...@riskmetrics.com> wrote:

Amogh,

That sounds so awesome! Yeah, I wish I had that class now. Do you have any tips on how to create such a delegating class? The best I can come up with is to submit both files to the mapper using multiple input paths and then have an if statement at the beginning of the map that checks which file it's dealing with, but I'm skeptical that I can even make that work... Is there a way you know of to submit two mapper classes to the job?

-----Original Message-----
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Wednesday, November 04, 2009 1:50 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths

Hi Mark,

A future release of Hadoop will have a MultipleInputs class, akin to MultipleOutputs. This would allow you to have a different InputFormat and Mapper depending on the path you are getting the split from. It uses special Delegating[Mapper/InputFormat] classes to resolve this. I understand backporting this is more or less out of the question, but the ideas there might provide pointers to help you solve your current problem. Just a thought :)

Amogh

On 11/3/09 8:44 PM, Mark Vigeant <mark.vige...@riskmetrics.com> wrote:

Hey Vipul,

No, I haven't concatenated my files yet; I was just thinking over how to approach the issue of multiple input paths. I actually did what Amandeep hinted at: we wrote our own XMLInputFormat and XMLRecordReader. When configuring the job in my driver I set job.setInputFormatClass(XMLFileInputFormat.class), and what it does is send chunks of XML to the mapper as opposed to lines of text or whole files. So I specified the record delimiter in the XMLRecordReader (i.e. <startTag>), and everything between <startTag> and </startTag> is sent to the mapper.
Inside the map function is where I parse the data and write it to the table. What I have to do now is figure out how to set the record delimiter to something common to both XML files I'm reading. Currently I have two mapper classes, and thus two submitted jobs, which is really inefficient and time consuming.

Make sense at all? Sorry if it doesn't; feel free to ask more questions.

Mark

-----Original Message-----
From: Vipul Sharma [mailto:sharmavi...@gmail.com]
Sent: Monday, November 02, 2009 7:48 PM
To: common-user@hadoop.apache.org
Subject: RE: Multiple Input Paths

Mark, were you able to concatenate both the XML files together? What did you do to keep the resulting XML well formed?

Regards,
Vipul Sharma
Cell: 281-217-0761
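The core of the XMLRecordReader approach Mark describes — treating everything between <startTag> and </startTag> as one record — can be sketched in plain Java without Hadoop. This shows only the string-scanning idea; a real RecordReader would also have to handle records that span input-split boundaries. Class and tag names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the chunking done by a custom XMLRecordReader:
// split raw input into records delimited by a start/end tag pair, so each
// map() call sees one XML chunk rather than one line of text.
public class XmlChunker {
    public static List<String> chunks(String input, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(open, pos);
            if (start < 0) {
                break; // no more records
            }
            int end = input.indexOf(close, start);
            if (end < 0) {
                break; // unterminated record; a real reader would handle this
            }
            // Keep the delimiting tags so the mapper sees a complete chunk.
            records.add(input.substring(start, end + close.length()));
            pos = end + close.length();
        }
        return records;
    }
}
```

A delimiter tag common to both input files, as Mark is after, would let one reader (and thus one job) process both of them.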