Re: how to output to stdout

2009-11-08 Thread Siddu
2009/11/8 Gang Luo lgpub...@yahoo.com.cn:
 Hi everyone,
 To check whether my hadoop program goes as I expected, I add some println 
 in my program. But it seems they don't work. Somebody gives me some 
 suggestion how to output something to stdout? Thanks.


Look for the task attempt's stdout file under the logs directory:
logs/userlogs//attempt_200911082348_0002_m_00_0/stdout

  --Gang






-- 
Regards,
~Sid~
I have never met a man so ignorant that I couldn't learn something from him
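
For reference, here is a minimal sketch of what this looks like in practice, assuming the new (org.apache.hadoop.mapreduce) API; the class name and log message are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // println output from inside a task never reaches the client console;
    // it is captured into that task attempt's stdout file under
    // logs/userlogs/ on the node that ran the attempt, and is also
    // viewable per attempt through the JobTracker web UI.
    public class DebugMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Lands in the attempt's stdout log, not on the client console.
            System.out.println("map input: " + value);
            context.write(new Text("debug"), value);
        }
    }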


Re: how to output to stdout

2009-11-08 Thread Gang Luo
Thanks, Sid. I found it in the JobTracker.

--Gang





Re: Confused by new API MultipleOutputFormats using Hadoop 0.20.1

2009-11-08 Thread Tom White
MultipleOutputs has been ported to the new API in 0.21. See
https://issues.apache.org/jira/browse/MAPREDUCE-370.

Cheers,
Tom
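
For anyone trying this on 0.21, here is a minimal sketch of the ported new-API usage, assuming a reduce-side write; the named output "metrics" and the key/value types are illustrative, not taken from this thread:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class NewApiMultipleOutputsSketch {

        public static class SketchReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private MultipleOutputs<Text, IntWritable> mos;

            @Override
            protected void setup(Context context) {
                // The new-API MultipleOutputs is constructed from the
                // task context rather than configured on a JobConf.
                mos = new MultipleOutputs<Text, IntWritable>(context);
            }

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                for (IntWritable value : values) {
                    // Write to the named output registered below.
                    mos.write("metrics", key, value);
                }
            }

            @Override
            protected void cleanup(Context context)
                    throws IOException, InterruptedException {
                mos.close();  // flush and close all named outputs
            }
        }

        public static void configure(Job job) {
            // Registration now takes a Job, not a JobConf.
            MultipleOutputs.addNamedOutput(job, "metrics",
                    TextOutputFormat.class, Text.class, IntWritable.class);
        }
    }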

On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策) adam...@gmail.com wrote:
 I just fell back to the old mapred.* APIs; it seems MultipleOutputs only works
 with the old API.

 wishes,
 Xiance

 On Mon, Nov 2, 2009 at 9:12 AM, Paul Smith psm...@aconex.com wrote:

 Totally stuck here. I can't seem to find a way to resolve this: I can't
 use the new API _and_ the MultipleTextOutputFormat class.

 I found this thread which is related, but doesn't seem to help me (or I
 missed something completely, certainly possible):


 http://markmail.org/message/u4wz5nbcn5rawydq#query:hadoop%20MultipleTextOutputFormat%20OutputFormat%20Job%20JobConf+page:1+mid:5wy63oqa2vs6bj7b+state:results

 My controller Job class is simple, but I get a compile error trying to add
 the new MultipleOutputs:

 public class ControllerMetricGrinder {

     public static class MetricNameMultipleTextOutputFormat extends
             MultipleTextOutputFormat<String, ControllerMetric> {

         @Override
         protected String generateFileNameForKeyValue(String key,
                 ControllerMetric value, String name) {
             return key;
         }
     }

     public static void main(String[] args) throws Exception {

         Job job = new Job();
         job.setJarByClass(ControllerMetricGrinder.class);

         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(ControllerMetric.class);

         job.setMapperClass(ControllerMetricMapper.class);

         job.setCombinerClass(ControllerMetricReducer.class);
         job.setReducerClass(ControllerMetricReducer.class);

         // COMPILE ERROR HERE
         MultipleOutputs.addMultiNamedOutput(job, "metrics",
                 MetricNameMultipleTextOutputFormat.class,
                 Text.class, ControllerMetric.class);

         job.setNumReduceTasks(5);

         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
 }

 (mappers and reducers are using the new API, and are in separate classes).

 MultipleOutputs doesn't take a Job; it only takes a JobConf. Any ideas?
 I'd prefer to use the new API (because I've written everything that way), but
 I'm guessing I'll now have to rework everything to the old API to get this
 working.

 I'm trying to create a file per metric name (there are only 5).

 thoughts?

 Paul
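
For reference, here is a minimal sketch of the old-API (JobConf) fallback Xiance describes; ControllerMetric comes from Paul's snippet (assumed to be a Writable), while the job wiring and output format are illustrative:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class OldApiMetricGrinder {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(OldApiMetricGrinder.class);
            conf.setOutputKeyClass(Text.class);
            // ControllerMetric is Paul's class, assumed to implement Writable.
            conf.setOutputValueClass(ControllerMetric.class);

            // The old-API MultipleOutputs accepts a JobConf, so this
            // compiles; in the reducer, a collector per metric name is
            // then obtained via getCollector("metrics", metricName,
            // reporter), which yields one output file per metric name.
            MultipleOutputs.addMultiNamedOutput(conf, "metrics",
                    TextOutputFormat.class, Text.class, ControllerMetric.class);

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }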




Re: How to build and deploy Hadoop 0.21 ?

2009-11-08 Thread Aaron Kimball
On Thu, Nov 5, 2009 at 2:34 AM, Andrei Dragomir adrag...@adobe.com wrote:

 Hello everyone.
 We ran into a bunch of issues with building and deploying Hadoop 0.21.
 It would be great to get some answers about how things should work, so
 we can try to fix them.

 1. When checking out the repositories, each of them can be built by
 itself perfectly. But if you look in hdfs, it contains mapreduce
 libraries, and mapreduce contains hdfs libraries; that's effectively a
 circular reference between the projects.
Q: Is this dependency necessary? Can we get rid of it?


Those are build-time dependencies. Ideally you'll ignore them post-build.


Q: If it's necessary, how does one build the jars with the latest
 version of the source code? How are the jars in the SCM repository
 (hadoop-hdfs/lib/hadoop-mapred-0.21-dev.jar) created, given the
 circular reference?
 2. There are issues with the jar files and the webapps (dfshealth.jsp,
 etc.). Right now, the only way to get a functioning Hadoop system is
 to build hdfs and mapreduce, then copy everything from hdfs/build and
 mapreduce/build to common/build.


Yup.



Q: Is there a better way of doing this? What needs to be fixed to
 have the webapps in the jar files (like in 0.20)? Are there JIRA
 issues logged on this?


I have created a Makefile and some associated scripts that will build
everything and squash it together for you; see
https://issues.apache.org/jira/browse/HADOOP-6342

There is also a longer-term effort to use Maven to coordinate the three
subprojects, and use a local repository for inter-project development on a
single machine; see https://issues.apache.org/jira/browse/HADOOP-5107 for
progress there.



 We would really appreciate some answers, at least about where
 Hadoop is going with this build step, so we can help with patches /
 fixes.

 Thank you,
   Andrei Dragomir



Re: Multiple Input Paths

2009-11-08 Thread Tom White
MultipleInputs is available from Hadoop 0.19 onwards (in
org.apache.hadoop.mapred.lib, or org.apache.hadoop.mapreduce.lib.input
for the new API in later versions).

Tom
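
For reference, here is a minimal sketch of the old-API form available from 0.19, assuming two text inputs; FirstFileMapper and SecondFileMapper are hypothetical placeholders for per-path mapper classes:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class MultipleInputsSketch {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MultipleInputsSketch.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            // Each input path is bound to its own InputFormat and Mapper;
            // the framework routes each split to the mapper registered for
            // the path it came from. FirstFileMapper and SecondFileMapper
            // are hypothetical old-API Mapper implementations.
            MultipleInputs.addInputPath(conf, new Path(args[0]),
                    TextInputFormat.class, FirstFileMapper.class);
            MultipleInputs.addInputPath(conf, new Path(args[1]),
                    TextInputFormat.class, SecondFileMapper.class);

            FileOutputFormat.setOutputPath(conf, new Path(args[2]));
            JobClient.runJob(conf);
        }
    }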

On Wed, Nov 4, 2009 at 8:07 AM, Mark Vigeant
mark.vige...@riskmetrics.com wrote:
 Amogh,

 That sounds so awesome! Yeah, I wish I had that class now. Do you have any
 tips on how to create such a delegating class? The best I can come up with is
 to submit both files to the mapper using multiple input paths and then have
 an if statement at the beginning of the map that checks which file it's
 dealing with, but I'm skeptical that I can even make that work... Is there a
 way you know of that I could submit 2 mapper classes to the job?
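
A minimal sketch of the if-statement approach Mark describes, using the new API's FileSplit to identify the source file; the file-name test and the outputs written are illustrative assumptions:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class TwoFileMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // The input split records which file this chunk came from.
            String fileName = ((FileSplit) context.getInputSplit())
                    .getPath().getName();

            if (fileName.startsWith("first")) {
                // handle records from the first XML file (illustrative)
                context.write(new Text("first"), value);
            } else {
                // handle records from the second XML file (illustrative)
                context.write(new Text("second"), value);
            }
        }
    }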

 -Original Message-
 From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
 Sent: Wednesday, November 04, 2009 1:50 AM
 To: common-user@hadoop.apache.org
 Subject: Re: Multiple Input Paths

 Hi Mark,
 A future release of Hadoop will have a MultipleInputs class, akin to
 MultipleOutputs. This would allow you to have a different InputFormat and
 Mapper depending on the path the split comes from. It uses special
 delegating mapper/input classes to resolve this. I understand backporting
 this is more or less out of the question, but the ideas there might provide
 pointers to help you solve your current problem.
 Just a thought :)

 Amogh


 On 11/3/09 8:44 PM, Mark Vigeant mark.vige...@riskmetrics.com wrote:

 Hey Vipul

 No, I haven't concatenated my files yet; I was just thinking over how to
 approach the issue of multiple input paths.

 I actually did what Amandeep hinted at, which was to write our own
 XMLInputFormat and XMLRecordReader. When configuring the job in my driver I
 set job.setInputFormatClass(XMLFileInputFormat.class), and what it does is
 send chunks of XML to the mapper, as opposed to lines of text or whole files.
 So I specified the record delimiter in the XMLRecordReader (i.e. <startTag>),
 and everything between <startTag> and </startTag> is sent to the mapper.
 The map function is where the data gets parsed and written to the table.

 What I have to do now is figure out how to set the record delimiter to
 something common to both XML files I'm reading. Currently I have 2 mapper
 classes and thus 2 submitted jobs, which is really inefficient and
 time-consuming.

 Make sense at all? Sorry if it doesn't, feel free to ask more questions

 Mark

 -Original Message-
 From: Vipul Sharma [mailto:sharmavi...@gmail.com]
 Sent: Monday, November 02, 2009 7:48 PM
 To: common-user@hadoop.apache.org
 Subject: RE: Multiple Input Paths

 Mark,

Were you able to concatenate both XML files together? What did you do to
keep the resulting XML well formed?

 Regards,
 Vipul Sharma,
 Cell: 281-217-0761