Re: How to control the map and reduce step sequentially

2008-07-29 Thread rae l
2008/7/29 晋光峰 <[EMAIL PROTECTED]>:
> I got it. Thanks!
>
> 2008/7/28 Shengkai Zhu <[EMAIL PROTECTED]>
>
>> The real reduce logic is actually started when all map tasks are finished.
>>
>> Is it still unexpected?
>>
>> 朱盛凯
>>
>> Jash Zhu
>>
>> 复旦大学软件学院
In my experience using Hadoop and reading the Hadoop code, the Reducer never runs before the Mapper; sometimes you can observe the mappers starting up first, but that has no effect on the results of the run either.

BTW: 
So many friends in China are studying Hadoop! I also started researching and deploying Hadoop a few months ago for a project at my company. Given that, what would people think of setting up a Chinese-language Hadoop discussion forum? Or does anyone already know of one? According to the PoweredBy page, Koubei.com in China is already using Hadoop:
http://wiki.apache.org/hadoop/PoweredBy

--
程任全


Re: How to control the map and reduce step sequentially

2008-07-29 Thread Xuebing Yan

The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC about a Chinese-language Hadoop community,
and Chinese documentation for Hadoop 0.17 may be released soon.

-闫雪冰

On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> 2008/7/29 晋光峰 <[EMAIL PROTECTED]>:
> > I got it. Thanks!
> >
> > 2008/7/28 Shengkai Zhu <[EMAIL PROTECTED]>
> >
> >> The real reduce logic is actually started when all map tasks are finished.
> >>
> >> Is it still unexpected?
> >>
> >> 朱盛凯
> >>
> >> Jash Zhu
> >>
> >> 复旦大学软件学院
> In my experience using Hadoop and reading the Hadoop code, the Reducer never runs before the Mapper; sometimes you can observe the mappers starting up first, but that has no effect on the results of the run either.
> 
> BTW: 
> So many friends in China are studying Hadoop! I also started researching and deploying Hadoop a few months ago for a project at my company. Given that, what would people think of setting up a Chinese-language Hadoop discussion forum? Or does anyone already know of one? According to the PoweredBy page, Koubei.com in China is already using Hadoop:
> http://wiki.apache.org/hadoop/PoweredBy
> 
> --
> 程任全



Hadoop warnings in pseudo-distributed mode

2008-07-29 Thread Arv Mistry
 
Could anyone tell me, is it normal to get warnings "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. everything on one machine?

It seems to be writing the files that I expect; I just get this
warning.

If it isn't normal, here's some background:
 - I did have it running in a distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?

Any help would be appreciated,

Cheers Arv


RE: Hadoop warnings in pseudo-distributed mode

2008-07-29 Thread Arv Mistry

Sorry, found the error of my ways - I forgot to add 127.0.0.1 to the
masters/slaves files.
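
For anyone hitting the same warning: a pseudo-distributed setup normally just
lists the local machine in both files (a sketch, assuming the stock conf/ layout):

conf/masters:
127.0.0.1

conf/slaves:
127.0.0.1

With 127.0.0.1 (or localhost) in conf/slaves, start-dfs.sh actually starts a
local datanode, so there is somewhere for blocks to be replicated to.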

Cheers Arv


-Original Message-
From: Arv Mistry 
Sent: Tuesday, July 29, 2008 8:53 AM
To: 'core-user@hadoop.apache.org'
Subject: Hadoop warnings in pseudo-distributed mode

 
Could anyone tell me, is it normal to get warnings "could only be
replicated to 0 nodes, instead of 1" when running in pseudo-distributed
mode, i.e. everything on one machine?

It seems to be writing the files that I expect; I just get this
warning.

If it isn't normal, here's some background:
 - I did have it running in a distributed mode, but have since deleted
the old file system. Is there any cleanup I may have missed?

Any help would be appreciated,

Cheers Arv


iterative map-reduce

2008-07-29 Thread Shirley Cohen

Hi,

I want to call a map-reduce program recursively until some condition  
is met.  How do I do that?


Thanks,

Shirley



Re: iterative map-reduce

2008-07-29 Thread Qin Gao
If you are using Java, just create the job configuration again and run it; otherwise
you just need to write an iterative script.

On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I want to call a map-reduce program recursively until some condition is
> met.  How do I do that?
>
> Thanks,
>
> Shirley
>
>


Re: iterative map-reduce

2008-07-29 Thread Shirley Cohen
Thanks... would the iterative script be run outside of Hadoop? I was  
actually trying to figure out if the framework could handle iterations.


Shirley

On Jul 29, 2008, at 9:10 AM, Qin Gao wrote:

> if you are using java, just create job configure again and run it,
> otherwise you just need to write a iterative script.
>
> On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I want to call a map-reduce program recursively until some condition is
>> met.  How do I do that?
>>
>> Thanks,
>>
>> Shirley





Re: iterative map-reduce

2008-07-29 Thread Qin Gao
I think it has nothing to do with the framework; just treat the MapReduce job as
a batch process or a subroutine, and call it iteratively. If there is such an
interface, I am also interested to know.



On Tue, Jul 29, 2008 at 10:31 AM, Shirley Cohen <[EMAIL PROTECTED]>wrote:

> Thanks... would the iterative script be run outside of Hadoop? I was
> actually trying to figure out if the framework could handle iterations.
>
> Shirley
>
>
> On Jul 29, 2008, at 9:10 AM, Qin Gao wrote:
>
>  if you are using java, just create job configure again and run it,
>> otherwise
>> you just need to write a iterative script.
>>
>> On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Hi,
>>>
>>> I want to call a map-reduce program recursively until some condition is
>>> met.  How do I do that?
>>>
>>> Thanks,
>>>
>>> Shirley
>>>
>>>
>>>
>


Re: How to control the map and reduce step sequentially

2008-07-29 Thread rae l
2008/7/29 Xuebing Yan <[EMAIL PROTECTED]>:
>
> The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC
> about a Chinese-language Hadoop community, and Chinese documentation for Hadoop
> 0.17 may be released soon.
Good.

http://www.hadoop.org.cn/
This appears to be a blog set up by an individual; an IP lookup gives:

www.hadoop.org.cn >> 218.240.14.21

* Primary site record: Beijing, Zhongguancun Information Engineering Co., Ltd.
* Result 2: Beijing, Zhongguancun Information Engineering Co., Ltd.
* Result 3: Beijing

--
程任全


Re: iterative map-reduce

2008-07-29 Thread Christian Ulrik Søttrup

Hi Shirley,

I am basically doing as Qin suggested.
I am running a job iteratively until some condition is met.
My main looks something like this (in pseudocode):

main:
while (!converged):
  make new jobconf
  setup jobconf
  run jobconf
  check reporter for statistics
  decide if converged

I use a custom reporter to check on the fitness of the solution in the 
reduce phase.
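
For anyone who wants a concrete skeleton, here is a minimal sketch of that loop
against the old org.apache.hadoop.mapred API (0.17-era). The identity mapper,
SumReducer, and the iter/N paths are placeholders, and the CHANGED counter stands
in for whatever fitness statistic your reducer reports:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class IterativeDriver {

  // Counter the reducer bumps whenever the solution is still changing.
  public enum Convergence { CHANGED }

  // Stand-in reduce step: sums values per key and reports non-zero sums as "changed".
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      long sum = 0;
      while (values.hasNext()) sum += values.next().get();
      if (sum != 0) reporter.incrCounter(Convergence.CHANGED, 1);
      out.collect(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    int iteration = 0;
    boolean converged = false;
    while (!converged) {
      JobConf conf = new JobConf(IterativeDriver.class);      // make new jobconf
      conf.setJobName("iterative-pass-" + iteration);         // setup jobconf
      conf.setMapperClass(IdentityMapper.class);
      conf.setReducerClass(SumReducer.class);
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(LongWritable.class);
      conf.setInputFormat(SequenceFileInputFormat.class);     // (Text, LongWritable) records
      FileInputFormat.setInputPaths(conf, new Path("iter/" + iteration));
      FileOutputFormat.setOutputPath(conf, new Path("iter/" + (iteration + 1)));

      RunningJob job = JobClient.runJob(conf);                // run jobconf (blocks)

      long changed = job.getCounters().getCounter(Convergence.CHANGED);
      converged = (changed == 0);                             // decide if converged
      iteration++;
    }
  }
}

The driver blocks in JobClient.runJob() on each pass, reads the counter back from
the finished job, and decides whether to go around again.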


If you need more (real Java) code, drop me a line.

Cheers,
Christian

Qin Gao wrote:

> I think it is nothing to do with the framework, just treat the mapredcue as
> a batch process or a subroutine, and you may iteratively call them. If there
> are such interface, I am also interested to know.
>
> On Tue, Jul 29, 2008 at 10:31 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote:
>
>> Thanks... would the iterative script be run outside of Hadoop? I was
>> actually trying to figure out if the framework could handle iterations.
>>
>> Shirley
>>
>> On Jul 29, 2008, at 9:10 AM, Qin Gao wrote:
>>
>>> if you are using java, just create job configure again and run it,
>>> otherwise you just need to write a iterative script.
>>>
>>> On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to call a map-reduce program recursively until some condition is
>>>> met.  How do I do that?
>>>>
>>>> Thanks,
>>>>
>>>> Shirley



LongSumReducer and TextOutputFormat

2008-07-29 Thread Stuart Sierra
Hello all,

I tried using o.a.h.mapred.lib.LongSumReducer in a wordcount-like job.
 My custom Mapper outputs Text keys and LongWritable values.  I
expected to get the Text keys in the final output.  But instead, I got
this:

[EMAIL PROTECTED] 12286
[EMAIL PROTECTED] 5037
[EMAIL PROTECTED] 29786
[EMAIL PROTECTED] 1219
[EMAIL PROTECTED] 1335
[EMAIL PROTECTED] 1705
[EMAIL PROTECTED] 3178

My job configuration looks like this:

job.setMapperClass(MyMap.class);
job.setReducerClass(LongSumReducer.class);

job.setInputFormat(SequenceFileInputFormat.class);

job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);

Anybody know what's going on here?  This is Hadoop 0.17.1.

Thanks,
-Stuart
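
For reference: keys printed as "SomeClass@hashcode" by TextOutputFormat usually
mean the key object reaching the output format is not a Text (so Java's default
Object.toString() gets used). A mapper shaped like this sketch (illustrative
names, and assuming plain text input rather than the SequenceFile input above)
does produce readable Text keys with LongSumReducer:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Word-count style mapper emitting (Text, LongWritable) pairs for LongSumReducer.
public class MyMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);
  private final Text word = new Text();

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer tok = new StringTokenizer(line.toString());
    while (tok.hasMoreTokens()) {
      word.set(tok.nextToken());
      output.collect(word, ONE);   // Text key prints as plain text in TextOutputFormat
    }
  }
}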


Question about fault tolerance and fail over for name nodes

2008-07-29 Thread Jason Venner

What are people doing?

For jobs that have a long enough SLA, just shutting down the cluster and 
bringing up the secondary as the master works for us.
We have some jobs where that doesn't work well, because the recovery 
time is not acceptable.


There has been internal discussion of using DRBD to hot-fail a namenode
to a backup so that the running job can continue.




Re: Question about fault tolerance and fail over for name nodes

2008-07-29 Thread Paco NATHAN
Jason,

FWIW -- based on a daily batch process, requiring 9 Hadoop jobs in
sequence -- 100+2 EC2 nodes, 2 Tb data, 6 hrs run time.

We tend to see a namenode failing early, e.g., the "problem advancing"
exception in the values iterator, particularly during a reduce phase.

Hot-fail would be great. Otherwise, given the duration of our batch
job overall, we use what you describe: shut down cluster, etc.

Would prefer to observe this kind of failure sooner than later. We've
discussed internally how to craft an initial job which could stress
test the namenode.  Think of a "unit test" for the cluster.

The business case for this becomes especially important when you need
to automate the Hadoop cluster launch, e.g. with RightScale or another
"cloud enabler" service.

Anybody else heading in this direction?

Paco


On Tue, Jul 29, 2008 at 11:01 AM, Jason Venner <[EMAIL PROTECTED]> wrote:
> What are people doing?
>
> For jobs that have a long enough SLA, just shutting down the cluster and
> bringing up the secondary as the master works for us.
> We have some jobs where that doesn't work well, because the recovery time is
> not acceptable.
>
> There has been internal discussion of using drdb to hotfail a namenode to a
> backup so that the running job can continue.


Re: How to control the map and reduce step sequentially

2008-07-29 Thread Daniel Yu
I'm currently studying abroad, and my graduation project happens to use Hadoop and
HBase, so a Chinese-language community would be a great thing. I hope the related
documentation and materials keep pace.
2008/7/29 Xuebing Yan <[EMAIL PROTECTED]>

>
> The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC
> about a Chinese-language Hadoop community, and Chinese documentation for Hadoop
> 0.17 may be released soon.
>
> -闫雪冰
>
> On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> > 2008/7/29 晋光峰 <[EMAIL PROTECTED]>:
> > > I got it. Thanks!
> > >
> > > 2008/7/28 Shengkai Zhu <[EMAIL PROTECTED]>
> > >
> > >> The real reduce logic is actually started when all map tasks are
> finished.
> > >>
> > >> Is it still unexpected?
> > >>
> > >> 朱盛凯
> > >>
> > >> Jash Zhu
> > >>
> > >> 复旦大学软件学院
> >
> > In my experience using Hadoop and reading the Hadoop code, the Reducer never runs before the Mapper; sometimes you can observe the mappers starting up first, but that has no effect on the results of the run either.
> >
> > BTW:
> > So many friends in China are studying Hadoop! I also started researching and deploying Hadoop a few months ago for a project at my company. Given that, what would people think of setting up a Chinese-language Hadoop discussion forum? Or does anyone already know of one? According to the PoweredBy page, Koubei.com in China is already using Hadoop:
> > http://wiki.apache.org/hadoop/PoweredBy
> >
> > --
> > 程任全
>
>


Re: LongSumReducer and TextOutputFormat

2008-07-29 Thread Stuart Sierra
NEVER MIND -- found the bug in my code.
Sorry,
-Stuart


On Tue, Jul 29, 2008 at 11:55 AM, Stuart Sierra <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I tried using o.a.h.mapred.lib.LongSumReducer in a wordcount-like job.
>  My custom Mapper outputs Text keys and LongWritable values.  I
> expected to get the Text keys in the final output.  But instead, I got
> this:
>
> [EMAIL PROTECTED] 12286
> [EMAIL PROTECTED] 5037
> [EMAIL PROTECTED] 29786
> [EMAIL PROTECTED] 1219
> [EMAIL PROTECTED] 1335
> [EMAIL PROTECTED] 1705
> [EMAIL PROTECTED] 3178
>
> My job configuration looks like this:
>
>job.setMapperClass(MyMap.class);
>job.setReducerClass(LongSumReducer.class);
>
>job.setInputFormat(SequenceFileInputFormat.class);
>
>job.setOutputFormat(TextOutputFormat.class);
>job.setOutputKeyClass(Text.class);
>job.setOutputValueClass(LongWritable.class);
>
> Anybody know what's going on here?  This is Hadoop 0.17.1.
>
> Thanks,
> -Stuart
>


Re: How to write one file per key as mapreduce output

2008-07-29 Thread Alejandro Abdelnur
On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
<[EMAIL PROTECTED]> wrote:

> Alejandro said:
>> Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN tip)
>
> I'm muddling through both
> http://issues.apache.org/jira/browse/HADOOP-2906 and
> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
> of these.  I'm a little confused by the way this works but it looks
> like I can define a number of named outputs which looks like it
> enables different output formats and I can also define some of these
> as "multi", meaning that I can write to different "targets" (like
> files).  Is this correct?

Exactly.



> A couple of questions:
>
>  - I needed to pass 'null' to the collect method so as to not write
> the key to the file.  These files are meant to be consumable chunks of
> content so I want to control exactly what goes into them.  Does this
> seem normal or am i missing something?  Is there a downside to passing
> null here?

Not sure what happens if you write NULL as key or value.

>  - What is the 'part-0' file for?  I have seen this in other
> places in the dfs. But it seems extraneous here.  It's not super
> critical but if I can make it go away that would be great.

This is the standard output of the M/R job: whatever is written to the
OutputCollector you get in the reduce() call (or in the map() call
when the number of reduces is 0).

>  - What is the purpose of the '-r-0' suffix?  Perhaps it is to
> help with collisions?

Yes, files written from a map have '-m-', files written from a reduce have '-r-'

> I guess it seems strange that I can't just say
> "the output file should be called X" and have an output file called X
> appear.

Well, you need the map, reduce mask and the task number mask to avoid
collisions.
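
For concreteness, a sketch of the named/multi named output setup being discussed,
against the MultipleOutputs API proposed in HADOOP-3149 (class names and output
names here are illustrative, not from the original jobs):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Job setup: one plain named output and one "multi" named output.
//   MultipleOutputs.addNamedOutput(conf, "summary",
//       TextOutputFormat.class, Text.class, LongWritable.class);
//   MultipleOutputs.addMultiNamedOutput(conf, "bytype",
//       TextOutputFormat.class, Text.class, LongWritable.class);

public class MultiOutReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {
  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void reduce(Text key, Iterator<LongWritable> values,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) sum += values.next().get();

    out.collect(key, new LongWritable(sum));         // normal part-* output

    mos.getCollector("summary", reporter)            // named output "summary"
       .collect(key, new LongWritable(sum));

    mos.getCollector("bytype", "even", reporter)     // multi output: the extra name
       .collect(key, new LongWritable(sum));         // selects the concrete target file
  }

  public void close() throws IOException {
    mos.close();
  }
}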


Re: How to control the map and reduce step sequentially

2008-07-29 Thread lohit
The wiki and the docs should help. Otherwise, please open a JIRA asking for better documentation - that will help everyone :)



- Original Message 
From: Daniel Yu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 9:22:00 AM
Subject: Re: How to control the map and reduce step sequentially

I'm currently studying abroad, and my graduation project happens to use Hadoop and
HBase, so a Chinese-language community would be a great thing. I hope the related
documentation and materials keep pace.
2008/7/29 Xuebing Yan <[EMAIL PROTECTED]>

>
> The Alibaba Search Technology R&D Center is already in talks with the Hadoop PMC
> about a Chinese-language Hadoop community, and Chinese documentation for Hadoop
> 0.17 may be released soon.
>
> -闫雪冰
>
> On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> > 2008/7/29 晋光峰 <[EMAIL PROTECTED]>:
> > > I got it. Thanks!
> > >
> > > 2008/7/28 Shengkai Zhu <[EMAIL PROTECTED]>
> > >
> > >> The real reduce logic is actually started when all map tasks are
> finished.
> > >>
> > >> Is it still unexpected?
> > >>
> > >> 朱盛凯
> > >>
> > >> Jash Zhu
> > >>
> > >> 复旦大学软件学院
> >
> > In my experience using Hadoop and reading the Hadoop code, the Reducer never runs before the Mapper; sometimes you can observe the mappers starting up first, but that has no effect on the results of the run either.
> >
> > BTW:
> > So many friends in China are studying Hadoop! I also started researching and deploying Hadoop a few months ago for a project at my company. Given that, what would people think of setting up a Chinese-language Hadoop discussion forum? Or does anyone already know of one? According to the PoweredBy page, Koubei.com in China is already using Hadoop:
> > http://wiki.apache.org/hadoop/PoweredBy
> >
> > --
> > 程任全
>
>



Re: How to write one file per key as mapreduce output

2008-07-29 Thread Lincoln Ritter
Thanks for the info!

> Not sure what happens if you write NULL as key or value.

Looking at the code, it doesn't seem to really make a difference, and
the function in question (basically 'collect') looks to be robust to
null - but I may be missing something!

In my case, I basically want the key to be the output filename, and
the data in the files to be directly consumable by my app.  Having the
key show up in the file complicates things on the app side so I'm
trying to avoid this.  Passing null seems to work for now.


-lincoln

--
lincolnritter.com




On Tue, Jul 29, 2008 at 9:27 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
> <[EMAIL PROTECTED]> wrote:
>
>> Alejandro said:
>>> Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN 
>>> tip)
>>
>> I'm muddling through both
>> http://issues.apache.org/jira/browse/HADOOP-2906 and
>> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
>> of these.  I'm a little confused by the way this works but it looks
>> like I can define a number of named outputs which looks like it
>> enables different output formats and I can also define some of these
>> as "multi", meaning that I can write to different "targets" (like
>> files).  Is this correct?
>
> Exactly.
>
> 
>
>> A couple of questions:
>>
>>  - I needed to pass 'null' to the collect method so as to not write
>> the key to the file.  These files are meant to be consumable chunks of
>> content so I want to control exactly what goes into them.  Does this
>> seem normal or am i missing something?  Is there a downside to passing
>> null here?
>
> Not sure what happens if you write NULL as key or value.
>
>>  - What is the 'part-0' file for?  I have seen this in other
>> places in the dfs. But it seems extraneous here.  It's not super
>> critical but if I can make it go away that would be great.
>
> This is the standard output of the M/R job whatever is written the
> OutputCollector you get in the reduce() call (or in the map() call
> when reduce=0)
>
>>  - What is the purpose of the '-r-0' suffix?  Perhaps it is to
>> help with collisions?
>
> Yes, files written from a map have '-m-', files written from a reduce have 
> '-r-'
>
>> I guess it seems strange that I can't just say
>> "the output file should be called X" and have an output file called X
>> appear.
>
> Well, you need the map, reduce mask and the task number mask to avoid
> collisions.
>


Re: How to write one file per key as mapreduce output

2008-07-29 Thread Alejandro Abdelnur
Then you may want to look at MultipleOutputFormat; it can do what you need.
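
If the goal is literally "the key becomes the output filename and stays out of
the file contents", a sketch along these lines, built on MultipleTextOutputFormat
from HADOOP-2906, shows the idea (class name and job wiring are illustrative):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Names each output file after the record's key and drops the key from the
// file contents (TextOutputFormat skips a null key). Assumes Text keys/values.
public class KeyAsFileNameOutputFormat extends MultipleTextOutputFormat<Text, Text> {

  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    return key.toString();          // output file named after the key
  }

  protected Text generateActualKey(Text key, Text value) {
    return null;                    // suppress the key inside the file
  }
}

// Job wiring (illustrative):
//   conf.setOutputFormat(KeyAsFileNameOutputFormat.class);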

On Tue, Jul 29, 2008 at 10:11 PM, Lincoln Ritter
<[EMAIL PROTECTED]> wrote:
> Thanks for the info!
>
>> Not sure what happens if you write NULL as key or value.
>
> Looking at the code, it doesn't seem to really make a difference, and
> the function in question (basically 'collect') looks to be robust to
> null - but I may be missing something!
>
> In my case, I basically want the key to be the output filename, and
> the data in the files to be directly consumable by my app.  Having the
> key show up in the file complicates things on the app side so I'm
> trying to avoid this.  Passing null seems to work for now.
>
>
> -lincoln
>
> --
> lincolnritter.com
>
>
>
>
> On Tue, Jul 29, 2008 at 9:27 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>> On Thu, Jul 24, 2008 at 12:32 AM, Lincoln Ritter
>> <[EMAIL PROTECTED]> wrote:
>>
>>> Alejandro said:
 Take a look at the MultipleOutputFormat class or MultipleOutputs (in SVN 
 tip)
>>>
>>> I'm muddling through both
>>> http://issues.apache.org/jira/browse/HADOOP-2906 and
>>> http://issues.apache.org/jira/browse/HADOOP-3149 trying to make sense
>>> of these.  I'm a little confused by the way this works but it looks
>>> like I can define a number of named outputs which looks like it
>>> enables different output formats and I can also define some of these
>>> as "multi", meaning that I can write to different "targets" (like
>>> files).  Is this correct?
>>
>> Exactly.
>>
>> 
>>
>>> A couple of questions:
>>>
>>>  - I needed to pass 'null' to the collect method so as to not write
>>> the key to the file.  These files are meant to be consumable chunks of
>>> content so I want to control exactly what goes into them.  Does this
>>> seem normal or am i missing something?  Is there a downside to passing
>>> null here?
>>
>> Not sure what happens if you write NULL as key or value.
>>
>>>  - What is the 'part-0' file for?  I have seen this in other
>>> places in the dfs. But it seems extraneous here.  It's not super
>>> critical but if I can make it go away that would be great.
>>
>> This is the standard output of the M/R job whatever is written the
>> OutputCollector you get in the reduce() call (or in the map() call
>> when reduce=0)
>>
>>>  - What is the purpose of the '-r-0' suffix?  Perhaps it is to
>>> help with collisions?
>>
>> Yes, files written from a map have '-m-', files written from a reduce have 
>> '-r-'
>>
>>> I guess it seems strange that I can't just say
>>> "the output file should be called X" and have an output file called X
>>> appear.
>>
>> Well, you need the map, reduce mask and the task number mask to avoid
>> collisions.
>>
>


Re: iterative map-reduce

2008-07-29 Thread Paco NATHAN
A simple example of Hadoop application code which follows that pattern
(iterate until condition). In the "jyte" section:

   http://code.google.com/p/ceteri-mapred/

Loop and condition test are in the same code which calls ToolRunner
and JobClient.

Best,
Paco


On Tue, Jul 29, 2008 at 10:03 AM, Christian Ulrik Søttrup
<[EMAIL PROTECTED]> wrote:
> Hi Shirley,
>
> I am basically doing as Qin suggested.
> I am running a job iteratively until some condition is met.
> My main looks something like:(in pseudo code)
>
> main:
> while (!converged):
>  make new jobconf
>  setup jobconf
>  run jobconf
>  check reporter for statistics
>  decide if converged
>
> I use a custom reporter to check on the fitness of the solution in the
> reduce phase.
>
> If you need more(real java) code drop me a line.
>
> Cheers,
> Christian


Multiple master nodes

2008-07-29 Thread Ryan Shih
Dear Hadoop Community --

I am wondering if it is already possible or in the plans to add capability
for multiple master nodes. I'm in a situation where I have a master node
that may potentially be in a less than ideal execution and networking
environment. For this reason, it's possible that the master node could die
at any time. On the other hand, the application must always be available. I
have accessible to me other machines but I'm still unclear on the best
method to add reliability.

Here are a few options that I'm exploring:
a) To create a completely secondary Hadoop cluster that we can flip to when
we detect that the master node has died. This will double hardware costs, so
if we originally have a 5 node cluster, then we would need to pull 5 more
machines out of somewhere for this decision. This is not the preferable
choice.
b) Just mirror the master node via other always available software, such as
DRBD for real time synchronization. Upon detection we could swap to the
alternate node.
c) Or if Hadoop had some functionality already in place, it would be
fantastic to be able to take advantage of that. I don't know if anything
like this is available but I could not find anything as of yet. It seems to
me, however, that having multiple master nodes would be the direction Hadoop
needs to go if it is to be useful in high availability applications. I was
told there are some papers on Amazon's Elastic Computing that I'm about to
look for that follow this approach.

In any case, could someone with experience in solving this type of problem
share how they approached this issue?

Thanks!


Re: Multiple master nodes

2008-07-29 Thread paul
I'm currently running with your option B setup and it seems to be reliable
for me (so far).  I use a combination of DRBD and various heartbeat/LinuxHA
scripts that handle the failover process, including a virtual IP for the
namenode.  I haven't had any real-world unexpected failures to deal with,
yet, but all manual testing has had consistent and reliable results.



-paul


On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:

> Dear Hadoop Community --
>
> I am wondering if it is already possible or in the plans to add capability
> for multiple master nodes. I'm in a situation where I have a master node
> that may potentially be in a less than ideal execution and networking
> environment. For this reason, it's possible that the master node could die
> at any time. On the other hand, the application must always be available. I
> have accessible to me other machines but I'm still unclear on the best
> method to add reliability.
>
> Here are a few options that I'm exploring:
> a) To create a completely secondary Hadoop cluster that we can flip to when
> we detect that the master node has died. This will double hardware costs,
> so
> if we originally have a 5 node cluster, then we would need to pull 5 more
> machines out of somewhere for this decision. This is not the preferable
> choice.
> b) Just mirror the master node via other always available software, such as
> DRBD for real time synchronization. Upon detection we could swap to the
> alternate node.
> c) Or if Hadoop had some functionality already in place, it would be
> fantastic to be able to take advantage of that. I don't know if anything
> like this is available but I could not find anything as of yet. It seems to
> me, however, that having multiple master nodes would be the direction
> Hadoop
> needs to go if it is to be useful in high availability applications. I was
> told there are some papers on Amazon's Elastic Computing that I'm about to
> look for that follow this approach.
>
> In any case, could someone with experience in solving this type of problem
> share how they approached this issue?
>
> Thanks!
>


Re: Multiple master nodes

2008-07-29 Thread lohit
It would be really helpful for many if you could write this up as a wiki page. Those
ideas could be used while implementing HA.
Thanks,
Lohit



- Original Message 
From: paul <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 11:56:44 AM
Subject: Re: Multiple master nodes

I'm currently running with your option B setup and it seems to be reliable
for me (so far).  I use a combination of drbd and various hearbeat/LinuxHA
scripts that handle the failover process, including a virtual IP for the
namenode.  I haven't had any real-world unexpected failures to deal with,
yet, but all manual testing has had consistent and reliable results.



-paul


On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:

> Dear Hadoop Community --
>
> I am wondering if it is already possible or in the plans to add capability
> for multiple master nodes. I'm in a situation where I have a master node
> that may potentially be in a less than ideal execution and networking
> environment. For this reason, it's possible that the master node could die
> at any time. On the other hand, the application must always be available. I
> have accessible to me other machines but I'm still unclear on the best
> method to add reliability.
>
> Here are a few options that I'm exploring:
> a) To create a completely secondary Hadoop cluster that we can flip to when
> we detect that the master node has died. This will double hardware costs,
> so
> if we originally have a 5 node cluster, then we would need to pull 5 more
> machines out of somewhere for this decision. This is not the preferable
> choice.
> b) Just mirror the master node via other always available software, such as
> DRBD for real time synchronization. Upon detection we could swap to the
> alternate node.
> c) Or if Hadoop had some functionality already in place, it would be
> fantastic to be able to take advantage of that. I don't know if anything
> like this is available but I could not find anything as of yet. It seems to
> me, however, that having multiple master nodes would be the direction
> Hadoop
> needs to go if it is to be useful in high availability applications. I was
> told there are some papers on Amazon's Elastic Computing that I'm about to
> look for that follow this approach.
>
> In any case, could someone with experience in solving this type of problem
> share how they approached this issue?
>
> Thanks!
>



Re: Question about fault tolerance and fail over for name nodes

2008-07-29 Thread Andreas Kostyrka
On Tuesday 29 July 2008 18:22:07 Paco NATHAN wrote:
> Jason,
>
> FWIW -- based on a daily batch process, requiring 9 Hadoop jobs in
> sequence -- 100+2 EC2 nodes, 2 Tb data, 6 hrs run time.
>
> We tend to see a namenode failing early, e.g., the "problem advancing"
> exception in the values iterator, particularly during a reduce phase.
>
> Hot-fail would be great. Otherwise, given the duration of our batch
> job overall, we use what you describe: shut down cluster, etc.
>
> Would prefer to observe this kind of failure sooner than later. We've
> discussed internally how to craft an initial job which could stress
> test the namenode.  Think of a "unit test" for the cluster.

ssh namenode 'kill -9 $(ps ax | grep java.*NameNode | cut -f 1 -d " " )'

Here goes your namenode failure, if you just want to do the exercise for a 
failover ;)

Andreas

>
> The business case for this becomes especially important when you need
> to automate the Hadoop cluster launch, e.g. with RightScale or another
> "cloud enabler" service.
>
> Anybody else heading in this direction?
>
> Paco
>
> On Tue, Jul 29, 2008 at 11:01 AM, Jason Venner <[EMAIL PROTECTED]> wrote:
> > What are people doing?
> >
> > For jobs that have a long enough SLA, just shutting down the cluster and
> > bringing up the secondary as the master works for us.
> > We have some jobs where that doesn't work well, because the recovery time
> > is not acceptable.
> >
> > There has been internal discussion of using drdb to hotfail a namenode to
> > a backup so that the running job can continue.






IllegalStateException: Job tracker still initializing

2008-07-29 Thread Marco
Hi,

Using hadoop 0.17.1, we see this exception consistently after one job has run 
and the dfs has been restarted. Unless we delete the dfs/data dir and reformat 
the namenode, the start of a new job seems to be blocked by this exception. Is 
there a workaround? Thanks


org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.mapred.JobTracker$IllegalStateException: Job tracker still 
initializing
    at org.apache.hadoop.mapred.JobTracker.ensureRunning(JobTracker.java:1722)
    at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:1730)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

    at org.apache.hadoop.ipc.Client.call(Client.java:557)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
    at $Proxy18.getNewJobId(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy18.getNewJobId(Unknown Source)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:696)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
    at ...




  

how to LZO

2008-07-29 Thread Stefan Groschupf

Hi,
I would love to use the lzo codec. However, for some reason I always only get ...
"INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library"


My hadoop-site looks like:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.LzoCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
  <description>A list of the compression codec classes that can
  be used for compression/decompression.</description>
</property>

I also think I have lzo installed on all my nodes:
yum list | grep lzo
lzo.x86_64 2.02-3.fc8 installed
lzo.i386 2.02-3.fc8 installed
lzo-devel.i386 2.02-3.fc8 fedora
lzo-devel.x86_64 2.02-3.fc8 fedora
 lzop.x86_64 1.02-0.5.rc1.fc8 fedora
Is there anything I might have missed that you could think of?
Thanks for any hints!

Stefan



Re: how to LZO

2008-07-29 Thread Chris Douglas

Stefan-

Where and how are you trying to use lzo? As input/output? For  
intermediate compression? In SequenceFiles? -C
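
For whichever of those applies, a rough sketch of the usual per-job knobs, using
the classic property names (worth double-checking against the hadoop-default.xml
of the release in use; MyJob is a placeholder):

JobConf conf = new JobConf(MyJob.class);

// compress intermediate map output with lzo
conf.setBoolean("mapred.compress.map.output", true);
conf.set("mapred.map.output.compression.codec",
         "org.apache.hadoop.io.compress.LzoCodec");

// compress the final job output (BLOCK type applies to SequenceFile output)
conf.setBoolean("mapred.output.compress", true);
conf.set("mapred.output.compression.codec",
         "org.apache.hadoop.io.compress.LzoCodec");
conf.set("mapred.output.compression.type", "BLOCK");

Note that none of this helps if only native-zlib loads: LzoCodec also needs the
native lzo bindings available to the native-hadoop library on every node.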


On Jul 29, 2008, at 4:32 PM, Stefan Groschupf wrote:


Hi,
I would love to use lzo codec. However for some reasons I always  
only get  ...
"INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native- 
hadoop library
INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully  
loaded & initialized native-zlib library"


My hadoop-site looks like:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.LzoCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
  <description>A list of the compression codec classes that can
  be used for compression/decompression.</description>
</property>

I also think I have lzo installed on all my nodes:
yum list | grep lzo
lzo.x86_64 2.02-3.fc8 installed
lzo.i386 2.02-3.fc8 installed
lzo-devel.i386 2.02-3.fc8 fedora
lzo-devel.x86_64 2.02-3.fc8 fedora
lzop.x86_64 1.02-0.5.rc1.fc8 fedora
Anything I miss you could think of?
Thanks for any hints!

Stefan





Hadoop 4 disks per server

2008-07-29 Thread Rafael Turk
Hi All,

 I'm setting up a cluster with 4 disks per server. Is there any way to make
Hadoop aware of this setup and take advantage of it?

 *** I'm not planning to set up RAID on each node (only on the namenode server),
since HA is provided by HDFS.

Thanks.
--Rafael


Re: Hadoop 4 disks per server

2008-07-29 Thread Allen Wittenauer
On 7/29/08 6:37 PM, "Rafael Turk" <[EMAIL PROTECTED]> wrote:
>  I'm setting up a cluster with 4 disks per server. Is there any way to make
> Hadoop aware of this setup and take benefits from that?

This is how we run our nodes.  You just need to list the four file
systems in the configuration files and the datanode and map/red processes
will know what to do.



Re: Hadoop 4 disks per server

2008-07-29 Thread James Moore
On Tue, Jul 29, 2008 at 6:37 PM, Rafael Turk <[EMAIL PROTECTED]> wrote:
> Hi All,
>
>  I'm setting up a cluster with 4 disks per server. Is there any way to make
> Hadoop aware of this setup and take benefits from that?

I believe all you need to do is give four directories (one on each
drive) as  the value for dfs.data.dir and mapred.local.dir.  Something
like:


<property>
  <name>dfs.data.dir</name>
  <value>/drive1/myDfsDir,/drive2/myDfsDir,/drive3/myDfsDir,/drive4/myDfsDir</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks.  If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>
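
A matching entry for mapred.local.dir (directory names illustrative) spreads the
map/reduce scratch space across the same drives:

<property>
  <name>mapred.local.dir</name>
  <value>/drive1/mapredLocal,/drive2/mapredLocal,/drive3/mapredLocal,/drive4/mapredLocal</value>
  <description>The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of directories on
  different devices in order to spread disk i/o.
  Directories that do not exist are ignored.
  </description>
</property>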

-- 
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com


Re: Multiple master nodes

2008-07-29 Thread Ryan Shih
Thanks Paul. Sounds like that's the way to go then. We're just starting to
experiment a bit with DRBD so we'll give that a shot and see how it works
out.

On Tue, Jul 29, 2008 at 11:56 AM, paul <[EMAIL PROTECTED]> wrote:

> I'm currently running with your option B setup and it seems to be reliable
> for me (so far).  I use a combination of drbd and various hearbeat/LinuxHA
> scripts that handle the failover process, including a virtual IP for the
> namenode.  I haven't had any real-world unexpected failures to deal with,
> yet, but all manual testing has had consistent and reliable results.
>
>
>
> -paul
>
>
> On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:
>
> > Dear Hadoop Community --
> >
> > I am wondering if it is already possible or in the plans to add
> capability
> > for multiple master nodes. I'm in a situation where I have a master node
> > that may potentially be in a less than ideal execution and networking
> > environment. For this reason, it's possible that the master node could
> die
> > at any time. On the other hand, the application must always be available.
> I
> > have accessible to me other machines but I'm still unclear on the best
> > method to add reliability.
> >
> > Here are a few options that I'm exploring:
> > a) To create a completely secondary Hadoop cluster that we can flip to
> when
> > we detect that the master node has died. This will double hardware costs,
> > so
> > if we originally have a 5 node cluster, then we would need to pull 5 more
> > machines out of somewhere for this decision. This is not the preferable
> > choice.
> > b) Just mirror the master node via other always available software, such
> as
> > DRBD for real time synchronization. Upon detection we could swap to the
> > alternate node.
> > c) Or if Hadoop had some functionality already in place, it would be
> > fantastic to be able to take advantage of that. I don't know if anything
> > like this is available but I could not find anything as of yet. It seems
> to
> > me, however, that having multiple master nodes would be the direction
> > Hadoop
> > needs to go if it is to be useful in high availability applications. I
> was
> > told there are some papers on Amazon's Elastic Computing that I'm about
> to
> > look for that follow this approach.
> >
> > In any case, could someone with experience in solving this type of
> problem
> > share how they approached this issue?
> >
> > Thanks!
> >
>