On Fri, Jun 19, 2009 at 10:37 PM, Allen Wittenauer <a...@yahoo-inc.com> wrote:
> On 6/19/09 3:49 AM, "Harish Mallipeddi" <harish.mallipe...@gmail.com>
> wrote:
>
> > Why do you want to do this in the first place? It seems like you want
> > cluster1 to be a plain HDFS cluster and cluster2 to be a mapred cluster.
> > Doing something like that will be disastrous - Hadoop is all about sending
> > computation closer to your data. If you don't want that, you need not even
> > use hadoop.
>
> Given some of the limitations with HDFS (quota operability, security), I
> can easily see why it would be desirable to have static data coming from one
> grid while doing computation/intermediate outputs/real output to another.
>
> Using performance as your sole metric of viability is a bigger disaster
> waiting to happen. "Sure, we crashed the file system, but look how fast it
> went down in flames!"

Well, apart from doing a distcp between the 2 clusters periodically, I don't
see how this can be done in a way that would yield acceptable performance.

--
Harish Mallipeddi
http://blog.poundbang.in