Re: Specifying NameNode externally to hadoop-site.xml

2009-05-26 Thread Stas Oskin
Hi.

Thanks for the tip.

Regards.

2009/5/26 Aaron Kimball 

> Same way.
>
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://foo");
> FileSystem fs = FileSystem.get(conf);
>
> - Aaron
>
> On Mon, May 25, 2009 at 1:02 PM, Stas Oskin  wrote:
>
> > Hi.
> >
> > And if I don't use jobs but only DFS for now?
> >
> > Regards.
> >
> > 2009/5/25 jason hadoop 
> >
> > > conf.set("fs.default.name", "hdfs://host:port");
> > > where conf is the JobConf object of your job, before you submit it.
> > >
> > >
> > > On Mon, May 25, 2009 at 10:16 AM, Stas Oskin 
> > wrote:
> > >
> > > > Hi.
> > > >
> > > > Thanks for the tip, but is it possible to set this in dynamic way via
> > > code?
> > > >
> > > > Thanks.
> > > >
> > > > 2009/5/25 jason hadoop 
> > > >
> > > > > if you launch your jobs via bin/hadoop jar jar_file [main class]
> > > >  [options]
> > > > >
> > > > > you can simply specify -fs hdfs://host:port before the jar_file
> > > > >
> > > > > On Sun, May 24, 2009 at 3:02 PM, Stas Oskin 
> > > > wrote:
> > > > >
> > > > > > Hi.
> > > > > >
> > > > > > I'm looking to move the Hadoop NameNode URL outside the
> > > hadoop-site.xml
> > > > > > file, so I could set it at the run-time.
> > > > > >
> > > > > > Any idea how to do it?
> > > > > >
> > > > > > Or perhaps there is another configuration that can be applied to
> > the
> > > > > > FileSystem object?
> > > > > >
> > > > > > Regards.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Alpha Chapters of my book on Hadoop are available
> > > > > http://www.apress.com/book/view/9781430219422
> > > > > www.prohadoopbook.com a community for Hadoop Professionals
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Alpha Chapters of my book on Hadoop are available
> > > http://www.apress.com/book/view/9781430219422
> > > www.prohadoopbook.com a community for Hadoop Professionals
> > >
> >
>
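
A consolidated, compilable sketch of the approach discussed in this thread;
the class name, the NameNode URI and the listing of "/" are illustrative
placeholders rather than anything taken from the messages above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DfsClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the NameNode at run-time, instead of relying
            // on the fs.default.name entry in hadoop-site.xml.
            conf.set("fs.default.name", "hdfs://namenode-host:9000");
            FileSystem fs = FileSystem.get(conf);
            // Sanity check: list the root directory of the DFS.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }

Since only a Configuration and a FileSystem are involved, the same approach
covers the DFS-only case asked about above; no JobConf is needed.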


Re: Setting up another machine as secondary node

2009-05-26 Thread Rakhi Khatwani
Hi,
   I followed the instructions you all suggested, but I still
come across this exception when I use the following command:
./hadoop-daemon.sh start namenode -importCheckpoint

the exception is as follows:
2009-05-26 14:43:48,004 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = germapp/192.168.0.1
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
/
2009-05-26 14:43:48,147 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=4
2009-05-26 14:43:48,154 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
germapp/192.168.0.1:4
2009-05-26 14:43:48,160 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-05-26 14:43:48,166 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,316 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=ithurs,ithurs
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-05-26 14:43:48,343 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,347 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/tmp/hadoop-ithurs/dfs/name is not formatted.
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2009-05-26 14:43:48,457 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
2009-05-26 14:43:48,460 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2009-05-26 14:43:48,464 INFO org.apache.hadoop.ipc.Server: Stopping
server on 4
2009-05-26 14:43:48,466 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
Cannot lock storage /tmp/hadoop-ithurs/dfs/namesecondary. The
directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
a

HDFS data for HBase and other projects

2009-05-26 Thread Wasim Bari
Hi,
 If we already have data stored in HDFS, which of the following sub-projects
can use this data for further processing/operations:

  1. Pig
  2. HBase
  3. ZooKeeper
  4. Hive
  5. Any other Hadoop related project

Thanks,

Wasim 


Bisection bandwidth needed for a small Hadoop cluster

2009-05-26 Thread stephen mulcahy

Hi,

Has anyone here investigated what level of bisection bandwidth is needed 
for a Hadoop cluster which spans more than one rack?


I'm currently sizing and planning a new Hadoop cluster and I'm wondering 
what the performance implications will be if we end up with a cluster 
spread across two racks. I'd expect we'll have one 48-port gigabit 
switch in each 42u rack. If we end up with 60 systems spread across 
these two switches - how much bandwidth should I have between the racks?


I'll have 6 gigabit ports available for links between racks - i.e. up to 
6 Gbps. Would this be sufficient bisection bandwidth for Hadoop or 
should I be considering increased bandwidth between racks (maybe using 
fibre links between the switches or introducing another switch)?


Thanks for any thoughts on this.

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com


Re: ssh issues

2009-05-26 Thread Steve Loughran

hmar...@umbc.edu wrote:

Steve,

Security through obscurity is always a good practice from a development
standpoint and one of the reasons why tricking you out is an easy task.


:)

My most recent presentation on HDFS clusters is now online, notice how it
doesn't gloss over the security: 
http://www.slideshare.net/steve_l/hdfs-issues



Please, keep hiding relevant details from people in order to keep everyone
smiling.



HDFS is as secure as NFS: you are trusted to be who you say you are.
Which means that you have to run it on a secured subnet - access
restricted to trusted hosts and/or one or two front end servers - or accept
that your dataset is readable and writeable by anyone on the network.


There is user identification going in; it is currently at the level 
where it will stop someone accidentally deleting the entire filesystem 
if they lack the rights. Which has been known to happen.


If the team looking after the cluster demands separate SSH keys/logins for
every machine, then not only are they making their operations costs high,
but once you have got the HDFS cluster and MR engine live, it's moot. You
can push out work to the JobTracker, which then runs it on the machines,
under whatever userid the TaskTrackers are running on. Now, 0.20+ will
run it under the identity of the user who claimed to be submitting the
job, but without that, your MR jobs get the access rights to the
filesystem of the user that is running the TT. It's also fairly
straightforward to create a modified Hadoop client JAR that doesn't call
whoami to get the userid, and instead spoofs to be anyone. Which means
that even if you lock down the filesystem - no out-of-datacentre access -
if I can run my Java code as MR jobs in your cluster, I can have
unrestricted access to the filesystem by way of the task tracker server.


But Hal, if you are running Ant for your build I'm running my code on 
your machines anyway, so you had better be glad that I'm not malicious.


-Steve


When directly writing to HDFS, the data is moved only on file close

2009-05-26 Thread Stas Oskin
Hi.

I'm trying to continuously write data to HDFS via OutputStream(), and want
to be able to read it at the same time from another client.

The problem is that after the file is created on HDFS with a size of 0, it stays
that way, and only fills up when I close the OutputStream().

Here is a simple code sample illustrating this issue:

try {
    FSDataOutputStream out = fileSystem.create(new
        Path("/test/test.bin")); // Here the file created with 0 size
    for (int i = 0; i < 1000; i++) {
        out.write(1);  // Still stays 0
        out.flush();   // Even when I flush it out???
    }

    Thread.currentThread().sleep(1);
    out.close();  // Only here the file is updated
} catch (Exception e) {
    e.printStackTrace();
}

So, two questions here:

1) How is it possible to write files directly to HDFS, and have them
update there immediately?
2) Just for information, in this case, where does the file content stay all the
time - on the server's local disk, in memory, etc.?

Thanks in advance.


InputStream.open() efficiency

2009-05-26 Thread Stas Oskin
Hi.

I'm looking to find out how InputStream.open() + skip() compares to
keeping a handle on the InputStream and just seeking to the position.

Has anyone compared these approaches, and can anyone advise on their speed?

Regards.


Re: When directly writing to HDFS, the data is moved only on file close

2009-05-26 Thread Tom White
This feature is not available yet, and is still under active
discussion. (The current version of HDFS will make the previous block
available to readers.) Michael Stack gave a good summary on the HBase
dev list:

http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3c7c962aed0905231601g533088ebj4a7a068505ba3...@mail.gmail.com%3e

Tom

On Tue, May 26, 2009 at 12:08 PM, Stas Oskin  wrote:
> Hi.
>
> I'm trying to continuously write data to HDFS via OutputStream(), and want
> to be able to read it at the same time from another client.
>
> Problem is, that after the file is created on HDFS with size of 0, it stays
> that way, and only fills up when I close the OutputStream().
>
> Here is a simple code sample illustrating this issue:
>
> try {
>
>            FSDataOutputStream out=fileSystem.create(new
> Path("/test/test.bin")); // Here the file created with 0 size
>            for(int i=0;i<1000;i++)
>            {
>                out.write(1); // Still stays 0
>                out.flush(); // Even when I flush it out???
>            }
>
>            Thread.currentThread().sleep(1);
>            out.close(); //Only here the file is updated
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>
> So, two questions here:
>
> 1) How it's possible to write the files directly to HDFS, and have them
> update there immedaitely?
> 2) Just for information, in this case, where the file content stays all the
> time - on server local disk, in memory, etc...?
>
> Thanks in advance.
>


Re: When directly writing to HDFS, the data is moved only on file close

2009-05-26 Thread Stas Oskin
Hi.

Does it mean there is no way to access the data being written to HDFS
while it's being written?

Where is it stored then during the write - on the cluster or on local disks?

Thanks.

2009/5/26 Tom White 

> This feature is not available yet, and is still under active
> discussion. (The current version of HDFS will make the previous block
> available to readers.) Michael Stack gave a good summary on the HBase
> dev list:
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3c7c962aed0905231601g533088ebj4a7a068505ba3...@mail.gmail.com%3e
>
> Tom
>
> On Tue, May 26, 2009 at 12:08 PM, Stas Oskin  wrote:
> > Hi.
> >
> > I'm trying to continuously write data to HDFS via OutputStream(), and
> want
> > to be able to read it at the same time from another client.
> >
> > Problem is, that after the file is created on HDFS with size of 0, it
> stays
> > that way, and only fills up when I close the OutputStream().
> >
> > Here is a simple code sample illustrating this issue:
> >
> > try {
> >
> >FSDataOutputStream out=fileSystem.create(new
> > Path("/test/test.bin")); // Here the file created with 0 size
> >for(int i=0;i<1000;i++)
> >{
> >out.write(1); // Still stays 0
> >out.flush(); // Even when I flush it out???
> >}
> >
> >Thread.currentThread().sleep(1);
> >out.close(); //Only here the file is updated
> >} catch (Exception e) {
> >e.printStackTrace();
> >}
> >
> > So, two questions here:
> >
> > 1) How it's possible to write the files directly to HDFS, and have them
> > update there immedaitely?
> > 2) Just for information, in this case, where the file content stays all
> the
> > time - on server local disk, in memory, etc...?
> >
> > Thanks in advance.
> >
>


Re: HDFS data for HBase and other projects

2009-05-26 Thread Todd Lipcon
On Tue, May 26, 2009 at 2:28 AM, Wasim Bari  wrote:

> Hi,
> If we have already data stored in HDFS. Which of following sub-projects
> can use this data for further processing/operations:
>
>  1. Pig

Yes

>  2. HBase

No, needs to be imported into HBase

>  3. ZooKeeper

No, not used for processing data

>  4. Hive

Yes, if it's in a well defined format

>  5. Any other Hadoop related project

Maybe?


Sorry for the vague answer, but you'll have to ask a specific question to
get a specific answer. I recommend reading the web pages for the projects
and checking out the "Getting Started" guides.

-Todd


Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-26 Thread Grace
Yep. I have tried the option mapred.job.reuse.jvm.num.tasks with -1. There is
no big difference. Furthermore, even using this option with -1, there are
still several task JVMs besides the tasktracker JVM.



On Mon, May 18, 2009 at 9:34 PM, Steve Loughran  wrote:

> Tom White wrote:
>
>> On Mon, May 18, 2009 at 11:44 AM, Steve Loughran 
>> wrote:
>>
>>> Grace wrote:
>>>
 To follow up this question, I have also asked help on Jrockit forum.
 They
 kindly offered some useful and detailed suggestions according to the JRA
 results. After updating the option list, the performance did become
 better
 to some extend. But it is still not comparable with the Sun JVM. Maybe,
 it
 is due to the use case with short duration and different implementation
 in
 JVM layer between Sun and Jrockit. I would like to be back to use Sun
 JVM
 currently. Thanks all for your time and help.

  what about flipping the switch that says "run tasks in the TT's own
>>> JVM?".
>>> That should handle startup costs, and reduce the memory footprint
>>>
>>>
>> The property mapred.job.reuse.jvm.num.tasks allows you to set how many
>> tasks the JVM may be reused for (within a job), but it always runs in
>> a separate JVM to the tasktracker. (BTW
>> https://issues.apache.org/jira/browse/HADOOP-3675 has some discussion
>> about running tasks in the tasktracker's JVM).
>>
>> Tom
>>
>
> Tom,
> that's why you are writing a book on Hadoop and I'm not ...you know the
> answers and I have some vague misunderstandings,
>
> -steve
> (returning to the svn book)
>
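
For reference, a minimal sketch of setting the JVM-reuse property discussed in
this thread from job-submission code; the class name and job name are
placeholders, and the commented alternative assumes the JobConf convenience
method available in 0.19+:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseExample {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(JvmReuseExample.class);
            conf.setJobName("jvm-reuse-example");
            // Reuse each task JVM for an unlimited number of tasks within this job.
            conf.set("mapred.job.reuse.jvm.num.tasks", "-1");
            // Equivalent convenience call (0.19+): conf.setNumTasksToExecutePerJvm(-1);
            // ... set mapper, reducer, input and output paths, then submit with
            // JobClient.runJob(conf) ...
        }
    }

As noted above, even with reuse enabled the tasks still run in child JVMs
separate from the tasktracker JVM; reuse only amortizes the startup cost
within a job.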


PNW Hadoop + Apache Cloud Stack Meetup, Wed. May 27th:

2009-05-26 Thread Bradford Stephens
Greetings,
This is a friendly reminder that the 1st meetup for the PNW Hadoop + Apache
Cloud Stack User Group is THIS WEDNESDAY at 6:45pm. We're very excited to
have everyone attend!

University of Washington, Allen Center Room 303, at 6:45pm on Wednesday, May
27, 2009.
I'm going to put together a map, and a wiki so we can collab.

The Allen Center is located here:
http://www.washington.edu/home/maps/?CSE

What I'm envisioning is a meetup for about 2 hours: we'll have two in-depth
talks of 15-20 minutes each, and then several "lightning talks" of 5
minutes. We'll then have discussion and 'social time'.
Let me know if you're interested in speaking or attending.

I'd like to focus on education, so every presentation *needs* to ask some
questions at the end. We can talk about these after the presentations, and
I'll record what we've learned in a wiki and share that with the rest of
us.

Looking forward to meeting you all!


Cheers,
Bradford Stephens


Re: Setting up another machine as secondary node

2009-05-26 Thread Konstantin Shvachko

Hi Rakhi,

This is because your name-node is trying to -importCheckpoint from a directory
which is locked by the secondary name-node.
The secondary node is also running in your case, right?
You should use -importCheckpoint only as a last resort, when the name-node's
directories are damaged.
In the regular case you start the name-node with
./hadoop-daemon.sh start namenode

Thanks,
--Konstantin

Rakhi Khatwani wrote:

Hi,
   I followed the instructions suggested by you all. but i still
come across this exception when i use the following command:
./hadoop-daemon.sh start namenode -importCheckpoint

the exception is as follows:
2009-05-26 14:43:48,004 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = germapp/192.168.0.1
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
/
2009-05-26 14:43:48,147 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=4
2009-05-26 14:43:48,154 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
germapp/192.168.0.1:4
2009-05-26 14:43:48,160 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-05-26 14:43:48,166 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,316 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=ithurs,ithurs
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2009-05-26 14:43:48,317 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2009-05-26 14:43:48,343 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-05-26 14:43:48,347 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/tmp/hadoop-ithurs/dfs/name is not formatted.
2009-05-26 14:43:48,455 INFO
org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2009-05-26 14:43:48,457 INFO
org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
2009-05-26 14:43:48,460 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.IOException: Cannot lock storage
/tmp/hadoop-ithurs/dfs/namesecondary. The directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2009-05-26 14:43:48,464 INFO org.apache.hadoop.ipc.Server: Stopping
server on 4
2009-05-26 14:43:48,466 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
Cannot lock storage /tmp/hadoop-ithurs/dfs/namesecondary. The
directory is already locked.
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:510)
at 
org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:363)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:273)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doImportCheckpoint(FSImage.java:504)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:344)
at 
org.apache.hadoop.hdfs.server.namenode.FSDir

Re: ssh issues

2009-05-26 Thread Allen Wittenauer



On 5/26/09 3:40 AM, "Steve Loughran"  wrote:
> HDFS is as secure as NFS: you are trusted to be who you say you are.
> Which means that you have to run it on a secured subnet -access
> restricted to trusted hosts and/or one two front end servers or accept
> that your dataset is readable and writeable by anyone on the network.
> 
> There is user identification going in; it is currently at the level
> where it will stop someone accidentally deleting the entire filesystem
> if they lack the rights. Which has been known to happen.

Actually, I'd argue that HDFS is worse than even rudimentary NFS
implementations.  Off the top of my head:

a) There is no equivalent of squash root/force anonymous.  Any host can
assume privilege.
 
b) There is no 'read only from these hosts'.  If you can read blocks
over Hadoop RPC, you can write as well (minus safe mode).



Re: When directly writing to HDFS, the data is moved only on file close

2009-05-26 Thread Stas Oskin
Hi.

You are probably referring to the following paragraph?

After some back and forth over a set of slides presented by Sanjay on
work being done by Hairong as part of HADOOP-5744, "Revising append",
the room settled on API3 from the list of options below as the
priority feature needed by HADOOP 0.21.0.  Readers must be able to
read up to the last writer 'successful' flush.  Its not important that
the file length is 'inexact'.

If I understand correctly, this means the data actually gets written to the
cluster - but it's not visible until the block is closed.
Work is ongoing for version 0.21 to make the file visible on
flush().

Am I correct up to here?

Regards.


2009/5/26 Tom White 
>
> This feature is not available yet, and is still under active
>> discussion. (The current version of HDFS will make the previous block
>> available to readers.) Michael Stack gave a good summary on the HBase
>> dev list:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3c7c962aed0905231601g533088ebj4a7a068505ba3...@mail.gmail.com%3e
>>
>> Tom
>>
>> On Tue, May 26, 2009 at 12:08 PM, Stas Oskin 
>> wrote:
>> > Hi.
>> >
>> > I'm trying to continuously write data to HDFS via OutputStream(), and
>> want
>> > to be able to read it at the same time from another client.
>> >
>> > Problem is, that after the file is created on HDFS with size of 0, it
>> stays
>> > that way, and only fills up when I close the OutputStream().
>> >
>> > Here is a simple code sample illustrating this issue:
>> >
>> > try {
>> >
>> >FSDataOutputStream out=fileSystem.create(new
>> > Path("/test/test.bin")); // Here the file created with 0 size
>> >for(int i=0;i<1000;i++)
>> >{
>> >out.write(1); // Still stays 0
>> >out.flush(); // Even when I flush it out???
>> >}
>> >
>> >Thread.currentThread().sleep(1);
>> >out.close(); //Only here the file is updated
>> >} catch (Exception e) {
>> >e.printStackTrace();
>> >}
>> >
>> > So, two questions here:
>> >
>> > 1) How it's possible to write the files directly to HDFS, and have them
>> > update there immedaitely?
>> > 2) Just for information, in this case, where the file content stays all
>> the
>> > time - on server local disk, in memory, etc...?
>> >
>> > Thanks in advance.
>> >
>>
>


Re: InputStream.open() efficiency

2009-05-26 Thread Raghu Angadi


'in.seek(); in.read()' is certainly better than,
'in = fs.open(); in.seek(); in.read()'

The difference is exactly one open() call. So you would save an RPC
to the NameNode.


There are a couple of issues that affect apps that keep the handles open for a
very long time (many hours to days), but those will be fixed soon.


Raghu.

Stas Oskin wrote:

Hi.

I'm looking to find out, how the InputStream.open() + skip(), compares to
keeping a handle of InputStream() and just seeking the position.

Has anyone compared these approaches, and can advice on their speed?

Regards.
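
To make the comparison concrete, here is a small sketch of the two access
patterns being compared in this thread; the path, offsets and buffer size are
placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeekVsReopen {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/test/test.bin");
            byte[] buf = new byte[4096];

            // Pattern 1: open once, keep the handle, seek before each read.
            // No extra open() RPC to the NameNode per read position.
            FSDataInputStream in = fs.open(file);
            in.seek(1024);
            in.read(buf);
            in.seek(8192);
            in.read(buf);
            in.close();

            // Pattern 2: re-open and skip for every read position; each open()
            // costs one additional RPC to the NameNode.
            FSDataInputStream in2 = fs.open(file);
            in2.skip(1024);
            in2.read(buf);
            in2.close();
        }
    }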





Re: Username in Hadoop cluster

2009-05-26 Thread Alex Loddengaard
It looks to me like you didn't install Hadoop consistently across all nodes.

> xxx.xx.xx.251: bash:
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/hadoop-daemon.sh: No such file or
> directory

The above makes me suspect that xxx.xx.xx.251 has Hadoop installed
differently.  Can you try and locate hadoop-daemon.sh on xxx.xx.xx.251 and
adjust its location properly?

Alex

On Mon, May 25, 2009 at 10:25 PM, Pankil Doshi  wrote:

> Hello,
>
> I tried adding "usern...@hostname" for each entry in the slaves file.
>
> My slaves file has 2 data nodes. It looks like below:
>
> localhost
> utdhado...@xxx.xx.xx.229
> utdhad...@xxx.xx.xx.251
>
>
> error what I get when i start dfs is as below:
>
> starting namenode, logging to
>
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-namenode-opencirrus-992.hpl.hp.com.out
> xxx.xx.xx.229: starting datanode, logging to
>
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-datanode-opencirrus-992.hpl.hp.com.out
> *xxx.xx.xx.251: bash: line 0: cd:
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/..: No such file or directory
> xxx.xx.xx.251: bash:
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/hadoop-daemon.sh: No such file or
> directory
> *localhost: datanode running as process 25814. Stop it first.
> xxx.xx.xx.229: starting secondarynamenode, logging to
>
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-secondarynamenode-opencirrus-992.hpl.hp.com.out
> localhost: secondarynamenode running as process 25959. Stop it first.
>
>
>
> Basically it looks for "* /home/utdhadoop1/Hadoop/**
> hadoop-0.18.3/bin/hadoop-**daemon.sh"
> but instead it should look for "/home/utdhadoop/Hadoop/" as
> xxx.xx.xx.251 has username as utdhadoop* .
>
> Any inputs??
>
> Thanks
> Pankil
>
> On Wed, May 20, 2009 at 6:30 PM, Todd Lipcon  wrote:
>
> > On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard 
> > wrote:
> >
> > > First of all, if you can get all machines to have the same user, that
> > would
> > > greatly simplify things.
> > >
> > > If, for whatever reason, you absolutely can't get the same user on all
> > > machines, then you could do either of the following:
> > >
> > > 1) Change the *-all.sh scripts to read from a slaves file that has two
> > > fields: a host and a user
> >
> >
> > To add to what Alex said, you should actually already be able to do this
> > with the existing scripts by simply using the format "usern...@hostname"
> > for
> > each entry in the slaves file.
> >
> > -Todd
> >
>


Re: InputStream.open() efficiency

2009-05-26 Thread Stas Oskin
Hi.

Thanks for the answer.

Would keeping handles open for up to 5 minutes cause any issues?

And the same for writing?

Regards.

2009/5/26 Raghu Angadi 

>
> 'in.seek(); in.read()' is certainly better than,
> 'in = fs.open(); in.seek(); in.read()'
>
> The difference is is exactly one open() call. So you would save an RPC to
> NameNode.
>
> There are couple of issues that affect apps that keep the handlers open
> very long time (many hours to days).. but those will be fixed soon.
>
> Raghu.
>
>
> Stas Oskin wrote:
>
>> Hi.
>>
>> I'm looking to find out, how the InputStream.open() + skip(), compares to
>> keeping a handle of InputStream() and just seeking the position.
>>
>> Has anyone compared these approaches, and can advice on their speed?
>>
>> Regards.
>>
>>
>


Re: InputStream.open() efficiency

2009-05-26 Thread Raghu Angadi

Stas Oskin wrote:

Hi.

Thanks for the answer.

Would up to 5 minute of handlers cause any issues?


5 min should not cause any issues..


And same about writing?


Writing is not affected by the couple of issues I mentioned. Writing
over a long time should work as well as writing over a shorter time.


Raghu.


Regards.

2009/5/26 Raghu Angadi 


'in.seek(); in.read()' is certainly better than,
'in = fs.open(); in.seek(); in.read()'

The difference is is exactly one open() call. So you would save an RPC to
NameNode.

There are couple of issues that affect apps that keep the handlers open
very long time (many hours to days).. but those will be fixed soon.

Raghu.


Stas Oskin wrote:


Hi.

I'm looking to find out, how the InputStream.open() + skip(), compares to
keeping a handle of InputStream() and just seeking the position.

Has anyone compared these approaches, and can advice on their speed?

Regards.








MultipleOutputs / Hadoop 0.20.0?

2009-05-26 Thread Daniel Young
Greetings, list:

I'm part of a project just getting started using Hadoop-core, and we figured we 
may as well start with version 0.20.0 due to the refactoring of the Mapper and 
Reducer classes.  We'd also like to make use of MultipleOutputs, or something
with similar functionality.  However, it depends upon the now-deprecated
JobConf class, as opposed to using Job, as seems to be preferred in 0.20.0.

Does the ability to define and configure multiple outputs exist in a 
0.20.0-friendly form?  Perhaps rolled into an OutputFormat somewhere?  Using the 
deprecated classes isn't too big of a deal; I'm just wondering if this feature 
was updated in the 0.20.0 implementation.

Thanks,

Dan


get mapper number in pipes

2009-05-26 Thread Brian Seaman
Hi,

I am trying to figure out if it is possible to extract the number of the
mapper running in the actual mapper of a pipes job.  There are
occasionally times when I want to rerun a job but not have certain mappers
emit data depending on what they did the previous time.  I can pass in
which mappers to skip in a config file but can't figure out how to find
the number of the mapper currently running.

Any help will be greatly appreciated.

Thanks,

-Brian



Question regarding downtime model for DataNodes

2009-05-26 Thread Joe Hammerman
Hello Hadoop Users list:

We are running Hadoop version 0.18.2. My team lead has asked me 
to investigate the answer to a particular question regarding Hadoop's handling 
of offline DataNodes - specifically, we would like to know how long a node can 
be offline before it is totally rebuilt when it has been re-added to the cluster.
From what I've been able to determine from the documentation it 
appears to me that the NameNode will simply begin scheduling block replication 
on its remaining cluster members. If the offline node comes back online, and it 
reports all its blocks as being uncorrupted, then the NameNode just cleans up 
the "extra" blocks.
In other words, there is no explicit handling based on the 
length of the outage - the behavior of the cluster will depend entirely on the 
outage duration.

Anyone care to shed some light on this?

Thanks!
Regards,
Joseph Hammerman


Re: Bisection bandwidth needed for a small Hadoop cluster

2009-05-26 Thread Todd Lipcon
Hi Stephen,

The true answer depends on the types of jobs you're running. As a back of
the envelope calculation I might figure something like this:

60 nodes total = 30 nodes per rack
Each node might process about 100MB/sec of data
In the case of a sort job where the intermediate data is the same size as
the input data, that means each node needs to shuffle 100MB/sec of data
In aggregate, each rack is then producing about 3GB/sec of data
However, given even reducer spread across the racks, each rack will need to
send 1.5GB/sec to reducers running on the other rack.
Since the connection is full duplex, that means you need 1.5GB/sec of
bisection bandwidth for this theoretical job. So that's 12Gbps.

However, the above calculations are probably somewhat of an upper bound. A
large number of jobs have significant data reduction during the map phase,
either by some kind of filtering/selection going on in the Mapper itself, or
by good usage of Combiners. Additionally, intermediate data compression can
cut the intermediate data transfer by a significant factor. Lastly, although
your disks can probably provide 100MB sustained throughput, it's rare to see
a MR job which can sustain disk speed IO through the entire pipeline. So,
I'd say my estimate is at least a factor of 2 too high.

So, the simple answer is that 4-6Gbps is most likely just fine for most
practical jobs. If you want to be extra safe, many inexpensive switches can
operate in a "stacked" configuration where the bandwidth between them is
essentially backplane speed. That should scale you to 96 nodes with plenty
of headroom.

-Todd

On Tue, May 26, 2009 at 3:10 AM, stephen mulcahy
wrote:

> Hi,
>
> Has anyone here investigated what level of bisection bandwidth is needed
> for a Hadoop cluster which spans more than one rack?
>
> I'm currently sizing and planning a new Hadoop cluster and I'm wondering
> what the performance implications will be if we end up with a cluster spread
> across two racks. I'd expect we'll have one 48-port gigabit switch in each
> 42u rack. If we end up with 60 systems spread across these two switches -
> how much bandwidth should I have between the racks?
>
> I'll have 6 gigabit ports available for links between racks - i.e. up to 6
> Gbps. Would this be sufficient bisection bandwidth for Hadoop or should I be
> considering increased bandwidth between racks (maybe using fibre links
> between the switches or introducing another switch)?
>
> Thanks for any thoughts on this.
>
> -stephen
>
> --
> Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com
>


Announcement: Cloudera Hadoop Training in Washington DC (June 22-23)

2009-05-26 Thread Christophe Bisciglia
Hadoop Fans, just a quick note that we are hosting two days of Hadoop
training in Washington DC area (Alexandria, VA) on June 22 and 23.

We cover Hadoop, Hive, Pig and more with a focus on hands-on work.
Please share this with your friends who might not be on the user list
yet.

Both are listed under "Live Sessions" at http://www.cloudera.com/hadoop-training

Registration is discounted for the next few days, so if you are
interested, feel free to take advantage of the savings ($200) by
registering early.

We also have room for one more private session later in the week.
Please contact me directly if you are interested in a customized,
private session for your organization.

Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera


Re: Username in Hadoop cluster

2009-05-26 Thread Aaron Kimball
A slightly longer answer:

If you're starting daemons with bin/start-dfs.sh or start-all.sh, you'll
notice that these defer to hadoop-daemons.sh to do the heavy lifting. This
evaluates the string: cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config
$HADOOP_CONF_DIR "$@" and passes it to an underlying loop to execute on all
the slaves via ssh.

$bin and $HADOOP_HOME are thus macro-replaced on the server-side. The more
problematic one here is the $bin one, which resolves to the absolute path of
the cwd on the server that is starting Hadoop.

You've got three basic options:
1) Install Hadoop in the exact same path on all nodes
2) Modify bin/hadoop-daemons.sh to do something more clever on your system
by deferring evaluation of HADOOP_HOME and the bin directory (probably
really hairy; you might have to escape the variable names more than once
since there's another script named slaves.sh that this goes through)
3) Start the slaves "manually" on each node by logging in yourself, and
doing a "cd $HADOOP_HOME && bin/hadoop-daemon.sh datanode start"

As a shameless plug, Cloudera's distribution for Hadoop (
www.cloudera.com/hadoop) will also provide init.d scripts so that you can
start Hadoop daemons via the 'service' command. By default, the RPM
installation will also standardize on the "hadoop" username. But you can't
install this without being root.

- Aaron

On Tue, May 26, 2009 at 12:30 PM, Alex Loddengaard wrote:

> It looks to me like you didn't install Hadoop consistently across all
> nodes.
>
> > xxx.xx.xx.251: bash:
> > /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/hadoop-daemon.sh: No such file or
> > directory
>
> The above makes me suspect that xxx.xx.xx.251 has Hadoop installed
> differently.  Can you try and locate hadoop-daemon.sh on xxx.xx.xx.251 and
> adjust its location properly?
>
> Alex
>
> On Mon, May 25, 2009 at 10:25 PM, Pankil Doshi 
> wrote:
>
> > Hello,
> >
> > I tried adding "usern...@hostname" for each entry in the slaves file.
> >
> > My slaves file has 2 data nodes. It looks like below:
> >
> > localhost
> > utdhado...@xxx.xx.xx.229
> > utdhad...@xxx.xx.xx.251
> >
> >
> > error what I get when i start dfs is as below:
> >
> > starting namenode, logging to
> >
> >
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-namenode-opencirrus-992.hpl.hp.com.out
> > xxx.xx.xx.229: starting datanode, logging to
> >
> >
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-datanode-opencirrus-992.hpl.hp.com.out
> > *xxx.xx.xx.251: bash: line 0: cd:
> > /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/..: No such file or directory
> > xxx.xx.xx.251: bash:
> > /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/hadoop-daemon.sh: No such file
> or
> > directory
> > *localhost: datanode running as process 25814. Stop it first.
> > xxx.xx.xx.229: starting secondarynamenode, logging to
> >
> >
> /home/utdhadoop1/Hadoop/hadoop-0.18.3/bin/../logs/hadoop-utdhadoop1-secondarynamenode-opencirrus-992.hpl.hp.com.out
> > localhost: secondarynamenode running as process 25959. Stop it first.
> >
> >
> >
> > Basically it looks for "* /home/utdhadoop1/Hadoop/**
> > hadoop-0.18.3/bin/hadoop-**daemon.sh"
> > but instead it should look for "/home/utdhadoop/Hadoop/" as
> > xxx.xx.xx.251 has username as utdhadoop* .
> >
> > Any inputs??
> >
> > Thanks
> > Pankil
> >
> > On Wed, May 20, 2009 at 6:30 PM, Todd Lipcon  wrote:
> >
> > > On Wed, May 20, 2009 at 4:14 PM, Alex Loddengaard 
> > > wrote:
> > >
> > > > First of all, if you can get all machines to have the same user, that
> > > would
> > > > greatly simplify things.
> > > >
> > > > If, for whatever reason, you absolutely can't get the same user on
> all
> > > > machines, then you could do either of the following:
> > > >
> > > > 1) Change the *-all.sh scripts to read from a slaves file that has
> two
> > > > fields: a host and a user
> > >
> > >
> > > To add to what Alex said, you should actually already be able to do
> this
> > > with the existing scripts by simply using the format "usern...@hostname
> "
> > > for
> > > each entry in the slaves file.
> > >
> > > -Todd
> > >
> >
>


Re: Question regarding downtime model for DataNodes

2009-05-26 Thread Aaron Kimball
A DataNode is regarded as "dead" after 10 minutes of inactivity. If a
DataNode is down for < 10 minutes, and quickly returns, then nothing
happens. After 10 minutes, its blocks are assumed to be lost forever. So the
NameNode then begins scheduling those blocks for re-replication from the two
surviving copies. There is no real-time bound on how long this process
takes. It's a function of your available network bandwidth, server loads,
disk speeds, amount of storage used, etc. Blocks which are down to a single
surviving replica are prioritized over blocks which have two surviving
replicas (in the event of a simultaneous or near-simultaneous double fault),
since they are more vulnerable.

If a DataNode does reappear, then its re-replication is cancelled, and
over-provisioned blocks are scaled back to the target number of replicas. If
that machine comes back with all blocks intact, then the node needs no
"rebuilding." (In fact, some of the over-provisioned replicas that get
removed might be from the original node, if they're available elsewhere
too!)

Don't forget that machines in Hadoop do not have strong notions of identity.
If a particular machine is taken offline and its disks are wiped, the blocks
which were there (which also existed in two other places) will be
re-replicated elsewhere from the live copies. When that same machine is then
brought back online, it has no incentive to "copy back" all the blocks that
it used to have, as there will be three replicas elsewhere in the cluster.
Blocks are never permanently bound to particular machines.

If you add recommissioned or new nodes to a cluster, you should run the
rebalancing script which will take a random sampling of blocks from
heavily-laden nodes and move them onto emptier nodes in an attempt to spread
the data as evenly as possible.

- Aaron


On Tue, May 26, 2009 at 3:08 PM, Joe Hammerman wrote:

> Hello Hadoop Users list:
>
>We are running Hadoop version 0.18.2. My team lead has asked
> me to investigate the answer to a particular question regarding Hadoop's
> handling of offline DataNodes - specifically, we would like to know how long
> a node can be offline before it is totally rebuilt when it has been readded
> to the cluster.
>From what I've been able to determine from the documentation
> it appears to me that the NameNode will simply begin scheduling block
> replication on its remaining cluster members. If the offline node comes back
> online, and it reports all its blocks as being uncorrupted, then the
> NameNode just cleans up the "extra" blocks.
>In other words, there is no explicit handling based on the
> length of the outage - the behavior of the cluster will depend entirely on
> the outage duration.
>
>Anyone care to shed some light on this?
>
>Thanks!
> Regards,
>Joseph Hammerman
>


RE: Question regarding downtime model for DataNodes

2009-05-26 Thread Joe Hammerman
Aaron -

Thank you!

Is the 10 minute interval configurable?

Regards,
Joseph Hammerman

-Original Message-
From: Aaron Kimball [mailto:aa...@cloudera.com]
Sent: Tuesday, May 26, 2009 4:29 PM
To: core-user@hadoop.apache.org
Subject: Re: Question regarding downtime model for DataNodes

A DataNode is regarded as "dead" after 10 minutes of inactivity. If a
DataNode is down for < 10 minutes, and quickly returns, then nothing
happens. After 10 minutes, its blocks are assumed to be lost forever. So the
NameNode then begins scheduling those blocks for re-replication from the two
surviving copies. There is no real-time bound on how long this process
takes. It's a function of your available network bandwidth, server loads,
disk speeds, amount of storage used, etc. Blocks which are down to a single
surviving replica are prioritized over blocks which have two surviving
replicas (in the event of a simultaneous or near-simultaneous double fault),
since they are more vulnerable.

If a DataNode does reappear, then its re-replication is cancelled, and
over-provisioned blocks are scaled back to the target number of replicas. If
that machine comes back with all blocks intact, then the node needs no
"rebuilding." (In fact, some of the over-provisioned replicas that get
removed might be from the original node, if they're available elsewhere
too!)

Don't forget that machines in Hadoop do not have strong notions of identity.
If a particular machine is taken offline and its disks are wiped, the blocks
which were there (which also existed in two other places) will be
re-replicated elsewhere from the live copies. When that same machine is then
brought back online, it has no incentive to "copy back" all the blocks that
it used to have, as there will be three replicas elsewhere in the cluster.
Blocks are never permanently bound to particular machines.

If you add recommissioned or new nodes to a cluster, you should run the
rebalancing script which will take a random sampling of blocks from
heavily-laden nodes and move them onto emptier nodes in an attempt to spread
the data as evenly as possible.

- Aaron


On Tue, May 26, 2009 at 3:08 PM, Joe Hammerman wrote:

> Hello Hadoop Users list:
>
>We are running Hadoop version 0.18.2. My team lead has asked
> me to investigate the answer to a particular question regarding Hadoop's
> handling of offline DataNodes - specifically, we would like to know how long
> a node can be offline before it is totally rebuilt when it has been readded
> to the cluster.
>From what I've been able to determine from the documentation
> it appears to me that the NameNode will simply begin scheduling block
> replication on its remaining cluster members. If the offline node comes back
> online, and it reports all its blocks as being uncorrupted, then the
> NameNode just cleans up the "extra" blocks.
>In other words, there is no explicit handling based on the
> length of the outage - the behavior of the cluster will depend entirely on
> the outage duration.
>
>Anyone care to shed some light on this?
>
>Thanks!
> Regards,
>Joseph Hammerman
>


Persistent storage on EC2

2009-05-26 Thread Malcolm Matalka
I'm using EBS volumes to have a persistent HDFS on EC2.  Do I need to keep the
master updated on how to map the internal IPs, which change as I understand it,
to a known set of host names so it knows where the blocks are located each time
I bring a cluster up?  If so, is keeping the mapping up to date in /etc/hosts
sufficient?

Thanks



Re: Bisection bandwidth needed for a small Hadoop cluster

2009-05-26 Thread Amr Awadallah

Stephen,

 I highly recommend you get switches that have a few 10GigE ports on 
them for interlinking.


  So say you have 40 servers/rack: get top-of-rack switches with
forty 1GigE ports and two 10GigE ports; this way you can have half the
servers talking to the other rack without bottlenecks. The prices for 
such switches have been dropping significantly in recent months (you 
should be able to get them for sub $10K). Take a look at the switches 
from Arista Networks (www.aristanetworks.com) for competitive pricing.


Cheers,

-- amr

Todd Lipcon wrote:

Hi Stephen,

The true answer depends on the types of jobs you're running. As a back of
the envelope calculation I might figure something like this:

60 nodes total = 30 nodes per rack
Each node might process about 100MB/sec of data
In the case of a sort job where the intermediate data is the same size as
the input data, that means each node needs to shuffle 100MB/sec of data
In aggregate, each rack is then producing about 3GB/sec of data
However, given even reducer spread across the racks, each rack will need to
send 1.5GB/sec to reducers running on the other rack.
Since the connection is full duplex, that means you need 1.5GB/sec of
bisection bandwidth for this theoretical job. So that's 12Gbps.

However, the above calculations are probably somewhat of an upper bound. A
large number of jobs have significant data reduction during the map phase,
either by some kind of filtering/selection going on in the Mapper itself, or
by good usage of Combiners. Additionally, intermediate data compression can
cut the intermediate data transfer by a significant factor. Lastly, although
your disks can probably provide 100MB sustained throughput, it's rare to see
a MR job which can sustain disk speed IO through the entire pipeline. So,
I'd say my estimate is at least a factor of 2 too high.

So, the simple answer is that 4-6Gbps is most likely just fine for most
practical jobs. If you want to be extra safe, many inexpensive switches can
operate in a "stacked" configuration where the bandwidth between them is
essentially backplane speed. That should scale you to 96 nodes with plenty
of headroom.

-Todd

On Tue, May 26, 2009 at 3:10 AM, stephen mulcahy
wrote:

  

Hi,

Has anyone here investigated what level of bisection bandwidth is needed
for a Hadoop cluster which spans more than one rack?

I'm currently sizing and planning a new Hadoop cluster and I'm wondering
what the performance implications will be if we end up with a cluster spread
across two racks. I'd expect we'll have one 48-port gigabit switch in each
42u rack. If we end up with 60 systems spread across these two switches -
how much bandwidth should I have between the racks?

I'll have 6 gigabit ports available for links between racks - i.e. up to 6
Gbps. Would this be sufficient bisection bandwidth for Hadoop or should I be
considering increased bandwidth between racks (maybe using fibre links
between the switches or introducing another switch)?

Thanks for any thoughts on this.

-stephen

--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie  http://webstar.deri.ie  http://sindice.com




  


Re: MultipleOutputs / Hadoop 0.20.0?

2009-05-26 Thread Jothi Padmanabhan
Hi Dan,

I do not think we have support for MultipleOutputs with the new API just yet.

Jothi


On 5/27/09 2:08 AM, "Daniel Young"  wrote:

> Greetings, list:
> 
> I'm part of a project just getting started using Hadoop-core, and we figured
> we may as well start with version 0.20.0 due to the refactoring of the Mapper
> and Reducer classes.  We'd also like to make use of MultipleOutputs, or
> something with a similar functionality.   However, it depends upon the
> now-deprecated JobConf class, as opposed to using Job as seems preferred by
> 0.20.0.
> 
> Does the ability to define and configure multiple outputs exist in a
> 0.20.0-friendly form?  Perhaps rolled into a OutputFormat somewhere?  Using
> the deprecated classes isn't too big of a deal; I'm just wondering if this
> feature was updated in the 0.20.0 implementation.
> 
> Thanks,
> 
> Dan



Re: PNW Hadoop + Apache Cloud Stack Meetup, Wed. May 27th:

2009-05-26 Thread Robert Burrell Donkin
On Tue, May 26, 2009 at 6:42 PM, Bradford Stephens
 wrote:
> Greetings,
> This is a friendly reminder that the 1st meetup for the PNW Hadoop + Apache
> Cloud Stack User Group is THIS WEDNESDAY at 6:45pm. We're very excited to
> have everyone attend!
>
> University of Washington, Allen Center Room 303, at 6:45pm on Wednesday, May
> 27, 2009.
> I'm going to put together a map, and a wiki so we can collab.
>
> The Allen Center is located here:
> http://www.washington.edu/home/maps/?CSE
>
> What I'm envisioning is a meetup for about 2 hours: we'll have two in-depth
> talks of 15-20 minutes each, and then several "lightning talks" of 5
> minutes. We'll then have discussion and 'social time'.
> Let me know if you're interested in speaking or attending.
>
> I'd like to focus on education, so every presentation *needs* to ask some
> questions at the end. We can talk about these after the presentations, and
> I'll record what we've learned in a wiki and share that with the rest of
> us.
>
> Looking forward to meeting you all!

sadly i'm on the other side of the pond

(we did some talking at ApacheConEurope about the Apache Cloud Stack.
so, i'm going to jump in here with a few ideas.)

1. after a while, mass cross postings start to become less efficient
than a dedicated announcements list. since clouds cuts across project
boundaries, this would need to be a cross-project mailing list -
announceme...@clouds.apache.org, say.

if this would be useful to the community, i'm willing to take a look
at steering it through on the apache side.

opinions?

2. we've made a start at http://cwiki.apache.org/labs/clouds.html
trying to describe the projects, to document how they relate and to
create space for discussions of architecture. as with any labs
project, karma will be granted to any committer (just ask on the labs
list). all very welcome.

- robert