Re: hadoop question using VMWARE
On 28/09/11 08:37, N Keywal wrote:
> For example:
> - It adds two layers (Windows and Linux) that can both fail, especially under heavy workload (Hadoop is built to use all the resources available). They also need to be managed (software upgrades, hardware support...), which is an extra cost.
> - These two layers use the different resources (HDD, CPU, network) unpredictably, making troubleshooting and performance analysis more complicated.
> - There will be a real performance impact. It depends on what you do and on how Windows and VMware are configured, but on my non-optimized laptop I lose more than 50%. VMware claims 15% max, but that is without Windows (using ESX directly).

Where you take a big hit is in disk IO, as what your guest OS thinks is a disk with sequentially stored files is just a single file in the host OS that may be scattered around the real HDD. Disk IO goes through too many layers; it's often faster to NFS-mount the real HDD. For compute-intensive work the performance hit isn't so bad, at least provided you don't swap.

> - Last time I checked (a few months ago), VMware was not able to use all the cores and memory of medium-sized servers.

Same with VirtualBox, which I like because it is lighter weight.

I use VMs because the infrastructure provides them; things like Elastic MapReduce from AWS also offer it. Your code may be slower, but what you get is the ability to bring up clusters on a pay-per-hour basis, and the ability to vary the number of machines based on the workload/execution plan. If you can compensate for the IO hit by renting four more servers, you may still come out ahead.

http://www.slideshare.net/steve_l/farming-hadoop-inthecloud
Re: hadoop question using VMWARE
For example:
- It adds two layers (Windows and Linux) that can both fail, especially under heavy workload (Hadoop is built to use all the resources available). They also need to be managed (software upgrades, hardware support...), which is an extra cost.
- These two layers use the different resources (HDD, CPU, network) unpredictably, making troubleshooting and performance analysis more complicated.
- There will be a real performance impact. It depends on what you do and on how Windows and VMware are configured, but on my non-optimized laptop I lose more than 50%. VMware claims 15% max, but that is without Windows (using ESX directly).
- Last time I checked (a few months ago), VMware was not able to use all the cores and memory of medium-sized servers.
- The namenode needs to be secured, as it's a SPOF.

On Wed, Sep 28, 2011 at 9:07 AM, praveenesh kumar wrote:
> "it's not something you can do for production nor performance analysis."
> Can you please tell me what this means? Why can't we use this approach for production?
>
> Thanks
>
> On Tue, Sep 27, 2011 at 11:56 PM, N Keywal wrote:
> > Hi,
> >
> > Yes, it will work. HBase won't see the difference; it's purely a VMware matter. Obviously, it's not something you can do for production nor performance analysis.
> >
> > Cheers,
> > N.
> >
> > On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar wrote:
> > > Hi,
> > >
> > > Suppose I have 10 Windows machines, with 10 individual VM instances running on them independently. Can these VM instances communicate with each other, so that I can build a Hadoop cluster out of them?
> > >
> > > Has anyone tried this?
> > >
> > > I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it like this, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop?
> > >
> > > Thanks,
> > > Praveenesh
Re: hadoop question using VMWARE
"it's not something you can do for production nor performance analysis."

Can you please tell me what this means? Why can't we use this approach for production?

Thanks

On Tue, Sep 27, 2011 at 11:56 PM, N Keywal wrote:
> Hi,
>
> Yes, it will work. HBase won't see the difference; it's purely a VMware matter. Obviously, it's not something you can do for production nor performance analysis.
>
> Cheers,
> N.
>
> On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar wrote:
> > Hi,
> >
> > Suppose I have 10 Windows machines, with 10 individual VM instances running on them independently. Can these VM instances communicate with each other, so that I can build a Hadoop cluster out of them?
> >
> > Has anyone tried this?
> >
> > I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it like this, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop?
> >
> > Thanks,
> > Praveenesh
Re: hadoop question using VMWARE
Hi,

Yes, it will work. HBase won't see the difference; it's purely a VMware matter. Obviously, it's not something you can do for production nor performance analysis.

Cheers,
N.

On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar wrote:
> Hi,
>
> Suppose I have 10 Windows machines, with 10 individual VM instances running on them independently. Can these VM instances communicate with each other, so that I can build a Hadoop cluster out of them?
>
> Has anyone tried this?
>
> I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it like this, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop?
>
> Thanks,
> Praveenesh
hadoop question using VMWARE
Hi,

Suppose I have 10 Windows machines, with 10 individual VM instances running on them independently. Can these VM instances communicate with each other, so that I can build a Hadoop cluster out of them?

Has anyone tried this?

I know we can set up multiple VM instances on the same machine, but can we do it across different machines as well? And if I do it like this, is it a good approach, considering I don't have dedicated Ubuntu machines for Hadoop?

Thanks,
Praveenesh
Re: Hadoop Question
Nitin,

On 2011/07/28 14:51, Nitin Khandelwal wrote:
> How can I determine if a file is being written to (by any thread) in HDFS?

That information is exposed by the NameNode's HTTP servlet. You can obtain it with the fsck tool (hadoop fsck /path/to/dir -openforwrite), or with an HTTP GET against http://namenode:port/fsck?path=/your/path&openforwrite=1

George
Re: Hadoop Question
How about having the slave write to a temp file first, then move it to the file the master is monitoring once the slave has closed it?

-Joey

On Jul 27, 2011, at 22:51, Nitin Khandelwal wrote:
> Hi All,
>
> How can I determine if a file is being written to (by any thread) in HDFS? I have a continuous process on the master node that tracks a particular folder in HDFS for files to process. On the slave nodes, I create files in the same folder using the following code:
>
> At the slave node:
>
> import org.apache.commons.io.IOUtils;
> import org.apache.hadoop.fs.FileSystem;
> import java.io.OutputStream;
>
> OutputStream oStream = fileSystem.create(path);
> IOUtils.write(data, oStream); // payload argument lost in the archive; 'data' is a stand-in
> IOUtils.closeQuietly(oStream);
>
> At the master node, I read the earliest modified file in the folder. At times I get nothing in the file, mostly because the slave may still be writing to it. Is there any way to tell the master that the slave is still writing, so that it checks the file later for actual content?
>
> Thanks,
> --
> Nitin Khandelwal
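Joey's suggestion works because a rename is a single metadata operation, so the poller on the master never observes a half-written file. A minimal sketch of the pattern, using the local filesystem for the sake of a self-contained example (on HDFS you would use FileSystem.rename on Path objects instead; the class and file names here are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    // Write content under a temporary name, then rename it into place.
    // A poller watching 'watched' (and skipping *.tmp) only ever sees complete files.
    static Path publish(Path watched, String name, byte[] content) throws IOException {
        Path tmp = watched.resolve(name + ".tmp"); // invisible to the poller by convention
        Path fin = watched.resolve(name);
        Files.write(tmp, content);                 // the slave finishes writing first...
        // ...then commits with an atomic rename; no reader can see partial content
        return Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("watched");
        Path done = publish(dir, "part-00000", "hello".getBytes());
        System.out.println(Files.readString(done)); // prints "hello"
    }
}
```

The master's polling loop then needs only one extra rule: ignore names ending in .tmp. The same convention is what MapReduce itself uses when task outputs are committed into an output directory.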
Hadoop Question
Hi All,

How can I determine if a file is being written to (by any thread) in HDFS? I have a continuous process on the master node that tracks a particular folder in HDFS for files to process. On the slave nodes, I create files in the same folder using the following code:

At the slave node:

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import java.io.OutputStream;

OutputStream oStream = fileSystem.create(path);
IOUtils.write(data, oStream); // payload argument lost in the archive; 'data' is a stand-in
IOUtils.closeQuietly(oStream);

At the master node, I read the earliest modified file in the folder. At times I get nothing in the file, mostly because the slave may still be writing to it. Is there any way to tell the master that the slave is still writing, so that it checks the file later for actual content?

Thanks,
--
Nitin Khandelwal