Re: Using HDFS for common purpose
Today Nitesh answered a similar thread, and his answer was what I wanted to learn. I'm posting it here to help others with the same question.

HDFS is a file system for distributed storage, typically for distributed computing scenarios over Hadoop. For office purposes you will require a SAN (Storage Area Network): an architecture for attaching remote computer storage devices to servers in such a way that, to the operating system, the devices appear as locally attached. Or you can even go for Amazon S3, if the data is really authentic. For an open-source solution related to SAN, you can go with any of the Linux server distributions (e.g. RHEL, SuSE) or Solaris (ZFS + zones), or perhaps the best plug-and-play solution (non-open-source) would be a Mac Server + Xsan.
--nitesh

Thanks,
Rasit

2009/1/28 Rasit OZDAS
> Thanks for the responses,
>
> Sorry, I made a mistake: what I wanted is actually not a db. We need simple storage for files. Only get and put commands are enough (no queries needed). We don't even need append, chmod, etc.
>
> Probably from a thread on this list, I came across a link to a KFS-HDFS comparison:
> http://deliberateambiguity.typepad.com/blog/2007/10/advantages-of-k.html
>
> It's good that KFS is written in C++, but handling errors in C++ is usually more difficult.
> I need your opinion about which one would fit best.
>
> Thanks,
> Rasit

--
M. Raşit ÖZDAŞ
Re: Using HDFS for common purpose
Thanks for the responses,

Sorry, I made a mistake: what I wanted is actually not a db. We need simple storage for files. Only get and put commands are enough (no queries needed). We don't even need append, chmod, etc.

Probably from a thread on this list, I came across a link to a KFS-HDFS comparison:
http://deliberateambiguity.typepad.com/blog/2007/10/advantages-of-k.html

It's good that KFS is written in C++, but handling errors in C++ is usually more difficult.
I need your opinion about which one would fit best.

Thanks,
Rasit

2009/1/27 Jim Twensky
> You may also want to have a look at this to reach a decision based on your needs:
>
> http://www.swaroopch.com/notes/Distributed_Storage_Systems
>
> Jim

--
M. Raşit ÖZDAŞ
Re: Using HDFS for common purpose
You may also want to have a look at this to reach a decision based on your needs:

http://www.swaroopch.com/notes/Distributed_Storage_Systems

Jim

On Tue, Jan 27, 2009 at 1:22 PM, Jim Twensky wrote:
> Rasit,
>
> What kind of data will you be storing on HBase or directly on HDFS? Do you aim to use it as a data source for key/value lookups of small strings/numbers, or do you want to store larger files labeled with some sort of key and retrieve them during a MapReduce run?
>
> Jim
Re: Using HDFS for common purpose
Rasit,

What kind of data will you be storing on HBase or directly on HDFS? Do you aim to use it as a data source for key/value lookups of small strings/numbers, or do you want to store larger files labeled with some sort of key and retrieve them during a MapReduce run?

Jim

On Tue, Jan 27, 2009 at 11:51 AM, Jonathan Gray wrote:
> Perhaps what you are looking for is HBase?
>
> http://hbase.org
>
> HBase is a column-oriented, distributed store that sits on top of HDFS and provides random access.
>
> JG
RE: Using HDFS for common purpose
Perhaps what you are looking for is HBase?

http://hbase.org

HBase is a column-oriented, distributed store that sits on top of HDFS and provides random access.

JG

> -----Original Message-----
> From: Rasit OZDAS [mailto:rasitoz...@gmail.com]
> Sent: Tuesday, January 27, 2009 1:20 AM
> To: core-user@hadoop.apache.org
> Cc: arif.yil...@uzay.tubitak.gov.tr; emre.gur...@uzay.tubitak.gov.tr; hilal.tara...@uzay.tubitak.gov.tr; serdar.ars...@uzay.tubitak.gov.tr; hakan.kocaku...@uzay.tubitak.gov.tr; caglar.bi...@uzay.tubitak.gov.tr
> Subject: Using HDFS for common purpose
>
> Hi,
> I wanted to ask whether HDFS is a good solution just as a distributed db (no running jobs, only get and put commands).
> A review says that "HDFS is not designed for low latency" and, besides, it's implemented in Java.
> Do these disadvantages prevent us from using it?
> Or could somebody suggest a better (faster) one?
>
> Thanks in advance,
> Rasit
Using HDFS for common purpose
Hi,
I wanted to ask whether HDFS is a good solution just as a distributed db (no running jobs, only get and put commands).
A review says that "HDFS is not designed for low latency" and, besides, it's implemented in Java.
Do these disadvantages prevent us from using it?
Or could somebody suggest a better (faster) one?

Thanks in advance,
Rasit