Re: Hadoop for real time
Hi Ted.

Thanks for sharing some of the inner workings of Veoh, which btw I'm a frequent user of (or at least when time permits :) ).

I indeed recall reading somewhere that Veoh used a heavily modified version of MogileFS, but has since switched, as it wasn't ready enough for Veoh's needs.

If not Hadoop, are there any other available solutions that can assist in distributing the processing of real-time video data? Or is the old way of separate application servers the only way to go?

Regards.

2008/10/20 Ted Dunning <[EMAIL PROTECTED]>

> Hadoop may not be quite what you want for this.
> [...]
Re: Hadoop for real time
Hadoop may not be quite what you want for this.

You could definitely use Hadoop for storage and streaming. You can also do various kinds of processing on Hadoop.

But because Hadoop is primarily intended for batch-style operations, there is a bit of an assumption that some administrative tasks will take down the cluster. That may be a problem (video serving tends to have a web audience that isn't very tolerant of downtime).

At Veoh, we used a simpler system for serving videos that was originally based on Mogile. The basic idea is that there is a database that contains name-to-URL mappings. The URLs point to storage boxes that have a bunch of disks that are served out to the net via lighttpd. A management machine runs occasionally to make sure that files are replicated according to policy. The database is made redundant via conventional mechanisms.

Requests for files can be proxied by a farm of front-end machines that query the database for locations, or you can use redirects directly to the content. How you do it depends on network topology and your sensitivity about divulging internal details. Redirects can give higher peak read speed since you are going direct. Proxying avoids a network round trip for the redirect.

At Veoh, this system fed the content delivery networks as a caching layer, which meant that the traffic was essentially uniform random access. This system handled a huge number of files (10^9 or so) very easily and has essentially never had customer-visible downtime. Extending it with new file systems is trivial (just tell the manager box and it starts using them).

This arrangement lacks most of the things that make Hadoop really good for what it does. But, in return, it is incredibly simple. It isn't very suitable for map-reduce or other high-bandwidth processing tasks. It doesn't allow computation to go to the data. It doesn't allow large files to be read in parallel from many machines. On the other hand, it handles way more files than Hadoop does and it handles gobs of tiny files pretty well.

Video is also kind of a write-once medium in many cases, and video files aren't really splittable for map-reduce purposes. That might mean that you could get away with a mogile-ish system.

On Tue, Oct 14, 2008 at 1:29 PM, Stas Oskin <[EMAIL PROTECTED]> wrote:

> Hi.
>
> Video storage, processing and streaming.
>
> Regards.

--
ted
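For readers who want something concrete, here is a minimal sketch of the layout described in the message above: a name-to-URL table, a lookup that a front end could answer with a redirect or a proxied read, and a manager pass that checks replication against policy. It is an illustration only, not Veoh's or MogileFS's code. The names Catalog, under_replicated, repair, min_copies and the box host names are all made up for the example, and the database is replaced by an in-memory dict.

```python
import random
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for the name-to-URL database."""

    locations: dict[str, list[str]] = field(default_factory=dict)

    def add(self, name: str, url: str) -> None:
        """Record that one copy of `name` is served at `url`."""
        urls = self.locations.setdefault(name, [])
        if url not in urls:
            urls.append(url)

    def resolve(self, name: str) -> str:
        """Pick one replica at random.

        A front end could redirect the client to this URL, or fetch
        from it and proxy the bytes back.
        """
        return random.choice(self.locations[name])


def under_replicated(catalog: Catalog, min_copies: int) -> dict[str, int]:
    """Manager pass: map each file below policy to its missing copy count."""
    return {
        name: min_copies - len(urls)
        for name, urls in catalog.locations.items()
        if len(urls) < min_copies
    }


def repair(catalog: Catalog, boxes: list[str], min_copies: int) -> None:
    """Register new locations on storage boxes that lack a copy.

    Only the bookkeeping is shown; copying the file itself is omitted.
    """
    for name, missing in under_replicated(catalog, min_copies).items():
        used = {urlparse(url).netloc for url in catalog.locations[name]}
        spare = [box for box in boxes if box not in used]
        for box in spare[:missing]:
            catalog.add(name, f"http://{box}/{name}")
```

With min_copies set to 2, a file registered on only one box is reported by under_replicated, and repair gives it a second location on the first box in the list that does not already hold it. Adding a new storage box amounts to appending its host name to the list passed to repair, which is the "just tell the manager box" step in miniature.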
Re: Hadoop for real time
Hi.

Video storage, processing and streaming.

Regards.

2008/9/25 Edward J. Yoon <[EMAIL PROTECTED]>

> What kind of real-time app?
Re: Hadoop for real time
What kind of real-time app?

On Wed, Sep 24, 2008 at 4:50 AM, Stas Oskin <[EMAIL PROTECTED]> wrote:

> Hi.
>
> Is it possible to use Hadoop for a real-time app in the video processing field?
>
> Regards.

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Hadoop for real time
Hi.

Is it possible to use Hadoop for a real-time app in the video processing field?

Regards.