Hey Edward,

The application we used at Facebook to transmit new data is open
source now and available at
http://sourceforge.net/projects/scribeserver/.

Later,
Jeff

On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> I came up with my line of thinking after reading this article:
>
> http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
>
> As a guy who was intrigued by the Java coffee cup in '95 and now
> lives as a data center/NOC jock/Unix guy, let's say I look at a log
> management process from a data center perspective. I know:
>
> Syslog is a familiar model (human readable: UDP text)
> INETD/XINETD is a familiar model (programs that do amazing things with
> STDIN/STDOUT)
> Variety of hardware and software
>
> I may be supporting an older Solaris 8, Windows, or FreeBSD 5 box, for example.
>
> I want to be able to pipe an Apache custom log into HDFS, or forward
> syslog. That is where LHadoop (or something like it) would come into
> play.
>
> I am even thinking of accepting raw streams and having the server side
> use source-host/regex rules to determine which file the data should go to.
>
> I want to stay light on the client side. An application that tails log
> files and transmits new data is another component to develop and
> manage. Has anyone had experience with moving this type of data?
>
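The source-host/regex routing Edward describes could be sketched roughly like this. Everything here is hypothetical illustration, not code from the thread: the port, the route table, and the file paths are made up, the destination is a local file standing in for HDFS, and the routing key is the sender's address as seen by the socket (a real deployment would likely key on reverse DNS or a parsed syslog hostname).

```python
import re
import socket

# Hypothetical routing table: (regex on source address, target file).
ROUTES = [
    (re.compile(r"^web\d+\."), "logs/apache.log"),
    (re.compile(r"^db\d+\."), "logs/database.log"),
]
DEFAULT_TARGET = "logs/misc.log"


def route(source):
    """Pick the destination file for a line based on its source host."""
    for pattern, target in ROUTES:
        if pattern.match(source):
            return target
    return DEFAULT_TARGET


def serve(host="0.0.0.0", port=5140):
    """Accept raw syslog-style UDP lines and append each to its routed file."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, (src, _) = sock.recvfrom(65535)
        with open(route(src), "ab") as f:
            f.write(data.rstrip(b"\n") + b"\n")
```

This keeps the client side as light as Edward wants: anything that can send a UDP datagram (a syslog daemon, `logger`, a shell one-liner) can feed it, with all routing logic on the server.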
