Hey Edward,

The application we used at Facebook to transmit new data is now open source and available at http://sourceforge.net/projects/scribeserver/.
Later,
Jeff

On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> I came up with my line of thinking after reading this article:
>
> http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
>
> As a guy who was intrigued by the Java coffee cup in '95 and now
> works as a data center/NOC jock/Unix guy, let's say I look at a log
> management process from a data center perspective. I know:
>
> Syslog is a familiar model (human-readable UDP text)
> INETD/XINETD is a familiar model (programs that do amazing things with
> STDIN/STDOUT)
> Variety of hardware and software
>
> I may be supporting an older Solaris 8, Windows, or FreeBSD 5, for example.
>
> I want to be able to pipe an Apache custom log at HDFS, or forward
> syslog. That is where LHadoop (or something like it) would come into
> play.
>
> I am even thinking of accepting raw streams and having the server side
> use source-host/regex rules to determine which file the data should go to.
>
> I want to stay light on the client side. An application that tails log
> files and transmits new data is another component to develop and
> manage. Has anyone had experience with moving this type of data?
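For readers following the thread: the "tail and transmit" client Edward describes boils down to remembering a byte offset into the log file and shipping only the complete lines appended since the last pass. The sketch below is illustrative only, not Scribe's implementation or protocol; `read_new_lines` is a hypothetical helper, and the demo writes to a temp file instead of forwarding over the network.

```python
# Minimal sketch of the offset-tracking core of a "tail and forward"
# log shipper. Assumes a real shipper would persist the offset and
# send the returned lines to a collector (e.g. Scribe or syslog).
import os
import tempfile

def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for data appended since offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Ship only complete lines; a partial trailing line is left for the
    # next pass so records are never split mid-line.
    last_nl = data.rfind(b"\n")
    if last_nl == -1:
        return [], offset
    complete = data[: last_nl + 1]
    return complete.decode("utf-8").splitlines(), offset + last_nl + 1

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "access.log")
    with open(path, "a") as f:
        f.write("GET /index.html 200\n")
    lines, off = read_new_lines(path, 0)
    print(lines)                      # first pass picks up the full line
    with open(path, "a") as f:
        f.write("GET /favicon.ico 404\npartial")
    lines, off = read_new_lines(path, off)
    print(lines)                      # "partial" is held back until its newline arrives
```

A production client would also handle log rotation (detecting truncation or an inode change and resetting the offset), which is a large part of why Edward calls this "another component to develop and manage".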