Re: LHadoop Server simple Hadoop input and output
Chukwa is not quite ready for prime time. The collection part works OK and shouldn't be too evil to set up, but the analysis part and the data-storage documentation aren't there yet.

On Mon, Oct 27, 2008 at 12:51 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
> It could, but we have been unable to get Chukwa to run outside of Yahoo.

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
Re: LHadoop Server simple Hadoop input and output
It could, but we have been unable to get Chukwa to run outside of Yahoo.

On Fri, Oct 24, 2008 at 12:26 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> Chukwa also could be used here.
Re: LHadoop Server simple Hadoop input and output
Chukwa also could be used here.

On 10/24/08 11:47 AM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote:
> The application we used at Facebook to transmit new data is open
> source now and available at http://sourceforge.net/projects/scribeserver/.
Re: LHadoop Server simple Hadoop input and output
Hey Edward,

The application we used at Facebook to transmit new data is open source now and available at http://sourceforge.net/projects/scribeserver/.

Later,
Jeff
Re: LHadoop Server simple Hadoop input and output
I came up with my line of thinking after reading this article:

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

As a guy who was intrigued by the Java coffee cup in '95 and now lives as a data center/NOC jock/Unix guy, let's say I look at a log management process from a data center perspective. I know:

Syslog is a familiar model (human readable: UDP text)
inetd/xinetd is a familiar model (programs that do amazing things with stdin/stdout)
A variety of hardware and software

I may be supporting an older Solaris 8, Windows, or FreeBSD 5, for example.

I want to be able to pipe an Apache custom log into HDFS, or forward syslog. That is where LHadoop (or something like it) would come into play.

I am thinking to even accept raw streams and have the server side use source-host/regex rules to determine which file the data should go to.

I want to stay light on the client side. An application that tails log files and transmits new data is another component to develop and manage. Has anyone had experience with moving this type of data?
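The source-host/regex routing idea above can be sketched in a few lines. This is only an illustration, not LHadoop code; the rule format and the example hostnames are hypothetical:

```python
import re

# Ordered routing rules: (source-host regex, target path template).
# First matching rule wins; the final catch-all picks up anything else.
# Rule syntax and paths here are invented for illustration.
RULES = [
    (re.compile(r"^web\d+\."), "/logs/apache/{host}.log"),
    (re.compile(r"^db\d+\."),  "/logs/mysql/{host}.log"),
    (re.compile(r"."),         "/logs/misc/{host}.log"),   # catch-all
]

def route(source_host: str) -> str:
    """Return the target file path for a raw stream arriving from source_host."""
    for pattern, template in RULES:
        if pattern.search(source_host):
            return template.format(host=source_host)
    raise ValueError("no rule matched " + source_host)
```

For example, `route("web01.example.com")` returns `/logs/apache/web01.example.com.log`, so every web server's stream lands in its own file without any client-side configuration.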
Re: LHadoop Server simple Hadoop input and output
Another way to do this is to make Thrift's Java compiler generate REST bindings, like its PHP compiler does. There are also libhdfs and http://wiki.apache.org/hadoop/MountableHDFS

On 10/23/08 2:54 PM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:
> The thriftfs interface is more elegant than what I was trying to do,
> and that implementation is more complete.
Re: LHadoop Server simple Hadoop input and output
I had downloaded Thrift and run the example applications after the Hive meet-up. It is very cool stuff. The thriftfs interface is more elegant than what I was trying to do, and that implementation is more complete. Still, someone might be interested in what I did if they want a super-light API :) I will link to http://wiki.apache.org/hadoop/HDFS-APIs from my page so people know the options.
Re: LHadoop Server simple Hadoop input and output
Hey Edward,

The Thrift interface to HDFS allows clients to be developed in any Thrift-supported language: http://wiki.apache.org/hadoop/HDFS-APIs

Regards,
Jeff

On Thu, Oct 23, 2008 at 1:04 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> One of my first questions about Hadoop was, "How do systems outside
> the cluster interact with the file system?" I read several documents
> that described streaming data into Hadoop for processing, but I had
> trouble finding examples.
>
> The goal of LHadoop Server (L stands for Lightweight) is to provide a
> VERY simple interface that allows streaming READ and WRITE access to
> Hadoop. The client side of the connection interacts using a simple
> text-based protocol. Any type of client (Perl, C++, telnet) can
> interact with Hadoop. There is no need to have Java on the client.
>
> The protocol works like this:
>
> bash-3.2# nc localhost 9090
> AUTH ecapriolo password
> server>>OK:AUTH
> READ /letsgo
> server>>OK.
> OMG.
> Is this going to work
> Lets see
> ^C
>
> Site: http://www.jointhegrid.com/jtgweb/lhadoopserver/
> SVN: http://www.jointhegrid.com/jtgwebrepo/jtglhadoopserver
>
> I know several other methods exist to get access to Hadoop, including
> FUSE. Again, I could not find anyone doing something like this. Does
> anyone have any ideas, or think this is useful?
>
> Thank you,
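Going only by the quoted session transcript, the server side of that text protocol could be sketched as a small command dispatcher. The "OK:AUTH" and "OK." replies are taken from the transcript; the error replies and the in-memory user/file tables are assumptions for illustration:

```python
def handle_command(line, users, files):
    """Dispatch one line of the LHadoop-style text protocol.

    users: dict of username -> password (stand-in for real auth).
    files: dict of path -> file contents (stand-in for HDFS reads).
    Returns the list of reply lines to send back to the client.
    """
    parts = line.strip().split(" ", 2)
    cmd = parts[0].upper()
    if cmd == "AUTH" and len(parts) == 3:
        user, password = parts[1], parts[2]
        if users.get(user) == password:
            return ["OK:AUTH"]           # reply taken from the transcript
        return ["ERR:AUTH"]              # error form is an assumption
    if cmd == "READ" and len(parts) >= 2:
        path = parts[1]
        if path in files:
            # "OK." followed by the file contents, as in the transcript
            return ["OK."] + files[path].splitlines()
        return ["ERR:NOFILE"]            # assumption
    return ["ERR:UNKNOWN"]               # assumption
```

Replaying the transcript against this sketch: `handle_command("AUTH ecapriolo password", ...)` yields `["OK:AUTH"]`, and a subsequent `READ /letsgo` yields `"OK."` followed by the file's lines, which is exactly the shape a dumb client like netcat or a Perl one-liner can consume.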