Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Edward Capriolo
I came up with my line of thinking after reading this article:

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

As a guy who was intrigued by the Java coffee cup in '95 and who now
lives as a data center/NOC jock/Unix guy, let's say I look at a log
management process from a data center perspective. I know:

Syslog is a familiar model (human-readable UDP text).
INETD/XINETD is a familiar model (programs that do amazing things with
STDIN/STDOUT).
There is a wide variety of hardware and software in play.

I may be supporting an older Solaris 8, Windows, or FreeBSD 5 box, for example.

I want to be able to pipe an Apache custom log into HDFS, or forward
syslog to it. That is where LHadoop (or something like it) would come
into play.
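
The client side could then be nothing more than standard log plumbing.
Purely illustrative (the host and port are made up, and this assumes
the raw-stream mode described below):

    # /etc/syslog.conf -- forward everything to the collector over UDP
    *.*    @lhadoop-host

    # httpd.conf -- stream the access log over TCP with netcat
    CustomLog "|/usr/bin/nc lhadoop-host 9090" combined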

I am thinking to even accept raw streams and have the server side use
source-host/regex to determine what file the data should go to.
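
That routing idea, as a rough sketch (Python; the rule table and file
names are my own assumptions, not anything LHadoop does today):

    import re

    # (source-host pattern, line pattern, destination file) -- hypothetical rules
    RULES = [
        (re.compile(r'^web\d+'), re.compile(r'"(GET|POST) '), '/logs/apache'),
        (re.compile(r''),        re.compile(r''),             '/logs/catchall'),
    ]

    def route(source_host, line):
        """Pick the HDFS file a raw incoming line should be appended to."""
        for host_re, line_re, dest in RULES:
            if host_re.search(source_host) and line_re.search(line):
                return dest

    print(route('web01.example.com', '... "GET /index.html HTTP/1.0" 200 ...'))
    # -> /logs/apache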

I want to stay light on the client side. An application that tails log
files and transmits new data is another component to develop and
manage. Has anyone had experience with moving this type of data?


Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Jeff Hammerbacher
Hey Edward,

The application we used at Facebook to transmit new data is open
source now and available at
http://sourceforge.net/projects/scribeserver/.

Later,
Jeff




Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Pete Wyckoff

Chukwa could also be used here.







Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Jeff Hammerbacher
Hey Edward,

The Thrift interface to HDFS allows clients to be developed in any
Thrift-supported language: http://wiki.apache.org/hadoop/HDFS-APIs.

Regards,
Jeff

On Thu, Oct 23, 2008 at 1:04 PM, Edward Capriolo [EMAIL PROTECTED] wrote:
 One of my first questions about Hadoop was: how do systems outside
 the cluster interact with the file system? I read several documents
 that described streaming data into Hadoop for processing, but I had
 trouble finding examples.

 The goal of LHadoop Server (the L stands for Lightweight) is to provide
 a VERY simple interface that allows streaming READ and WRITE access to
 Hadoop. The client side of the connection speaks a simple text-based
 protocol, so any type of client (Perl, C++, even telnet) can interact
 with Hadoop. There is no need to have Java on the client.

 The protocol works like this:

 bash-3.2# nc localhost 9090
 AUTH ecapriolo password
 serverOK:AUTH
 READ /letsgo
 serverOK.
 OMG.
 Is this going to work
 Lets see
 ^C
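
 A minimal client for this exchange (a Python sketch, assuming only
 the protocol shown above; the credentials and file name are the ones
 from the example session):

     import socket

     # Speak the line-oriented LHadoop protocol: AUTH, then READ,
     # then stream the file contents until the server closes.
     sock = socket.create_connection(('localhost', 9090))
     f = sock.makefile('rw', newline='\n')

     f.write('AUTH ecapriolo password\n')
     f.flush()
     print(f.readline().strip())   # expect: serverOK:AUTH

     f.write('READ /letsgo\n')
     f.flush()
     print(f.readline().strip())   # expect: serverOK.

     for line in f:                # file contents follow
         print(line, end='')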

 Site:
 http://www.jointhegrid.com/jtgweb/lhadoopserver/
 SVN:
 http://www.jointhegrid.com/jtgwebrepo/jtglhadoopserver

 I know several other methods exist to get access to Hadoop, including
 FUSE. Again, I could not find anyone doing something like this. Does
 anyone have any ideas, or think this is useful?

 Thank you,



Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Edward Capriolo
I downloaded Thrift and ran the example applications after the
Hive meetup. It is very cool stuff. The thriftfs interface is more
elegant than what I was trying to do, and that implementation is more
complete.

Still, someone might be interested in what I did if they want a
super-light API :)

I will link to http://wiki.apache.org/hadoop/HDFS-APIs from my page so
people know the options.