Re: LHadoop Server simple Hadoop input and output

2008-10-28 Thread Ariel Rabkin
Chukwa is not quite ready for prime time.  The collection part works
OK and shouldn't be too evil to set up, but the analysis part and the
data-storage documentation aren't there yet.

On Mon, Oct 27, 2008 at 12:51 PM, Jeff Hammerbacher
<[EMAIL PROTECTED]> wrote:
> It could, but we have been unable to get Chukwa to run outside of Yahoo.
>
> On Fri, Oct 24, 2008 at 12:26 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
>>
>> Chukwa also could be used here.
>>
>>


-- 
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department


Re: LHadoop Server simple Hadoop input and output

2008-10-27 Thread Jeff Hammerbacher
It could, but we have been unable to get Chukwa to run outside of Yahoo.

On Fri, Oct 24, 2008 at 12:26 PM, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
>
> Chukwa also could be used here.
>
>
> On 10/24/08 11:47 AM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote:
>
> Hey Edward,
>
> [...]
>
> On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
>> [...]


Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Pete Wyckoff

Chukwa also could be used here.


On 10/24/08 11:47 AM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote:

Hey Edward,

[...]

On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> [...]




Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Jeff Hammerbacher
Hey Edward,

The application we used at Facebook to transmit new data is open
source now and available at
http://sourceforge.net/projects/scribeserver/.

Later,
Jeff

On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> [...]


Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Edward Capriolo
I came up with my line of thinking after reading this article:

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

As a guy who was intrigued by the Java coffee cup in '95 and who now
lives as a data center/NOC jock/Unix guy, let's say I look at a log
management process from a data center perspective. I know:

Syslog is a familiar model (human-readable UDP text)
INETD/XINETD is a familiar model (programs that do amazing things with
STDIN/STDOUT)
A variety of hardware and software

I may be supporting an older Solaris 8, Windows, or FreeBSD 5, for example.

I want to be able to pipe an Apache custom log into HDFS, or forward
syslog. That is where LHadoop (or something like it) would come into
play.

I am even thinking of accepting raw streams and having the server side
use a source-host/regex mapping to determine which file the data should go to.
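The source-host/regex routing idea above could look something like the
following sketch. The class and method names (`RouteTable`, `resolve`) and
the catch-all path are invented for illustration, not taken from the
LHadoop source; a real server would open the resolved path with Hadoop's
FileSystem API and append each line to it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: map a source host to an HDFS target path via an ordered
// list of regex rules; the first matching rule wins.
public class RouteTable {
    // Insertion-ordered so rules are tried in the order they were added.
    private final Map<Pattern, String> routes = new LinkedHashMap<>();

    public void add(String hostRegex, String hdfsPath) {
        routes.put(Pattern.compile(hostRegex), hdfsPath);
    }

    // Returns the target path for a source host, or a catch-all
    // (the "/logs/unsorted" default is an assumption for this sketch).
    public String resolve(String sourceHost) {
        for (Map.Entry<Pattern, String> e : routes.entrySet()) {
            if (e.getKey().matcher(sourceHost).matches()) {
                return e.getValue();
            }
        }
        return "/logs/unsorted";
    }

    public static void main(String[] args) {
        RouteTable t = new RouteTable();
        t.add("web\\d+\\.example\\.com", "/logs/apache");
        t.add(".*\\.solaris8\\.local", "/logs/syslog/solaris");
        System.out.println(t.resolve("web01.example.com")); // /logs/apache
        System.out.println(t.resolve("db01.example.com"));  // /logs/unsorted
    }
}
```

The point of keeping the rules ordered is that an operator can put a
specific host pattern before a broad one and still have a catch-all bucket.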

I want to stay light on the client side. An application that tails log
files and transmits new data is another component to develop and
manage. Has anyone had experience with moving this type of data?


Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Pete Wyckoff

Another way to do this is to make Thrift's Java compiler generate REST
bindings, as its PHP compiler does. There is also libhdfs, and
http://wiki.apache.org/hadoop/MountableHDFS


On 10/23/08 2:54 PM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:

[...]




Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Edward Capriolo
I downloaded Thrift and ran the example applications after the
Hive meetup. It is very cool stuff. The thriftfs interface is more
elegant than what I was trying to do, and that implementation is more
complete.

Still, someone might be interested in what I did if they want a
super-light API :)

I will link to http://wiki.apache.org/hadoop/HDFS-APIs from my page so
people know the options.


Re: LHadoop Server simple Hadoop input and output

2008-10-23 Thread Jeff Hammerbacher
Hey Edward,

The Thrift interface to HDFS allows clients to be developed in any
Thrift-supported language: http://wiki.apache.org/hadoop/HDFS-APIs.

Regards,
Jeff

On Thu, Oct 23, 2008 at 1:04 PM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> One of my first questions about Hadoop was, "How do systems outside
> the cluster interact with the file system?" I read several documents
> that described streaming data into Hadoop for processing, but I had
> trouble finding examples.
>
> The goal of LHadoop Server (the L stands for Lightweight) is to provide a
> VERY simple interface that allows streaming READ and WRITE access to
> Hadoop. The client side of the connection interacts using a simple
> text-based protocol. Any type of client (Perl, C++, telnet) can
> interact with Hadoop. There is no need to have Java on the client.
>
> The protocol works like this:
>
> bash-3.2# nc localhost 9090
> AUTH ecapriolo password
> server>>OK:AUTH
> READ /letsgo
> server>>OK.
> OMG.
> Is this going to work
> Lets see
> ^C
>
> Site:
> http://www.jointhegrid.com/jtgweb/lhadoopserver/
> SVN:
> http://www.jointhegrid.com/jtgwebrepo/jtglhadoopserver
>
> I know several other methods exist to get access to Hadoop, including
> FUSE. Again, I could not find anyone doing something like this. Does
> anyone have any ideas, or think this is useful?
>
> Thank you,
>