You could run the Flume collectors on other machines and write a source that 
connects to the sockets on the data generators. 
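
A minimal sketch of the socket-pull side (not a full Flume source; the host,
port, and line-per-record framing are assumptions). A real source would wrap
this loop and turn each line into a Flume event for the collector tier:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;

// Connect to a data generator's TCP port and read line-oriented records.
public class SocketPullSketch {
    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "generator-01"; // hypothetical host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9999; // hypothetical port

        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // In a real source this line would become an event;
                // here we just echo it to show the read loop.
                System.out.println(line);
            }
        }
    }
}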

-Joey



On Dec 15, 2011, at 21:27, "Periya.Data" <periya.d...@gmail.com> wrote:

> Sorry...misworded my statement. What I meant was that the sources are meant 
> to be left untouched; the admins do not want to mess with them or add more 
> tools there. All I've got are source addresses and port numbers. Once I know 
> which technique(s) I will be using, I will be given access through the 
> firewalls and the other access credentials accordingly.
> 
> 
> -PD
> 
> On Thu, Dec 15, 2011 at 5:05 PM, Russell Jurney <russell.jur...@gmail.com> 
> wrote:
> Just curious - what is the situation you're in where no collectors are
> possible?  Sounds interesting.
> 
> Russell Jurney
> twitter.com/rjurney
> russell.jur...@gmail.com
> datasyndrome.com
> 
> On Dec 15, 2011, at 5:01 PM, "Periya.Data" <periya.d...@gmail.com> wrote:
> 
> > Hi all,
> >     I would like to know what options I have to ingest terabytes of data
> > that are being generated very fast from a small set of sources. I have
> > thought about:
> >
> >   1. Flume
> >   2. Have one or more intermediate staging servers where you can offload
> >   data and, from there, use dfs -put to load it into HDFS.
> >   3. Anything else??
> >
> > Suppose I am unable to use Flume (since the sources do not allow installing
> > Flume agents) and I do not have the luxury of an intermediate staging area:
> > what options do I have? In that case, I might have to ingest data directly
> > (preferably in parallel) into HDFS.
> >
> > I have read about a technique that uses MapReduce, where each map task
> > would read data and use the Java API to store it in HDFS. We could run
> > multiple map tasks to get parallel ingestion. It would be nice to know
> > about ways to ingest data "directly" into HDFS given my assumptions.
> >
> > Suggestions are appreciated,
> >
> > /PD.
> 
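
For the direct-ingest question above, a minimal sketch of writing straight
into HDFS through the Hadoop FileSystem API (the NameNode URI, source socket,
and output path are placeholders; run one writer per source, whether as a map
task or a plain process, to ingest in parallel):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Pull records from one source socket and write them straight into an HDFS file.
public class DirectHdfsIngest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:8020"); // placeholder NameNode

        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out =
                fs.create(new Path("/ingest/source-01/part-0000")); // placeholder path

        Socket socket = new Socket("generator-01", 9999); // placeholder source
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));

        String line;
        while ((line = in.readLine()) != null) {
            out.write((line + "\n").getBytes("UTF-8"));
        }

        in.close();
        socket.close();
        out.close();
        fs.close();
    }
}

If a staging box does become available, fs.copyFromLocalFile(new Path(local),
new Path(hdfsDir)) is the programmatic equivalent of dfs -put.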
