If your webpage is XML-tagged and you are looking into using streaming, this might help:
http://hadoop.apache.org/core/docs/r0.18.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F

-Lohit
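For illustration, the FAQ entry linked above describes passing -inputreader "StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING" to the streaming jar so that everything between the two strings is treated as one record. Below is a rough sketch of a streaming mapper that takes the same begin/end idea and applies a multiline regex to each reassembled record. It assumes the record text may still reach the mapper's stdin as several physical lines, so it rebuilds the record itself. The <page> and <title> markers, the regex, and the function names are hypothetical placeholders, not anything from the Hadoop docs.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper that matches a regex across line breaks."""
import re
import sys

# Hypothetical record delimiters; use whatever wraps one page in your data.
BEGIN = "<page>"
END = "</page>"

# Hypothetical multiline pattern: DOTALL lets '.' match newlines too.
PATTERN = re.compile(r"<title>\s*(.*?)\s*</title>", re.DOTALL)


def iter_records(lines, begin=BEGIN, end=END):
    """Yield each begin..end span as one string, even if it spans lines."""
    buf = ""
    for line in lines:
        buf += line
        while True:
            start = buf.find(begin)
            if start == -1:
                # No record open: keep only a tail that could be the
                # start of a begin marker split across two lines.
                buf = buf[-(len(begin) - 1):] if len(begin) > 1 else ""
                break
            stop = buf.find(end, start + len(begin))
            if stop == -1:
                # Record is still open: drop anything before it and wait.
                buf = buf[start:]
                break
            stop += len(end)
            yield buf[start:stop]
            buf = buf[stop:]


def main(stdin=sys.stdin, stdout=sys.stdout):
    for record in iter_records(stdin):
        for match in PATTERN.finditer(record):
            # Collapse whitespace so each match is emitted on one line.
            stdout.write(" ".join(match.group(1).split()) + "\n")


if __name__ == "__main__":
    main()
```

This only works if a whole record lands in a single input split; a page cut in half at a split boundary would be missed, which is the case the XML record reader or a custom FileInputFormat is meant to handle.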
----- Original Message ----
From: Jim Twensky <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, September 9, 2008 11:23:37 AM
Subject: Re: Hadoop Streaming and Multiline Input

If I understand your question correctly, you need to write your own FileInputFormat. Please see http://hadoop.apache.org/core/docs/r0.18.0/api/index.html for details.

Regards,
Tim

On Sat, Sep 6, 2008 at 9:20 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> Is it possible to set a multiline text input in streaming to be used as a
> single record? For example say I wanted to scan a webpage for a specific
> regex that is multiline, is this possible in streaming?
>
> Dennis