If your webpage is XML-tagged and you are looking into using streaming, this might help:
http://hadoop.apache.org/core/docs/r0.18.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
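Below is a minimal sketch of the mapper side. It assumes the job is started with the StreamXmlRecordReader option described on the linked page, e.g. -inputreader "StreamXmlRecord,begin=<page>,end=</page>", so that everything between the two strings becomes one record. Streaming still writes records to the mapper's stdin as text, so a record that spans several lines may arrive as several lines. The script stitches those lines back together before applying a multiline regex. The <page> markers and the <title> pattern are hypothetical placeholders for whatever your pages actually contain.

```python
import re
import sys

# Hypothetical markers: keep them identical to the begin=/end= strings
# given to StreamXmlRecordReader via -inputreader.
BEGIN = "<page>"
END = "</page>"

# Hypothetical multiline regex: DOTALL lets ".*?" match across newlines.
PATTERN = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def records(lines):
    """Rebuild BEGIN...END records from input lines that may split a record."""
    parts = None  # None means we are between records
    for line in lines:
        if parts is None:
            start = line.find(BEGIN)
            if start == -1:
                continue  # text outside any record, e.g. a trailing tab
            piece = line[start:]
            parts = []
        else:
            piece = line
        stop = piece.find(END)
        if stop == -1:
            parts.append(piece)
        else:
            # Anything after END on this line is dropped; that is fine when
            # streaming writes each record followed by its own newline.
            parts.append(piece[: stop + len(END)])
            yield "".join(parts)
            parts = None


def main():
    for record in records(sys.stdin):
        for match in PATTERN.finditer(record):
            # Collapse internal whitespace so the key fits on one line.
            title = " ".join(match.group(1).split())
            print(f"{title}\t1")  # tab-separated key/value for streaming
```

Since records() accepts any iterable of lines, it can be checked locally. To ship the file as the -mapper script, add a call to main() at the bottom (or behind an `if __name__ == "__main__":` guard).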
-Lohit



----- Original Message ----
From: Jim Twensky <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, September 9, 2008 11:23:37 AM
Subject: Re: Hadoop Streaming and Multiline Input

If I understand your question correctly, you need to write your own
FileInputFormat. Please see
http://hadoop.apache.org/core/docs/r0.18.0/api/index.html for details.
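To make the FileInputFormat route more concrete: the tricky part of such an input format is usually its RecordReader, which must handle records that straddle split boundaries. A common rule, sketched below, is that a split owns every record whose begin marker starts inside its byte range. The reader runs past the end of the range to finish the last record, so each record is emitted exactly once. This is a pure-Python simulation over an in-memory byte string, not Hadoop's Java API. The function name and markers are hypothetical, and the sketch assumes the begin marker never occurs inside a record. In Java, the same logic would live in the RecordReader returned by your FileInputFormat subclass, which streaming can load through its -inputformat option.

```python
from typing import Iterator


def records_for_split(data: bytes, start: int, end: int,
                      begin: bytes, stop: bytes) -> Iterator[bytes]:
    """Yield the records owned by the byte range [start, end) of data.

    A record belongs to the split in which its begin marker starts; the
    reader may read past `end` to reach that record's stop marker.
    """
    pos = start
    while True:
        i = data.find(begin, pos)
        if i == -1 or i >= end:
            return  # no further record starts inside this split
        j = data.find(stop, i + len(begin))
        if j == -1:
            return  # unterminated record at end of input: drop it
        yield data[i : j + len(stop)]
        pos = j + len(stop)
```

For example, consider data = b"xx<r>a</r>yy<r>bb\nb</r><r>c</r>" with a split boundary at byte 14, which cuts through the second <r> marker. The split [0, 14) yields the first two records, including the multiline one. The split [14, 31) yields only b"<r>c</r>".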

Regards,
Tim

On Sat, Sep 6, 2008 at 9:20 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> Is it possible to set a multiline text input in streaming to be used as a
> single record?  For example, say I wanted to scan a webpage for a specific
> regex that is multiline. Is this possible in streaming?
>
> Dennis
>
