Michael,

Writing a full-fledged HTML parser that can work with char streams is a
non-trivial affair. If all you do is scan the HTML content for
specific patterns, any decent tutorial on the Java Reader class should do:

http://java.sun.com/docs/books/tutorial/essential/io/overview.html
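
As a rough illustration of the pattern-scanning case, here is a minimal
sketch that counts occurrences of a literal token while reading the Reader
in 4 KB chunks. The class and method names (StreamScanner,
countOccurrences) are made up for this example and are not part of
HttpClient. It carries over a short tail between reads so that a token
split across two chunks is still found.

```java
import java.io.IOException;
import java.io.Reader;

final class StreamScanner {

    /**
     * Counts non-overlapping occurrences of a literal token in a char stream
     * without ever holding the whole content in memory.
     */
    static int countOccurrences(Reader input, String token) throws IOException {
        if (token.length() == 0) {
            throw new IllegalArgumentException("token must not be empty");
        }
        char[] buffer = new char[4 * 1024];
        StringBuilder window = new StringBuilder();
        int count = 0;
        int charsRead;
        while ((charsRead = input.read(buffer)) != -1) {
            window.append(buffer, 0, charsRead);
            int from = 0;
            int hit;
            while ((hit = window.indexOf(token, from)) != -1) {
                count++;
                from = hit + token.length();
            }
            // Keep only the tail that could still begin a match
            // continuing into the next chunk.
            int keep = Math.min(token.length() - 1, window.length() - from);
            window.delete(0, window.length() - keep);
        }
        return count;
    }
}
```

With HttpClient this could be called as
StreamScanner.countOccurrences(new InputStreamReader(
get.getResponseBodyAsStream(), get.getResponseCharSet()), "<a ").
The match is case-sensitive, so real HTML would need some normalization.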

Otherwise, you may be better off just using an HTML parsing library.
There are a few of them available.
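
One option that needs no extra download is the HTML parser that ships
with the JDK's Swing classes; it accepts a Reader directly, so the
response never has to become one big String. The sketch below is only an
illustration of that idea: the LinkExtractor class and extractLinks
method are hypothetical names, and picking the Swing parser is an
assumption of this example rather than a specific recommendation.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class LinkExtractor {

    /** Collects the href value of every anchor tag read from the stream. */
    static List<String> extractLinks(Reader input) throws IOException {
        final List<String> links = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument (ignoreCharSet = true) tells the parser not to
        // abort when the page declares its own charset in a META tag.
        new ParserDelegator().parse(input, callback, true);
        return links;
    }
}
```

The Reader passed in would be the same InputStreamReader wrapped around
get.getResponseBodyAsStream() that readFully currently receives.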

Oleg

On Sat, 2004-11-27 at 10:00 -0800, Michael Taft wrote:
> Duncan & Oleg -
> Thanks for the code frag. It's been inserted into my app, and is working 
> beautifully.
> But, of course, Oleg's comment begs the question. I'm now reading the 
> response as a stream, but still parsing it as "one huge String." I based 
> my HTML parser on the ones in The Java Tutorial, which all use a String 
> as input, so I assumed (ah... the Problem) that feeding in one huge 
> String was the way to go.
> 
> My method calls look like this:
> 
> setParser.parseSetPage(readFully(new
> InputStreamReader(get.getResponseBodyAsStream(),
> get.getResponseCharSet())));
> 
> where: parseSetPage(String str)
> 
> Can you point me in the direction of an online example where the page is 
> fed to the parser chunk by chunk?
> 
> M.
> 
> Oleg Kalnichevski wrote:
> 
> > Duncan & Michael,
> > 
> > This is precisely the way we recommend the response body be consumed.
> > 
> > The whole idea is that one should REALLY avoid converting the response
> > body to a String unless absolutely necessary. One should really be
> > consuming the response body as a byte or char stream, which will result
> > in much, much more memory efficient code. For instance, if the content
> > body ultimately gets fed to an HTML parser or a scanner, it is by far
> > more efficient to feed it through a Reader in smaller chunks rather than
> > as one huge String.
> > 
> > There's one little change which I would have made, though:
> > 
> > readFully(
> > new InputStreamReader(
> >   get.getResponseBodyAsStream(), 
> >   get.getResponseCharSet()));
> > 
> > Otherwise, everything looks cool.
> > 
> > Cheers,
> > 
> > Oleg
> > 
> > 
> > On Sat, 2004-11-27 at 10:05 +0000, Duncan McGregor wrote:
> > 
> >>It will kind of work, although readLine discards the line end character,
> >>which you might well want when parsing the string. And you may want to
> >>consider the character set used in the InputStreamReader.
> >>
> >>Coincidentally I wrote this code yesterday
> >>
> >>    public static String readFully(Reader input) throws IOException {
> >>        BufferedReader bufferedReader = input instanceof BufferedReader 
> >>            ? (BufferedReader) input
> >>            : new BufferedReader(input);
> >>        StringBuffer result = new StringBuffer();
> >>        char[] buffer = new char[4 * 1024];
> >>        int charsRead;
> >>        while ((charsRead = bufferedReader.read(buffer)) != -1) {
> >>            result.append(buffer, 0, charsRead);
> >>        }           
> >>        return result.toString();
> >>    }
> >>
> >>Call this with doc = readFully(new
> >>InputStreamReader(get.getResponseBodyAsStream(), YOURCHARSET));
> >>
> >>Another good bet would be Jakarta Commons IO - IOUtils.toString(Reader)
> >>
> >>Duncan Mc^Gregor
> >>The name rings a bell
> >>www.oneeyedmen.com
> >> 
> >>
> >>-----Original Message-----
> >>From: Michael Taft [mailto:[EMAIL PROTECTED] 
> >>Sent: 27 November 2004 07:03
> >>To: HttpClient User Discussion
> >>Subject: getResponseBodyAsStream
> >>
> >>HttpClient keeps begging me to use getResponseBodyAsStream, rather than
> >>getResponseBodyAsString, due to the size of the response body. I'm willing
> >>to do this, even if just to make it happy. However, as a total newbie, I'm
> >>not clear about the best way to take a response stream and turn it into a
> >>string (that I can then parse, which is what I'm up to).
> >>
> >>I realize this is a trivial task for most of you. Here is how I propose to
> >>do it:
> >>
> >>------
> >>
> >>StringBuffer buffer = new StringBuffer();
> >>try {
> >>    InputStream is = get.getResponseBodyAsStream();
> >>    BufferedReader in = new BufferedReader(new InputStreamReader(is));
> >>    String str = "";
> >>    while(str != null)
> >>    {
> >>            str = in.readLine();
> >>            buffer.append(str);
> >>    }
> >>} catch(IOException e)
> >>{
> >>            ...etc.
> >>}
> >>
> >>------
> >>
> >>My questions about this are:
> >>1) Will this work?
> >>2) Is there a better way to do it?
> >>
> >>Thanks.
> >>M.
> >>
> >>
> >>
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> >>

