Michael,

Writing a full-fledged HTML parser that can work with char streams is not a trivial affair. If all you are doing is scanning the HTML content for specific patterns, any decent tutorial on the Java Reader class should do:
http://java.sun.com/docs/books/tutorial/essential/io/overview.html

Otherwise, you may be better off just using an HTML parsing library. There are a few of them available.

Oleg

On Sat, 2004-11-27 at 10:00 -0800, Michael Taft wrote:
> Duncan & Oleg -
> Thanks for the code frag. It's been inserted into my app, and is working
> beautifully.
> But, of course, Oleg's comment begs the question. I'm now reading the
> response as a stream, but still parsing it as "one huge String." I based
> my HTML parser on the ones in The Java Tutorial, which all use a String
> as input, so I assumed (ah... the Problem) that feeding in one huge
> String was the way to go.
>
> My method calls look like this:
>
> setParser.parseSetPage(readFully(new
> InputStreamReader(get.getResponseBodyAsStream(),
> get.getRequestCharSet())));
>
> where: parseSetPage(String str)
>
> Can you point me in the direction of an online example where the page is
> fed to the parser chunk by chunk?
>
> M.
>
> Oleg Kalnichevski wrote:
>
> > Duncan & Michael,
> >
> > This is precisely the way we recommend the response body be consumed.
> >
> > The whole idea is that one should REALLY avoid converting the response
> > body to a String unless absolutely necessary. One should really be
> > consuming the response body as a byte or char stream, which will result
> > in much, much more memory efficient code.
> > For instance, if the content
> > body ultimately gets fed to an HTML parser or a scanner, it is by far
> > more efficient to feed it through a Reader in smaller chunks rather than
> > as one huge String
> >
> > There's one little change which I would have made, though:
> >
> > readFully(
> >     new InputStreamReader(
> >         get.getResponseBodyAsStream(),
> >         get.getResponseCharSet()));
> >
> > Otherwise, everything looks cool
> >
> > Cheers,
> >
> > Oleg
> >
> >
> > On Sat, 2004-11-27 at 10:05 +0000, Duncan McGregor wrote:
> >
> >>It will kind of work, although readLine discards the line end character,
> >>which you might well want when parsing the string. And you may want to
> >>consider the character set used in the InputStreamReader.
> >>
> >>Coincidentally I wrote this code yesterday
> >>
> >>    public static String readFully(Reader input) throws IOException {
> >>        BufferedReader bufferedReader = input instanceof BufferedReader
> >>            ? (BufferedReader) input
> >>            : new BufferedReader(input);
> >>        StringBuffer result = new StringBuffer();
> >>        char[] buffer = new char[4 * 1024];
> >>        int charsRead;
> >>        while ((charsRead = bufferedReader.read(buffer)) != -1) {
> >>            result.append(buffer, 0, charsRead);
> >>        }
> >>        return result.toString();
> >>    }
> >>
> >>Call this with doc = readFully(new
> >>InputStreamReader(get.getResponseBodyAsStream(), YOURCHARSET));
> >>
> >>Another good bet would be Jakarta Commons IO - IOUtils.toString(Reader)
> >>
> >>Duncan Mc^Gregor
> >>The name rings a bell
> >>www.oneeyedmen.com
> >>
> >>
> >>-----Original Message-----
> >>From: Michael Taft [mailto:[EMAIL PROTECTED]
> >>Sent: 27 November 2004 07:03
> >>To: HttpClient User Discussion
> >>Subject: getResponseBodyAsStream
> >>
> >>HttpClient keeps begging me to use getResponseBodyAsStream, rather than
> >>getResponseBodyAsString, due to the size of the response body. I'm willing
> >>to do this, even if just to make it happy.
> >>However, as a total newbie, I'm not clear
> >>about the best way to take a response stream and turn it into a string
> >>(that I can then parse, which is what I'm up to).
> >>
> >>I realize this is a trivial task for most of you. Here is how I propose to
> >>do it:
> >>
> >>------
> >>
> >>StringBuffer buffer = new StringBuffer();
> >>try {
> >>    InputStream is = get.getResponseBodyAsStream();
> >>    BufferedReader in = new BufferedReader(new InputStreamReader(is));
> >>    String str = "";
> >>    while(str != null)
> >>    {
> >>        str = in.readLine();
> >>        buffer.append(str);
> >>    }
> >>} catch(IOException e)
> >>(
> >>    ...etc.
> >>}
> >>
> >>------
> >>
> >>My questions about this are:
> >>1) Will this work?
> >>2) Is there a better way to do it?
> >>
> >>Thanks.
> >>M.
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: [EMAIL PROTECTED]
> >>For additional commands, e-mail: [EMAIL PROTECTED]
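
On Michael's question about feeding the page to a scanner chunk by chunk, here is a minimal sketch, assuming all that is needed is to count occurrences of a literal pattern. The names ChunkedScanner and countOccurrences are made up for illustration and are not part of HttpClient. The demo uses a StringReader so that it runs on its own; in the setup discussed in this thread the Reader would instead be the InputStreamReader built from get.getResponseBodyAsStream() and get.getResponseCharSet(). Between reads the method keeps the last pattern.length() - 1 characters, so a match that straddles two chunks is still found and the whole body is never held in memory at once.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not part of HttpClient or the JDK.
final class ChunkedScanner {

    /**
     * Counts occurrences of a literal pattern (overlapping ones included),
     * reading at most chunkSize chars from the Reader at a time.
     */
    static int countOccurrences(Reader input, String pattern, int chunkSize)
            throws IOException {
        if (pattern.isEmpty() || chunkSize < 1) {
            throw new IllegalArgumentException(
                "pattern must be non-empty and chunkSize positive");
        }
        char[] chunk = new char[chunkSize];
        StringBuilder window = new StringBuilder();
        int count = 0;
        int charsRead;
        while ((charsRead = input.read(chunk)) != -1) {
            window.append(chunk, 0, charsRead);

            // Count every match that is fully contained in the window.
            int from = 0;
            int hit;
            while ((hit = window.indexOf(pattern, from)) != -1) {
                count++;
                from = hit + 1;
            }

            // Keep only a tail shorter than the pattern: long enough to
            // complete a match split across chunks, too short to hold a
            // match that has already been counted.
            int keep = Math.min(pattern.length() - 1, window.length());
            window.delete(0, window.length() - keep);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String html = "<p><a href=\"x\">x</a> and <a href=\"y\">y</a></p>";
        // A deliberately tiny chunk size so matches straddle chunk boundaries.
        System.out.println(countOccurrences(new StringReader(html), "<a ", 5));
    }
}
```

The window never grows beyond chunkSize + pattern.length() - 1 characters, which is the point of Oleg's advice. Anything that needs real HTML structure (nested tags, attributes, entities) is still better served by a parsing library.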
