I am going to use HttpClient to fetch the source of a web page, and I hope
to extract certain information from it. The page is all HTML, and I am
looking for the information between these tags:
//... HTML Stuff here
</td>
<td class="alt1">(Simple 2 digit number I need here)</td>
</tr><tr align="center">
//... More HTML Stuff after this as well
</td>
<td class="alt1">(Simple 2 digit number I need here)</td>
</tr><tr align="center">
//... HTML Stuff after this as well
Etc.
I think I will have to search through the result of
method.getResponseBody() for text that begins with </td> <td class="alt1">
and ends with </tr><tr align="center">, and take the data in between.
Am I right in thinking that I can't search a line at a time, and that I
have to wait until the entire source has arrived and then search through
one massive string?
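For illustration, here is a minimal sketch of the line-at-a-time idea,
assuming the number always sits on a single line of the form
<td class="alt1">NN</td> as in the sample above. The class name LineScanner
and the method extract are made up for this example. With HttpClient 3.x the
Reader could presumably be wrapped around method.getResponseBodyAsStream()
instead of the StringReader used here, but that part is untested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: scans HTML one line at a time for the wanted cells.
public class LineScanner {

    // Matches a cell such as <td class="alt1">42</td> and captures the two digits.
    private static final Pattern CELL =
            Pattern.compile("<td class=\"alt1\">(\\d{2})</td>");

    // Reads the source line by line and collects every captured number.
    public static List<String> extract(Reader source) throws IOException {
        List<String> numbers = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = CELL.matcher(line);
            while (m.find()) {
                numbers.add(m.group(1));
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        // Sample input shaped like the fragment in the question.
        String sample = "</td>\n<td class=\"alt1\">17</td>\n</tr><tr align=\"center\">\n"
                + "</td>\n<td class=\"alt1\">23</td>\n</tr><tr align=\"center\">\n";
        System.out.println(extract(new StringReader(sample))); // prints [17, 23]
    }
}
```

This only works when the opening tag, the number, and the closing tag share
one line; a pattern that spans several lines needs the whole text in memory
or some buffering across lines.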
Anyway, once I have the data I want to put it into a text file, which I
already know how to do.
Here's the code so far:
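As a rough sketch of that last step, the extracted values could be written
one per line like this. The class name NumberFileWriter and the file name
ages.txt are placeholders chosen for the example, not anything from the
real program.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes each extracted value on its own line.
public class NumberFileWriter {

    // The writer is closed in finally so the file handle is released
    // even if one of the writes fails.
    public static void write(File outFile, List<String> values) throws IOException {
        BufferedWriter writer = new BufferedWriter(new FileWriter(outFile));
        try {
            for (String value : values) {
                writer.write(value);
                writer.newLine();
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // "ages.txt" is a placeholder file name.
        write(new File("ages.txt"), Arrays.asList("17", "23"));
    }
}
```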
import java.io.*;
import java.net.*;
import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class HttpClientTutorial {

    private static String url = "http://www.youngcoders.com/memberlist.php";

    public static void main(String[] args) {
        // Create an instance of HttpClient.
        HttpClient client = new HttpClient();

        // Create a method instance.
        GetMethod method = new GetMethod(url);

        // Provide a custom retry handler if necessary.
        method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                new DefaultHttpMethodRetryHandler(3, false));

        try {
            // Execute the method.
            int statusCode = client.executeMethod(method);

            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + method.getStatusLine());
            }

            // Read the response body.
            byte[] responseBody = method.getResponseBody();

            // Deal with the response.
            // Use caution: ensure correct character encoding and that the
            // body is not binary data. Decode with the charset the server
            // reported rather than the platform default.
            String line = new String(responseBody, method.getResponseCharSet());

            File outFile = new File("age.html"); // name of the output file
            BufferedWriter writer = new BufferedWriter(new FileWriter(outFile));
            writer.write(line);
            writer.close();
            System.out.println(line);
        } catch (HttpException e) {
            System.err.println("Fatal protocol violation: " + e.getMessage());
            e.printStackTrace();
        } catch (IOException e) {
            System.err.println("Fatal transport error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            // Release the connection.
            method.releaseConnection();
        }
    }
}
At the moment that just fetches the entire web page and writes it to an
.html file, but how do I get only certain bits of the page?
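One possible way to pull out only those bits, sketched under the assumption
that the whole page is already held in a String: a regular expression that
requires the surrounding </td> and </tr><tr align="center"> tags and
captures the two digits in between. PageScanner and extract are names made
up for this example, and the sample page is invented test data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: searches the complete page text in one pass.
public class PageScanner {

    // \s* lets the match run across the line breaks between the tags.
    private static final Pattern ROW_END = Pattern.compile(
            "</td>\\s*<td class=\"alt1\">(\\d{2})</td>\\s*</tr><tr align=\"center\">");

    // Returns every number that appears in the expected tag context.
    public static List<String> extract(String page) {
        List<String> numbers = new ArrayList<String>();
        Matcher m = ROW_END.matcher(page);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Invented sample: the 99 cell is not followed by </tr>, so it is skipped.
        String page = "<td>x</td>\n<td class=\"alt1\">17</td>\n</tr><tr align=\"center\">\n"
                + "<td class=\"alt1\">99</td>\n<td>y</td>\n"
                + "</td>\n<td class=\"alt1\">23</td>\n</tr><tr align=\"center\">\n";
        System.out.println(extract(page)); // prints [17, 23]
    }
}
```

A regular expression like this is brittle if the site changes its markup;
an HTML parser would be more robust, at the cost of an extra dependency.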
Thanks for your time. If anything is unclear, just tell me and I'll try to
explain better.