We're doing web scraping with node and have hit a problem: we cannot
fetch a particular URL on a particular web site, because the server sends
back a duplicated header value: "Content-Length: 1234,1234"
I understand that node's http parser does not accept this and throws an
error, but is there any way we can intercept the response and fix it up?
The only approach I can think of is a proxy written in another language,
which seems like a sucky solution.
Thoughts?
Here's some test code to demonstrate this:
var assert = require('assert');
var http = require('http');

var seen_req = false;

// Server that replies with a duplicated Content-Length value.
var server = http.createServer(function(req, res) {
  assert.equal(req.method, 'GET');
  assert.equal(req.url, '/foo?bar');
  res.writeHead(200, {
    'Content-Type': 'text/plain',
    'Content-Length': '6,6'
  });
  res.write('hello\n');
  res.end();
  server.close();
  seen_req = true;
});

// The client request fails while parsing the response headers.
server.listen(12345, function() {
  http.get('http://127.0.0.1:' + 12345 + '/foo?bar');
});

process.on('exit', function() {
  assert(seen_req);
});
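
To make the "fix it up" part concrete, here is a rough sketch of the clean-up I have in mind. The helper normalizeContentLength is hypothetical (it is not part of node), and it assumes we could somehow get at the raw header value before the parser rejects it, which is exactly the part I don't know how to do. As far as I can tell, RFC 7230 section 3.3.2 allows a recipient either to reject such a message or to collapse identical repeated values into one, so the sketch only accepts values that all agree:

```javascript
// Hypothetical helper (not part of node): collapse a Content-Length value
// that repeats the same number, e.g. "1234,1234", into that number.
// Returns null when the value is malformed or the entries disagree.
function normalizeContentLength(value) {
  var parts = String(value).split(',').map(function(part) {
    return part.trim();
  });
  var first = parts[0];
  // The first entry must be a plain decimal number...
  if (!/^\d+$/.test(first)) return null;
  // ...and every other entry must be textually identical to it.
  for (var i = 1; i < parts.length; i++) {
    if (parts[i] !== first) return null;
  }
  return parseInt(first, 10);
}

console.log(normalizeContentLength('1234,1234')); // 1234
console.log(normalizeContentLength('6,6'));       // 6
console.log(normalizeContentLength('6,7'));       // null
```

Conflicting values such as "6,7" are deliberately refused rather than guessed at, since picking one of them could truncate or over-read the body.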