REF: http://db.debian.net/lurker/message/20070707.195201.8e2c00a8.en.html
Author: Varun Hiremath
Date: 2007-07-07 12:52 -700
To: 423669
CC: control, Torsten Werner
New-Topics: Processed: uscan: https support
Subject: Bug#423669: uscan: https support
We noticed a weird usage of our SiteTruth.com site mentioned in a Debian bug report. Bug report #423669 apparently patched a problem by using a link to a CGI script on our site.

We have a system that rates web pages, and as a service for webmasters, we have a little utility, "viewer.cgi", which shows users how our crawler saw a page. Somebody stuck this into a Debian watch file because it can be used to read an HTTPS page via HTTP, something they needed.

But "viewer.cgi" does more than that. It's not a transparent proxy. It truncates pages at 1MB, parses the HTML into a tree, converts to Unicode/UTF-8, makes all the links absolute, removes embedded content (JavaScript, Flash, etc.), and outputs the result as cleaned-up and properly indented HTML. What you get out isn't quite what went in.

So this probably isn't what you want. SiteTruth really shouldn't be part of some Debian build procedure. We suggest finding some other way to read HTTPS pages with HTTP. Wrong tool for the job.

Thanks.

John Nagle
SiteTruth
http://www.sitetruth.com
[EMAIL PROTECTED]
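
To illustrate why a page-cleaning step like the one described above is not a transparent proxy, here is a minimal Python sketch of the same kind of transformation. It is not SiteTruth's code. The names `clean`, `Cleaner`, `MAX_BYTES`, and `DROPPED`, the exact byte cutoff, and the set of dropped tags are all assumptions made for the example, and it skips the tree building and re-indentation steps. It only shows that truncating, rewriting links, and dropping embedded content give output that differs from the input.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_BYTES = 1_000_000  # assumed cutoff; the message only says "1MB"
DROPPED = {"script", "object", "embed"}  # assumed set of "embedded content" tags


class Cleaner(HTMLParser):
    """Re-emits HTML with absolute links and without embedded content."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED:
            if tag != "embed":  # <embed> has no closing tag
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
                continue
            if name in ("href", "src"):
                # Relative links become absolute, so they no longer
                # match the text of the original page.
                value = urljoin(self.base_url, value)
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROPPED:
            if tag != "embed" and self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean(raw: bytes, base_url: str) -> str:
    """Truncate, decode as UTF-8, and rewrite the page."""
    text = raw[:MAX_BYTES].decode("utf-8", errors="replace")
    parser = Cleaner(base_url)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


page = b'<p><a href="files/foo-1.0.tar.gz">foo</a><script>var x = 1;</script></p>'
print(clean(page, "https://example.org/dl/"))
```

Anything that scrapes links out of the returned HTML, as a watch file pattern would, sees the rewritten absolute URLs rather than the links as the upstream site wrote them, and never sees anything past the cutoff.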