Eric Drexler published an article in 1987 (at Hypertext 87) entitled
"Hypertext Publishing and the Evolution of Knowledge", available on
the WWW at http://www.foresight.org/WebEnhance/HPEK1.html.

I haven't read the whole article yet, because the dope who put it on
the web thought it would be a good idea to split it into four pages,
and the dope who downloaded it to read it didn't think of the
possibility that it might be in multiple pieces like that.  But I can
quote some parts of it:

    This paper will use the terms "hypertext publishing" and
    "hypertext medium" as shorthand for filtered, fine-grained,
    full-hypertext publishing systems.  The lack of any of these
    characteristics would cripple the case made here for the value of
    hypertext in evolving knowledge. Lack of fine-grained linking
    would do injury; lack of any other characteristic would be
    grievous or fatal.

"Full-hypertext" means "with backlinks", "fine-grained" means links
can point to a small parts of the document, rather than merely
author-defined chunks of text; and "filtered" means that you don't
have to see all the backlinks.

Presumably, from Drexler's perspective, the WWW is grievously
inadequate, since it has none of the three characteristics.

Interestingly, in late 2005 and early 2006, I worked on projects that
add filtering, fine-grained linking, and backlinks to the WWW.  None
of those projects has yet been adopted, so I thought I should at least
describe the approaches in a way many people could understand.

Fine-grained linking
--------------------

CritLink used #'phrase+of+text to link to particular phrases in a
document, but that may not work well if the document changes.
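A minimal sketch of how such a phrase anchor might resolve (the
handling of the #'phrase+of+text fragment here is my reading of
CritLink's scheme, not its actual code):

```python
import urllib.parse

def resolve_phrase_anchor(document_text, fragment):
    """Resolve a CritLink-style #'phrase+of+text anchor to a character
    offset, or None if the phrase no longer appears verbatim."""
    phrase = urllib.parse.unquote_plus(fragment.lstrip("'"))
    index = document_text.find(phrase)
    return None if index == -1 else index

# The anchor works as long as the phrase survives verbatim...
assert resolve_phrase_anchor("some phrase of text here",
                             "'phrase+of+text") == 5
# ...but it breaks as soon as anyone edits the phrase itself:
assert resolve_phrase_anchor("some phrase of words here",
                             "'phrase+of+text") is None
```

As the second assertion shows, the anchor is only as durable as the
exact text it quotes.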

Also, the CritLink solution requires that you do all your browsing
through an intermediary, which introduces a single point of failure.
Indeed, the CritLink software doesn't scale well to large numbers of
concurrent users, it is not currently running on the CritLink server,
and, due in part to the grievously low priority the current DNS
governance system places on permanence, the crit.org domain has
lapsed due to nonpayment and been stolen by criminals.

I have not yet really tested the solution I proposed for fine-grained
linking: "Queer numbers: more aggressive purple numbers", 2005-10-10
    http://lists.canonical.org/pipermail/kragen-tol/2005-October/000796.html
It has the advantages of being more compact and (I think --- hard to
prove!) more stable across document edits than the CritLink approach
to fine-grained linking.

The queer-numbers approach also has the advantage that it's nonobvious
enough that it probably hasn't been patented.

Today it should be possible to run the queer-numbers code in
JavaScript inside a GreaseMonkey user script, although I haven't
tried it.
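For illustration, here is the general shape of a content-derived
anchor --- not the actual queer-numbers algorithm, which is defined in
the post linked above, but enough to show why anchors computed from
the surrounding words, rather than assigned by the author, can survive
edits elsewhere in the document:

```python
import hashlib

def word_anchor(words, i, context=3):
    """Derive a compact anchor from the words around position i.
    (Illustrative only; see the queer-numbers post for the real scheme.)"""
    window = " ".join(words[max(0, i - context):i + context + 1]).lower()
    return hashlib.sha1(window.encode()).hexdigest()[:8]

def find_anchor(words, anchor, context=3):
    """Scan a (possibly edited) document for the position whose
    surrounding words still hash to the same anchor."""
    for i in range(len(words)):
        if word_anchor(words, i, context) == anchor:
            return i
    return None

original = "the quick brown fox jumps over the lazy dog".split()
anchor = word_anchor(original, 4)                   # anchor on "jumps"
edited = ("yesterday " + " ".join(original)).split()  # text inserted before
assert edited[find_anchor(edited, anchor)] == "jumps"
```

The anchor still resolves after the insertion because the words in the
anchored region itself didn't change; an edit inside the window would
break it, which is why stability across edits is hard to prove.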

Filtered backlinks
------------------

Drexler doesn't distinguish between forward and backward links, but to
me it doesn't seem useful to remove links specified by web page
authors from the web pages --- it doesn't save any space, since you
have to include the text of the link anyway, and the links can't
overlap due to the nature of HTML.  So I'm satisfied with filtered
backlinks.

The backlinks project in question was called WowBar, which is a
terrible name for four reasons:
1. it's also the name of a piece of spyware;
2. it says nothing about the project;
3. it gives no credit to its ancestors (most immediately Wikalong, but
   also of course Crit, Third Voice, and Google Blog Comments);
4. it sounds dumb.

Jesse Andrews and I implemented WowBar over the course of a few
months, with one night of amazing graphical help from Kevin Hughes,
and lots of design discussions with Allan Schiffman.

The code for WowBar itself and most of the annotation-source gateways
is available, licensed under GNU GPL v2 from CommerceNet's web site.

In any case, the architecture is as follows:
1. when you open a web page, the user-agent sends a request to a
   "mixer" containing the web page URL and some configuration
   parameters.
2. the mixer forwards the web page URL to some set of annotation
   sources, possibly including configuration parameters for those
   annotation sources.
3. as each annotation source sends its response to the mixer, the
   mixer forwards it back to the user-agent;
4. the user-agent displays all the annotations outside the web page
   --- in the current implementation, to its left.

Empirically, this worked great at 1280x1024 and took up too much
screen space at 1024x768.
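The four steps above can be sketched as a toy mixer, with plain Python
functions standing in for the real HTTP annotation-source gateways
(the source names and response shape here are made up for
illustration, not WowBar's actual wire format):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def google_backlinks(url):
    # Stand-in for a gateway that would query Google for backlinks.
    return {"source": "google-backlinks", "url": url,
            "annotations": ["pages linking here"]}

def wikalong(url):
    # Stand-in for the Wikalong wiki-margin annotation source.
    return {"source": "wikalong", "url": url,
            "annotations": ["shared margin notes"]}

def mixer(url, sources, handle=print):
    """Step 2: forward the page URL to each configured source.
    Step 3: hand each response back to the user-agent as it arrives."""
    results = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(source, url) for source in sources]
        for future in as_completed(futures):
            response = future.result()
            handle(response)   # in WowBar, streamed back to the browser
            results.append(response)
    return results

responses = mixer("http://example.org/", [google_backlinks, wikalong],
                  handle=lambda r: None)
assert {r["source"] for r in responses} == {"google-backlinks", "wikalong"}
```

The point of forwarding each response as it completes (rather than
waiting for all sources) is that one slow annotation source doesn't
delay the rest.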

The idea is that the mixer is configured to get annotations (including
backlinks) only from sources you are interested in, and those sources
may themselves be mixers that aggregate and filter annotations from
some set of sources.

This compromises your privacy because you're telling the mixer every
web page you visit, and if that mixer server is used only by one or a
few people, all the sources will also know about every web page you
visit.  Our proposed, but unimplemented, solutions were as follows:
1. you use the same mixer as some number of other people, so that
   requests outbound from the mixer can only be identified as coming
   from a member of that group;
2. you can easily change to a different mixer, so you should be able
   to find a trustworthy mixer that doesn't keep logs;
3. sources can reduce the load on themselves and the mixer by
   publishing a Bloom filter of URLs for which they have annotations,
   also reducing the amount of information they get about you.
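Point 3 can be sketched concretely: a source publishes a Bloom filter
of the URLs it has annotations for, and the mixer only contacts the
source when the filter says the URL might be present.  (This is a
generic Bloom filter, written from the standard construction rather
than from any WowBar code.)

```python
import hashlib

class BloomFilter:
    """A source publishes one of these so the mixer can skip sources
    that certainly have no annotations for a given URL."""
    def __init__(self, size_bits=1 << 16, hashes=4):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, url):
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, url):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(url))

annotated = BloomFilter()
annotated.add("http://example.org/some-page")
assert annotated.might_contain("http://example.org/some-page")
```

Because a Bloom filter gives false positives but never false
negatives, a "no" answer lets the mixer skip the query entirely, so
the source never even learns that the URL was visited.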

Clearly a better approach to #1 and #2 would use a MIX network to
route requests out to sources, and would run the mixer on the same
machine as the browser, in much the same way as AT&T's Crowds system.  As with
Tor, it would be best to use a single route through the MIX for some
period of time.

The other current problem with the system is that we use HTML for the
annotation format.  This has some benefits --- easier styling, your
own annotation store can easily provide you a user interface for
adding or editing comments --- but it creates security problems, both
because inline images and the like can reveal your IP address to
annotation authors, and because of JavaScript's same-origin policy.

Finally, at present, the mixers' return values do not conform to the
format expected from sources.

We implemented annotation gateways for Google backlinks, Technorati
blog-post searches, Flickr images, and Wikalong, plus several other
sources we didn't release: a graph of Alexa's traffic records for the
site, a search through Bloglines for blog posts linking to the page, a
display of Google PageRank for the site, a source that links to old
versions of the page on archive.org, and a personal annotation
source.

Thoughts for now
----------------

It should be fairly straightforward for those interested in improving
discourse on the WWW to turn these ideas into reality.  They're
backward-compatible with the existing WWW, and they should enable
Drexler's (and Nelson's, and Licklider's, and Bush's) vision of
improving discourse through better hypertext, in ways the current WWW
can't.

For my part, I have other things I'm working on in the near term.

-- 
Kragen Javier Sitaker in Caracas, trying to get a clue
