[Conkeror] [PATCH] index-webjump: New module to define webjumps for index pages.

David Kettler Tue, 17 Nov 2009 03:45:20 -0800

Index webjumps provide convenient access, with completion, to a set of
web pages that are indexed (referenced) from another page.  For
instance, webjumps for documentation (e.g., git, ImageMagick) are easy
to define.


A webjump to access URLs referenced from an index page is defined
using define_xpath_webjump.  An xpath expression is used to extract
the indexed URLs and the anchor text; this provides completion for the
webjump.  The completion must be enabled using webjump-get-index once
for each index webjump.

This module also subsumes define_gitweb_summary_webjump, which results
in changes to how gitweb webjumps are set up.
---

This patch only shows the new index-webjump.js module.  The eventual
commit will also remove the existing gitweb-webjump.js module.

This makes use of functionality provided by [[PATCH] Let completion
functions set the match_required state], but that is not vital.  It's
also helped by [[PATCH] Show errors from completers, etc].

Conkeror wiki pages will also be updated as follows.

= Writing Webjumps =

== Index webjumps ==

Index webjumps provide convenient access to a set of web pages that
are indexed (referenced) from another page.  Two kinds are provided:
xpath webjumps and gitweb summary webjumps.  Completions can be
provided for the webjump by saving a copy of the index page to
`index_webjumps_directory`, which can be set as follows.
{{{
require("index-webjump.js");
index_webjumps_directory = get_home_directory();
index_webjumps_directory.appendRelativePath(".conkerorrc/index-webjumps");
}}}

For each defined index webjump the index page can be saved using `M-x
webjump-get-index`.
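
As a rough illustration of where a saved index ends up when no explicit
file is given for the webjump: the module appends the webjump key plus
`.index` to `index_webjumps_directory`.  The sketch below mirrors that
naming in plain JavaScript; `indexFilePath` is a hypothetical helper (it
is not part of the module) and it assumes POSIX-style path separators.

```javascript
// Hypothetical helper, for illustration only: mirrors the
// "<index_webjumps_directory>/<key>.index" naming used by the module.
// Assumes "/" as the path separator.
function indexFilePath(directory, key) {
    // Drop any trailing slashes so the result has exactly one separator.
    return directory.replace(/\/+$/, "") + "/" + key + ".index";
}

console.log(indexFilePath("/home/user/.conkerorrc/index-webjumps", "gitdoc"));
```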

=== Gitweb summary webjumps ===

These webjumps help you visit repositories at a gitweb server:
{{{
define_gitweb_summary_webjump("gitweb-ko", "http://git.kernel.org");
define_gitweb_summary_webjump("gitweb-cz", "http://repo.or.cz/w");
}}}
You can now use the following webjumps:
{{{
gitweb-cz conkeror
gitweb-ko git/git
}}}

To make completions available, run `M-x webjump-get-index` and select
`gitweb-cz`.  Once the download has finished, completions will be
available for that webjump.  Sites with many repositories (such as the
two given) can take many minutes to return the OPML data.

When defining the webjump, a default repository at the gitweb server
can be specified using the `$default` keyword.  An `$alternative` may
otherwise be given as usual.  If neither is given then the
alternative url for the webjump is defined to be the gitweb repository
list page.
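
The way these URLs are derived can be sketched in plain JavaScript.
This follows the string construction in the module
(`/gitweb.cgi`, `?p=%s.git;a=summary`, `?a=opml`), but `gitwebUrls`, its
`options` object and the `defaultRepo` / `alternative` property names
are hypothetical stand-ins for the `$default` and `$alternative`
keywords, used here only so the sketch runs outside Conkeror.

```javascript
// Hypothetical helper, for illustration only.  It reproduces how the
// summary, OPML and alternative URLs are built from the base URL.
function gitwebUrls(baseUrl, options) {
    options = options || {};
    var gitwebUrl = baseUrl + "/gitweb.cgi";
    var summaryUrl = gitwebUrl + "?p=%s.git;a=summary";

    // A default repository takes precedence over an explicit
    // alternative; with neither, fall back to the repository list page.
    var alternative = options.alternative;
    if (options.defaultRepo)
        alternative = summaryUrl.replace("%s", options.defaultRepo);
    if (!alternative)
        alternative = gitwebUrl;

    return {
        summary: summaryUrl,            // %s is replaced by the repository name
        opml: gitwebUrl + "?a=opml",    // source of the completion data
        alternative: alternative
    };
}

console.log(gitwebUrls("http://repo.or.cz/w", { defaultRepo: "conkeror" }));
```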

=== XPath webjumps ===

An xpath webjump extracts the set of referenced web pages from an
index page using an [[http://www.w3.org/TR/xpath|XPath]] expression.
For these webjumps to work, the index must be downloaded using `M-x
webjump-get-index`.

Unfortunately, the xulrunner parser used to read the index is quite
fussy; in particular, it is an xml parser, so many web pages fail to
parse correctly.  To work around this, the downloaded index page is
automatically cleaned up using `index_xpath_webjump_tidy_command`.  The
html [[http://tidy.sourceforge.net|tidy]] program should be installed
for this to work.

It can take a few attempts to figure out an appropriate XPath expression;
`index_webjump_try_xpath` is provided to help with that process.

Examples:
{{{
define_xpath_webjump(
    "gitdoc",
    "http://www.kernel.org/pub/software/scm/git/docs/",
    '//xhtml:dt/xhtml:a',
    $description = "Git documentation");
}}}
The following examples require the html tidy program to be installed.
{{{
define_xpath_webjump(
    "conkerorwiki-page",
    "http://conkeror.org/",
    '//xhtml:li/xhtml:p/xhtml:a[starts-with(@href,"/")]',
    $description = "Conkeror wiki pages linked from the front page");

define_xpath_webjump(
    "imagemagick-options",
    "http://imagemagick.org/script/command-line-options.php",
    '//xhtml:p[@class="navigation-index"]/xhtml:a',
    $description = "ImageMagick command line options");
}}}
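
Once an index has been parsed, each completion is a pair of URL and
anchor text, and typed input is matched word by word against both.  The
following is a minimal sketch of that matching in plain JavaScript,
modelled on the module's completer; `matchCompletions` and the sample
URLs are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper, for illustration only.  Each completion is a
// [url, text] pair.  A completion matches when every whitespace-separated
// word of the input occurs (case-insensitively) in the URL or the text.
function matchCompletions(completions, input) {
    var words = input.trim().toLowerCase().split(/\s+/);
    return completions.filter(function (c) {
        return words.every(function (w) {
            return c[0].toLowerCase().indexOf(w) !== -1 ||
                   c[1].toLowerCase().indexOf(w) !== -1;
        });
    });
}

// Made-up sample data.
var sample = [
    ["http://example.org/docs/git-log.html", "git-log"],
    ["http://example.org/docs/git-commit.html", "git-commit"]
];
console.log(matchCompletions(sample, "git log"));
```

Because every word must match somewhere, typing more words narrows the
list, and an empty input leaves all completions available.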

= BreakingChanges =

Gitweb summary webjumps are now implemented as index webjumps.  The
`webjump-get-index` command and `index_webjumps_directory` variable are
used rather than the previous gitweb equivalents.  Existing gitweb
opml files can be moved to the new locations using something like:

{{{
cd ~/.conkerorrc
mkdir index-webjumps
for f in gitweb-webjumps-opml/*.opml; do
  mv "$f" "index-webjumps/$(basename "$f" .opml).index"
done
rmdir gitweb-webjumps-opml
}}}

The `$completer` option is no longer available.

= User Variables =

index_webjumps_directory::
:: A directory for storing the index files corresponding to index
webjumps; the index data can be downloaded from the index URL using
`webjump-get-index`.  If the index file is available for an index
webjump then the webjump will provide completions for the indexed
URLs.

index_xpath_webjump_tidy_command::
:: A command to run on the downloaded index.  The xulrunner parser is
quite fussy and specifically requires xhtml (or other xml).  Running
something like html tidy can avoid parser problems.
---
 modules/gitweb-webjump.js |  175 -------------------------
 modules/index-webjump.js  |  312 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 312 insertions(+), 175 deletions(-)
 delete mode 100644 modules/gitweb-webjump.js
 create mode 100644 modules/index-webjump.js

diff --git a/modules/index-webjump.js b/modules/index-webjump.js
new file mode 100644
index 0000000..16d1a49
--- /dev/null
+++ b/modules/index-webjump.js
@@ -0,0 +1,312 @@
+/**
+ * (C) Copyright 2009 David Kettler
+ *
+ * Use, modification, and distribution are subject to the terms specified in the
+ * COPYING file.
+ *
+ * Construct a webjump (with completer) to visit URLs referenced from
+ * an index page.  An xpath expression is used to extract the indexed
+ * URLs.  A specialized form is also provided for gitweb summary
+ * pages.
+**/
+
+require("webjump.js");
+
+/* Objects with completion data for index webjumps. */
+index_webjumps = {};
+
+define_variable("index_webjumps_directory", null,
+                "A directory for storing the index files corresponding to " +
+                "index webjumps; the index data can be downloaded from the " +
+                "index URL using webjump-get-index.  " +
+                "If the index file is available for an index webjump then " +
+                "the webjump will provide completions for the indexed URLs.");
+
+define_variable("index_xpath_webjump_tidy_command",
+                "tidy -asxhtml -wrap 0 -modify -quiet --show-warnings no",
+                "A command to run on the downloaded index.  The xulrunner " +
+                "parser is quite fussy and specifically requires xhtml (or " +
+                "other xml).  Running something like html tidy can avoid " +
+                "parser problems.");
+
+function index_webjump(key, url, file) {
+    this.key = key;
+    this.url = url;
+    this.file = this.canonicalize_file(file);
+
+    if (this.require_completions && !this.file)
+        throw interactive_error("Index file not defined for " + this.key);
+}
+index_webjump.prototype = {
+    constructor : index_webjump,
+
+    mime_type : null,
+    xpath_expr : null,
+    make_completion : null,
+    require_completions : false,
+    completions : null,
+    file_time : 0,
+    tidy_command : null,
+
+    /* Extract full completion list from index file. */
+    extract_completions : function () {
+        /* Parse the index file. */
+        var stream = Cc["@mozilla.org/network/file-input-stream;1"]
+            .createInstance(Ci.nsIFileInputStream);
+        stream.init(this.file, MODE_RDONLY, 0644, false);
+        var parser = Cc["@mozilla.org/xmlextras/domparser;1"]
+            .createInstance(Ci.nsIDOMParser);
+        var doc = parser.parseFromStream(stream, null,
+                                         this.file.fileSize, this.mime_type);
+
+        /* Extract the completion items. */
+        var cmpl = [], node, res;
+        res = doc.evaluate(
+            this.xpath_expr, doc, xpath_lookup_namespace,
+            Ci.nsIDOMXPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);
+        while ((node = res.iterateNext()))
+            cmpl.push(this.make_completion(node));
+
+        cmpl.sort(function(a, b) {
+            if (a[1] < b[1])  return -1;
+            if (a[1] > b[1])  return 1;
+            if (a[0] < b[0])  return -1;
+            if (a[0] > b[0])  return 1;
+            return 0;
+        });
+
+        this.completions = cmpl;
+    },
+
+    /* The guts of the completer. */
+    internal_completer : function (input, pos, conservative) {
+        if (pos == 0 && conservative)
+            yield co_return(undefined);
+
+        let require = this.require_completions;
+
+        /* Update full completion list if necessary. */
+        if (require && !this.file.exists())
+            throw interactive_error("Index file missing for " + this.key);
+        if (this.file.exists() &&
+            this.file.lastModifiedTime > this.file_time) {
+            this.file_time = this.file.lastModifiedTime;
+            this.extract_completions();
+        }
+        if (require && !this.completions)
+            throw interactive_error("No completions for " + this.key);
+        if (!this.completions)
+            yield co_return(null);
+
+        /* Match completions against input. */
+        let words = trim_whitespace(input.toLowerCase()).split(/\s+/);
+        let data = this.completions.filter(function (x) {
+            for (var i = 0; i < words.length; ++i)
+                if (x[0].toLowerCase().indexOf(words[i]) == -1 &&
+                    x[1].toLowerCase().indexOf(words[i]) == -1)
+                    return false;
+            return true;
+        });
+
+        let c = { count: data.length,
+                  get_string: function (i) data[i][0],
+                  get_description: function (i) data[i][1],
+                  get_input_state: function (i) [data[i][0]],
+                  get_match_required: function() require
+                };
+        yield co_return(c);
+    },
+
+    /* A completer suitable for supplying to define_webjump. */
+    make_completer : function() {
+        if (!this.file)
+            return null;
+        let jmp = this;
+        return function (input, pos, conservative) {
+            return jmp.internal_completer(input, pos, conservative);
+        };
+    },
+
+    /* Fetch and save the index for later use with completion.
+     * (buffer is used only to associate with the download) */
+    get_index : function (buffer) {
+        if (!this.file)
+            throw interactive_error("Index file not defined for " + this.key);
+
+        var cwd = null;
+        if (index_webjumps_directory instanceof Ci.nsILocalFile)
+            cwd = index_webjumps_directory.path;
+        else if (index_webjumps_directory)
+            cwd = index_webjumps_directory;
+
+        var info = save_uri(load_spec(this.url), this.file,
+                            $buffer = buffer, $use_cache = false,
+                            $temp_file = true);
+
+        // Note: it would be better to run this before the temp file
+        // is renamed; that requires support in save_uri.
+        if (this.tidy_command)
+            info.set_shell_command(this.tidy_command, cwd);
+    },
+
+    /* Try to make a suitable file object when the supplied file is a
+     * string or null. */
+    canonicalize_file : function (file) {
+        if (typeof file == 'string')
+            file = make_file(file);
+        if (!file && index_webjumps_directory) {
+            file = Cc["@mozilla.org/file/local;1"]
+                .createInstance(Ci.nsILocalFile);
+            if (index_webjumps_directory instanceof Ci.nsILocalFile)
+                file.initWithFile(index_webjumps_directory);
+            else
+                file.initWithPath(index_webjumps_directory);
+            file.appendRelativePath(this.key + ".index");
+        }
+        return file;
+    }
+}
+
+
+function index_webjump_xhtml(key, url, file, xpath_expr) {
+    index_webjump.call(this, key, url, file);
+    this.xpath_expr = xpath_expr;
+}
+index_webjump_xhtml.prototype = {
+    constructor : index_webjump_xhtml,
+
+    require_completions : true,
+    mime_type : "application/xhtml+xml",
+    tidy_command : index_xpath_webjump_tidy_command,
+
+    make_completion : function (node) {
+        return [makeURLAbsolute(this.url, node.href), node.text];
+    },
+
+    __proto__ : index_webjump.prototype
+}
+
+
+function index_webjump_gitweb(key, url, file) {
+    index_webjump.call(this, key, url, file);
+}
+index_webjump_gitweb.prototype = {
+    constructor : index_webjump_gitweb,
+
+    mime_type : "text/xml",
+    xpath_expr : '//outline[@type="rss"]',
+
+    make_completion : function (node) {
+        var name = node.getAttribute("text");
+        return [name.replace(/\.git$/, ""), ""];
+    },
+
+    __proto__ : index_webjump.prototype
+}
+
+
+interactive("webjump-get-index",
+            "Fetch and save the index URL corresponding to an index " +
+            "webjump.  It will then be available to the completer.",
+            function (I) {
+                var completions = [];
+                for (let i in index_webjumps)
+                    completions.push(i);
+                completions.sort();
+
+                var key = yield I.minibuffer.read(
+                    $prompt = "Fetch index for index webjump:",
+                    $history = "webjump",
+                    $completer =
+                        all_word_completer($completions = completions),
+                    $match_required = true);
+
+                var jmp = index_webjumps[key];
+                if (jmp)
+                    jmp.get_index(I.buffer);
+            });
+
+/**
+ * Construct a webjump to visit URLs referenced from an index page.
+ *
+ * The index page must be able to be parsed as xhtml.  The anchor
+ * nodes indexed are those that match the given xpath_expr.  Don't
+ * forget to use xhtml: prefixes on the xpath steps.
+ *
+ * If an alternative is not specified then it is set to the index page.
+ *
+ * A completer is provided that uses the index page.  A local file for
+ * the index must be specified either with $index_file or via
+ * index_webjumps_directory.  The index must be manually downloaded;
+ * eg. using webjump-get-index.  Each time the completer is used it
+ * will check if the file has been updated and reload if necessary.
+ * This kind of webjump is not useful without the completions.
+ */
+define_keywords("$alternative", "$index_file", "$description");
+function define_xpath_webjump(key, index_url, xpath_expr) {
+    keywords(arguments);
+    let alternative = arguments.$alternative || index_url;
+
+    var jmp = new index_webjump_xhtml(key, index_url, arguments.$index_file,
+                                      xpath_expr);
+    index_webjumps[key] = jmp;
+
+    define_webjump(key, function (term) {return term;},
+                   $completer = jmp.make_completer(),
+                   $alternative = alternative,
+                   $description = arguments.$description);
+}
+
+/**
+ * Modify the xpath for an index webjump and show the resulting
+ * completions.  Useful for figuring out an appropriate xpath.  Either
+ * run using mozrepl or eval in the browser with the dump parameter
+ * set.
+ */
+function index_webjump_try_xpath(key, xpath_expr, dump) {
+    var jmp = index_webjumps[key];
+    if (xpath_expr)
+        jmp.xpath_expr = xpath_expr;
+    jmp.extract_completions();
+    if (dump)
+        dumpln(dump_obj(jmp.completions,
+                        "Completions for index webjump " + key));
+    return jmp.completions;
+}
+
+
+/**
+ * Construct a webjump to visit repository summary pages at a gitweb
+ * server.
+ *
+ * If a repository name is supplied as $default then the alternative
+ * url is set to that repository at the gitweb site.  If an
+ * alternative is not specified by either $default or $alternative
+ * then it is set to the repository list page of the gitweb site.
+ *
+ * A completer is provided that uses the list of repositories from the
+ * OPML data on the gitweb server.  The completer is set up in the same
+ * way as for define_xpath_webjump, but the webjump will work without
+ * the completions.
+ */
+define_keywords("$default", "$alternative", "$opml_file", "$description");
+function define_gitweb_summary_webjump(key, base_url) {
+    keywords(arguments);
+    let alternative = arguments.$alternative;
+    let gitweb_url = base_url + "/gitweb.cgi";
+    let summary_url = gitweb_url + "?p=%s.git;a=summary";
+    let opml_url = gitweb_url + "?a=opml";
+
+    if (arguments.$default)
+        alternative = summary_url.replace("%s", arguments.$default);
+    if (!alternative)
+        alternative = gitweb_url;
+
+    var jmp = new index_webjump_gitweb(key, opml_url, arguments.$opml_file);
+    index_webjumps[key] = jmp;
+
+    define_webjump(key, summary_url,
+                   $completer = jmp.make_completer(),
+                   $alternative = alternative,
+                   $description = arguments.$description);
+}
-- 
1.6.5
