On Mon, Mar 3, 2008 at 11:51 AM, Richard Quadling
<[EMAIL PROTECTED]> wrote:
> Hi all.
>
>  Is there any chance someone could detail what the indexer has to
>  provide and how it would be accessed/used?

The indexer should gather as much (format-independent) info as it possibly can:
  - Map xml:id to chunks (i.e. filenames and anchors)
    - Store its title and description
      (<title><refname><refpurpose><refdescription><titleabbrev>...)
    - Store the preferred xreflabel (<example xreflabel="List of all the
      cool stuff" xml:id="foo"><title>Usage:</title>...</example>)
    - [preferably] If a title/description contains markup, mark it
      specifically so the format renderer can render it correctly
    - Store the element name (makes it possible to generate an
      appendix of all <example>s, for instance)
  - Map chunks to parents (i.e. "up")
  - Map chunks to siblings (i.e. "previous" and "next")
  - Map the first child to its parent (i.e. "previous")
  - Map the last child to its parent's next sibling (i.e. "next")

That's all I can remember for now.
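
A rough sketch of how such an index could be laid out in SQLite. Python is
used purely for illustration; the table name, column names, ".html" filenames
and sample ids are all made up, not anything PhD defines. It also assumes
that "previous"/"next" simply follow document order, which happens to
reproduce the first-child and last-child rules from the list above:

```python
import sqlite3

# Hypothetical schema: one row per xml:id. All names are illustrative.
SCHEMA = """
CREATE TABLE ids (
    xml_id      TEXT PRIMARY KEY,
    element     TEXT NOT NULL,               -- element name, e.g. 'example'
    filename    TEXT NOT NULL,               -- chunk the id lives in
    anchor      TEXT,                        -- anchor inside the chunk, if any
    title       TEXT,
    description TEXT,
    xreflabel   TEXT,                        -- preferred link text
    has_markup  INTEGER NOT NULL DEFAULT 0,  -- 1 if title/description has markup
    parent_id   TEXT,                        -- "up"
    prev_id     TEXT,                        -- "previous"
    next_id     TEXT                         -- "next"
);
"""


def build_index(conn, chunks):
    """Store chunks given as (xml_id, element, title, parent_id) tuples.

    The tuples must be in document order; prev/next are taken from the
    neighbouring entries of that list.
    """
    conn.executescript(SCHEMA)
    for i, (xml_id, element, title, parent_id) in enumerate(chunks):
        prev_id = chunks[i - 1][0] if i > 0 else None
        next_id = chunks[i + 1][0] if i + 1 < len(chunks) else None
        conn.execute(
            "INSERT INTO ids (xml_id, element, filename, title,"
            " parent_id, prev_id, next_id) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (xml_id, element, xml_id + ".html", title,
             parent_id, prev_id, next_id),
        )
    conn.commit()


def navigation(conn, xml_id):
    """Return (up, previous, next) for one id, or None if it is unknown."""
    return conn.execute(
        "SELECT parent_id, prev_id, next_id FROM ids WHERE xml_id = ?",
        (xml_id,),
    ).fetchone()


# Made-up sample data in document order.
CHUNKS = [
    ("index", "book", "Manual", None),
    ("intro", "chapter", "Introduction", "index"),
    ("intro.what", "sect1", "What is it?", "intro"),
    ("intro.why", "sect1", "Why use it?", "intro"),
    ("install", "chapter", "Installation", "index"),
]

conn = sqlite3.connect(":memory:")
build_index(conn, CHUNKS)
```

With this data, navigation(conn, "intro.what") gives "intro" as both "up" and
"previous", and navigation(conn, "intro.why") gives "install" (the parent's
next sibling) as "next". An appendix of all <example>s would then just be a
SELECT filtered on the element column.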

>
>  I would like to have a go.
>
>  A few additional questions.
>
>  1 - What should the index be stored in?

Preferably an SQLite database.


>  2 - Assuming an unchanged .manual.xml, does this mean no need to
>  re-index? - A hash (md5?) rather than datetime?

The indexer should be executed only on user request (or when the database is empty).
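
As a minimal sketch of that rule (Python for illustration only; the "ids"
table name and the user_requested flag are assumptions, not an existing PhD
option):

```python
import sqlite3


def needs_indexing(conn, user_requested):
    """Index only when the user asks for it or when nothing is indexed yet."""
    if user_requested:
        return True
    # A missing table counts as an empty database.
    table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'ids'"
    ).fetchone()
    if table is None:
        return True
    # The table exists: index only if it holds no rows at all.
    return conn.execute("SELECT 1 FROM ids LIMIT 1").fetchone() is None
```

Note that nothing here looks at hashes or timestamps of the source; the
decision is left entirely to the user.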

>  3 - At some stage are we expecting to not need .manual.xml and

Yes

>  therefore the indexer will have to work off the dom loaded xml?

No. PhD gets the root document. If that document refers to
"children documents" it is automatically expanded (entity magic) or
XIncluded.

(Yes. That means PhD does work with "manual.xml" (setting the
right properties during XMLReader->load()) and always has. It is,
however, much slower, and since we do it during validation (php
configure.php) anyway, why not use the already expanded tree?)
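
To illustrate what XInclude expansion of a root document means, here is a
small sketch using Python's ElementTree rather than PhD's XMLReader. The two
documents and their names are hypothetical and kept in memory so the example
is self-contained:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical root document and child document.
DOCS = {
    "manual.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude" xml:id="manual">'
        '<xi:include href="intro.xml"/>'
        "</book>"
    ),
    "intro.xml": (
        '<chapter xml:id="intro"><title>Introduction</title></chapter>'
    ),
}


def loader(href, parse, encoding=None):
    """Resolve an include from DOCS; only XML includes are handled here."""
    return ET.fromstring(DOCS[href])


root = ET.fromstring(DOCS["manual.xml"])
# Replace every xi:include element in the tree with the document it names.
ElementInclude.include(root, loader=loader)
```

After the call, the <xi:include> element under <book> has been replaced by
the <chapter> from the child document, so an indexer walking the tree sees
one fully expanded document.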

>  4 - How would you determine that reindexing is not required from the
>  dom loaded xml? (Don't think I can, but an md5 of the loaded xml would
>  probably be enough though?

The user knows best if his data should be reindexed or not. PhD should
not pretend to know better.

-Hannes
