I’ve worked out how to optimize my process that indexes DITA topics by the 
top-level maps they are ultimately used from. It turned out I needed to first 
index the maps in ref-count order, from least to most. With that in place, I 
can just look up the top-level maps used by any direct-reference maps that 
reference a given topic, so each topic only requires a single index lookup.
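
To make the ordering idea concrete, here is a minimal sketch in Python rather than XQuery, purely as an illustration. The function names, the input shapes, and the sample file names are all hypothetical, and the sketch assumes that ref-count order visits every referencing map before the maps it references (it fails with a KeyError if that does not hold).

```python
from collections import defaultdict

def index_maps(map_refs):
    """Return {map: set of top-level maps that ultimately use it}.

    map_refs (hypothetical shape): each DITA map -> set of maps it
    references directly.
    """
    # Invert the references: which maps reference a given map directly?
    referenced_by = defaultdict(set)
    for parent, children in map_refs.items():
        for child in children:
            referenced_by[child].add(parent)
    all_maps = set(map_refs) | set(referenced_by)

    roots_of = {}
    # Visit maps from least-referenced to most-referenced.
    for m in sorted(all_maps, key=lambda m: (len(referenced_by[m]), m)):
        parents = referenced_by[m]
        if not parents:
            # Nothing references this map, so it is itself top-level.
            roots_of[m] = {m}
        else:
            # Assumption: every parent was already indexed (else KeyError).
            roots_of[m] = set().union(*(roots_of[p] for p in parents))
    return roots_of

def roots_for_topic(topic, topic_parents, roots_of):
    """Union the top-level maps of the maps that reference the topic directly."""
    return set().union(*(roots_of[m] for m in topic_parents.get(topic, ())))

# Hypothetical sample data.
map_refs = {
    "root.ditamap": {"sub.ditamap"},
    "other.ditamap": {"sub.ditamap"},
    "sub.ditamap": set(),
}
topic_parents = {"a.dita": {"sub.ditamap"}, "b.dita": {"root.ditamap"}}

roots_of = index_maps(map_refs)
a_roots = roots_for_topic("a.dita", topic_parents, roots_of)
b_roots = roots_for_topic("b.dita", topic_parents, roots_of)
```

Once the map index exists, each topic needs only the lookup of its direct-reference maps, which is the point of indexing the maps first.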

However, on my laptop these lookups still take about 0.1 seconds per topic, so 
for thousands of topics it’s a long time (relatively speaking).

But the topic index process is 100% parallelizable, so I should be able to run 
at least 2 or 3 ingestion threads on my 4-CPU server machine.
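
As a rough illustration of why this parallelizes cleanly, here is a Python sketch (not XQuery, and not how the database itself would schedule the work). Every name and the sample data are hypothetical. Each topic's entry depends only on read-only lookup data, so topics can be handed to workers independently and the per-topic results combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only lookup data shared by all workers.
ROOTS_OF = {
    "root.ditamap": {"root.ditamap"},
    "sub.ditamap": {"root.ditamap", "other.ditamap"},
}
TOPIC_PARENTS = {
    "a.dita": {"sub.ditamap"},
    "b.dita": {"root.ditamap"},
    "c.dita": {"root.ditamap", "sub.ditamap"},
}

def index_topic(topic):
    """Compute one topic's entry; touches no shared mutable state."""
    roots = set()
    for m in TOPIC_PARENTS.get(topic, ()):
        roots |= ROOTS_OF[m]
    return topic, roots

def index_topics_parallel(topics, workers=3):
    """Index topics on a small worker pool and combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields (topic, roots) pairs in input order.
        return dict(pool.map(index_topic, topics))

topic_index = index_topics_parallel(sorted(TOPIC_PARENTS))
```

In CPython, threads will not speed up pure-Python work like this; the sketch only shows the shape of the partition-and-combine step, with the worker count of 3 taken from the estimate above.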

Note that my ingestion process is two-phased:

Phase 1: Construct an XQuery map with the index details for the input topics 
(the topics already exist in the database, only the index is new).
Phase 2: Persist the map to the database as XML elements.
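
The two phases can be sketched as follows, in Python for illustration only. The element and attribute names ("doc-to-root-map", "entry", "doc", "root-map") are made up for this example and are not the actual index schema, and the sample index stands in for the output of phase 1.

```python
import xml.etree.ElementTree as ET

def index_to_xml(topic_index):
    """Phase 2: turn the in-memory index into XML elements.

    All element and attribute names are hypothetical.
    """
    root = ET.Element("doc-to-root-map")
    for doc in sorted(topic_index):
        entry = ET.SubElement(root, "entry", {"doc": doc})
        for root_map in sorted(topic_index[doc]):
            ET.SubElement(entry, "root-map").text = root_map
    return root

# Phase 1 result (hypothetical): topic -> set of top-level maps.
index = {"a.dita": {"root.ditamap", "other.ditamap"}}

xml_text = ET.tostring(index_to_xml(index), encoding="unicode")
```

Keeping the index as an in-memory structure until the end means the XML is written once, after all entries are known.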

I do the map construction both to take advantage of map:merge() and because 
it’s the only way I can index the DITA maps and topics in one transaction: 
build the doc-to-root-map for the DITA maps, then use that data to build the 
doc-to-root-map entries for all the topics, then persist the lot to the 
database for future use. This is in the context of a one-time mass load of 
content from a new git work tree. Subsequent changes to the content database 
will be on individual files, and the index can easily be updated incrementally.

So I’m just trying to optimize the startup time so that it doesn’t take two 
hours to load and index our typical content set.

I can also try to optimize the low-level operations, but they’re pretty 
simple, so I don’t see much opportunity for significant improvement. I also 
haven’t had time to try different options and measure them.

I must also say how useful the built-in unit testing framework is—that’s really 
made this work easier.

Cheers,

Eliot


_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
