[NTG-context] Re: Indexing with Luacode: an issue

Jean-Pierre Delange via ntg-context Tue, 30 Sep 2025 16:01:58 -0700

Dear List,

I am replying to my own message because I have found a solution. If there is a better one, I would be happy to hear about it.

I finally understood what was wrong with the method I was using with Luacode, which was supposed to let me create a double index (the previous method relied on 'input.processors').

I realized that I had to use a "buffer" after defining an 'autoindex' in Lua (without using the "input.processors" method) and also defining the two registers that correspond to the two types of index (nominum, rerum). The Lua code extracts the items to be indexed, while the text is wrapped, chapter by chapter and section by section, in a buffer.
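To illustrate the idea outside ConTeXt, here is a minimal sketch in plain Lua of a gsub call whose callback records each hit and hands the match back, so the text itself is not altered. The `register` function and the `recorded` table are hypothetical stand-ins for `context.setregisterentry`; they are assumptions made for this example only.

```lua
local recorded = {}

-- Hypothetical stand-in for context.setregisterentry: remember what would be registered.
local function register(reg, key, disp)
  recorded[#recorded + 1] = { reg = reg, key = key, disp = disp }
end

local function scan(text, term, reg)
  -- escape punctuation so the term is searched literally
  local pattern = term:gsub("%p", "%%%0")
  local result = text:gsub(pattern, function(hit)
    register(reg, term, hit)
    return hit -- returning the hit keeps the text as it was
  end)
  return result
end

local text = "Bréhier répond à Gilson ; Gilson réplique."
local out = scan(text, "Gilson", "nomen")
print(out == text, #recorded)
```

In the real document the callback would register the entry in the named register instead of filling a table.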

Naturally, the two list files, nomen-list.txt and rerum-list.txt (plain-text files in UTF-8), must contain the terms that we want to see in the indexes (I mention this for complete beginners). The MWE follows.
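For beginners, the sketch below shows the line format that the loader in the MWE accepts: either a bare term, or `key|display`, with `#` starting a comment line. The sample entries are invented for illustration (apart from the `Aquin, Thomas|Thomas d'Aquin` form mentioned later in this thread), and `parse_line` is a hypothetical helper that mirrors the parsing logic in plain Lua.

```lua
-- Hypothetical content of a list file such as nomen-list.txt.
local sample = [[
# lines starting with # are ignored
Cassiciacum
Aquin, Thomas|Thomas d'Aquin
   Bréhier, Émile|Émile Bréhier
]]

-- Split one line into sort key and display text; a bare term serves as both.
local function parse_line(line)
  local s = line:gsub("^%s+", ""):gsub("%s+$", "")
  if s == "" or s:sub(1, 1) == "#" then return nil end
  local key, disp = s:match("^(.-)|(.+)$")
  if not key then key, disp = s, s end
  return key, disp
end

local entries = {}
for line in sample:gmatch("[^\n]+") do
  local key, disp = parse_line(line)
  if key then entries[#entries + 1] = { key = key, disp = disp } end
end

for _, e in ipairs(entries) do
  print(e.key .. " -> " .. e.disp)
end
```

The display text is what gets searched for in the document; the key only controls sorting in the index.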


Best//JP

\setuplayout[width=middle, backspace=20mm, topspace=15mm, height=middle]

\setupinteraction[state=start]

\mainlanguage[en]


% === Registers ===

\defineregister[nomen]

\defineregister[rerum]

\setupregister[nomen][indicator=yes,balance=yes,compress=yes,state=stop]

\setupregister[rerum][indicator=yes,balance=yes,compress=yes,state=stop]


% === Auto-index (Lua): buffer processing, WITHOUT input.processors ====

\startluacode

-- Load a list of "key|display" lines (or a bare "term", meaning key = display).
local function loadlist(name)
  local t = {}
  local found = (resolvers and resolvers.findfile and resolvers.findfile(name)) or name
  local f = io.open(found, "r"); if not f then return t end
  for line in f:lines() do
    local s = line:gsub("^%s+", ""):gsub("%s+$", "")
    if s ~= "" and s:sub(1,1) ~= "#" then
      local k, d = s:match("^(.-)|(.+)$"); if not k then k, d = s, s end
      t[#t+1] = { key = k, disp = d }
    end
  end
  f:close()
  -- longest display forms first
  table.sort(t, function(a,b) return #a.disp > #b.disp end)
  return t
end

local NOMEN = loadlist("nomen-list.txt")
local RERUM = loadlist("rerum-list.txt")

-- Fold typographic apostrophes and dashes to their ASCII forms. Lua patterns
-- work on bytes, so a class such as [’'] cannot match the multi-byte ’;
-- comparing normalised copies of text and term tolerates the variants instead.
local function normalise(s)
  s = s:gsub("‘", "'"):gsub("’", "'") -- ‘ or ’
  s = s:gsub("–", "-"):gsub("—", "-") -- – or —
  return s
end

-- Escape the Lua pattern magic characters of a (normalised) term.
-- Accented letters need no special treatment: they are matched literally.
local function escpat(s)
  local p = normalise(s):gsub("([%%%^%$%*%(%)%+%-%[%]%?%.])", "%%%1")
  return p
end

local function add_entries(str, list, reg)
  local probe = normalise(str) -- searched copy; the original text is returned untouched
  for i = 1, #list do
    local key  = list[i].key
    local disp = list[i].disp
    if disp and disp ~= "" then
      local pat = escpat(disp)
      -- gsub with a callback: we REGISTER the entry, the text stays unchanged
      probe:gsub("(" .. pat .. ")", function(hit)
        context.setregisterentry({ reg }, { keys = key, entries = disp })
        return hit
      end)
    end
  end
  return str
end

userdata = userdata or {}

function userdata.typeset_with_autoindex(bufname)
  local s = buffers.getcontent(bufname) or ""
  s = add_entries(s, NOMEN, "nomen")
  s = add_entries(s, RERUM, "rerum")
  context(s) -- typeset the processed text, AT THE RIGHT TIME
end

\stopluacode


\starttext

\setupregister[nomen][state=start]

\setupregister[rerum][state=start]


% --- Example content (in a buffer) ---------------------------------------

\startbuffer[autotext]

… Après avoir été le maître d’Augustin d’Hippone, vraisemblablement avant son séjour à Cassiciacum (Ambroise de Milan). Augustin était passé par la philosophie (il avoue dans les {\em Confessions}) avoir lu les \quote{libri platonicorum}.

La réponse contemporaine est donnée par Émile Bréhier, protagoniste du débat avec Étienne Gilson.

La {\em théologie} des premiers siècles a trouvé chez les stoïciens le vocabulaire pour penser la grâce.


\stopbuffer


% Processing + composition of the buffer

\ctxlua{userdata.typeset_with_autoindex("autotext")}

\page

\startbackmatter

\startchapter[title={\sc Index}]

\startsection[title={{\em Index nominum}}]

\placeregister[nomen][columns=2,balance=yes,criterium=all,title={{Index nominum}}]


\stopsection

\page

\startsection[title={{\em Index rerum}}]

\placeregister[rerum][columns=2,balance=yes,criterium=all,title={{Index rerum}}]


\stopsection

\stopchapter

\stopbackmatter

\stoptext
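A note on tolerating apostrophe and dash variants: Lua's string patterns operate on bytes, and a typographic apostrophe (U+2019) occupies three bytes in UTF-8, so a character class that lists it can only ever consume one of those bytes. The sketch below, in plain Lua, shows the effect and one possible workaround, which is to fold the variants to ASCII in a copy of both the text and the term before searching. The helper names `normalise` and `count_hits` are assumptions for this illustration, not ConTeXt functions.

```lua
-- A class containing ’ is really a set of single bytes.
local class_pat = "d['’]Hippone"

print(("Augustin d'Hippone"):find(class_pat) ~= nil) -- ASCII apostrophe: true
print(("Augustin d’Hippone"):find(class_pat) ~= nil) -- typographic apostrophe: false

-- Possible workaround (an assumption, not the only option): search normalised copies.
local function normalise(s)
  s = s:gsub("‘", "'"):gsub("’", "'")
  s = s:gsub("–", "-"):gsub("—", "-")
  return s
end

local function count_hits(text, term)
  local pat = normalise(term):gsub("%p", "%%%0") -- escape punctuation
  local _, n = normalise(text):gsub(pat, "")
  return n
end

print(count_hits("le maître d’Augustin, puis d'Augustin encore", "d'Augustin")) -- 2
```

Because the index entries are only registered and the original text is typeset unchanged, searching a normalised copy does not affect the output.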




On 30/09/2025 at 21:33, Jean-Pierre Delange via ntg-context wrote:


Sorry! The message was sent too early! Here is the MWE :


\setuplayout[width=middle, backspace=20mm, topspace=15mm, height=middle]

\setupinteraction[state=start]

\setuplanguage[fr][patterns={fr,agr}]

\mainlanguage[fr]

\setcharacterspacing[frenchpunctuation]

% --- Registers (definition + style + stopped in the preamble) --------------


\defineregister[nomen]

\defineregister[rerum]

\setupregister[nomen][indicator=yes, balance=yes, compress=yes,state=stop]

\setupregister[rerum][indicator=yes, balance=yes, compress=yes,state=stop]

% === Auto-index (Lua): detection & \setregisterentry ======================


\startluacode

local collecting = false

local in_lua = false


-- Load a "key|display" list (or a bare "term", meaning key = display)

local function loadlist(name)

local t = {}

local found = (resolvers and resolvers.findfile and resolvers.findfile(name)) or name


local f = io.open(found, "r")

if not f then return t end

for line in f:lines() do

local s = line:gsub("^%s+", ""):gsub("%s+$", "")

if s ~= "" and s:sub(1,1) ~= "#" then

local k, d = s:match("^(.-)|(.+)$"); if not k then k, d = s, s end

t[#t+1] = { key = k, disp = d }

end

end

f:close()

-- match the longest entries first to avoid overlaps

table.sort(t, function(a,b) return #a.disp > #b.disp end)

return t

end


local NOMEN = loadlist("nomen-list.txt")

local RERUM = loadlist("rerum-list.txt")


-- Escape a Lua pattern + tolerate a few variants

local function esc_pattern(s)

local p = s:gsub("([%%%^%$%*%(%)%+%-%[%]%?%.])","%%%1")

p = p:gsub("'", "['’']") -- apostrophes

p = p:gsub("’", "['’']")

p = p:gsub("%%%-", "[-–—]") -- short/long dashes

return p

end


-- Inject a \setregisterentry at each occurrence found

local function add_entries(buf, list, reg)

for i=1,#list do

local key = list[i].key

local disp = list[i].disp

if disp and disp ~= "" then

local pat = esc_pattern(disp)

local cmd ="\\setregisterentry["..reg.."][keys:1={"..key.."}][entries:1={"..disp.."}]"


-- prints nothing: \setregisterentry produces no text

buf = buf:gsub("("..pat..")", "%1"..cmd)

-- Debug (optional):

-- local n; buf, n = buf:gsub("("..pat..")", "%1"..cmd); if n>0 then logs.report("autoindex","%s: %s -> %d", reg, disp, n) end


end

end

return buf

end

-- Filter (input processor): works only between \starttext...\stoptext and outside \start/stopluacode


local function filter(buf)

if buf:find("\\startluacode") then in_lua = true end

if buf:find("\\stopluacode") then in_lua = false end

if buf:find("\\starttext") then collecting = true end

if buf:find("\\stoptext") then collecting = false end

if not collecting or in_lua then return buf end

buf = add_entries(buf, NOMEN, "nomen")

buf = add_entries(buf, RERUM, "rerum")

return buf

end


-- Register the processor (LMTX)

if input and input.processors and input.processors.add then

input.processors.add("autoindex", filter)

end

\stopluacode


% Enable the processor (otherwise it will not run)

\enabledirectives[input.processors=autoindex]


\starttext

% Start accumulating entries from here

\setupregister[nomen][state=start]

\setupregister[rerum][state=start]


\subject{Paragraph 1 (names : Bréhier, Cassiciacum, Augustin)}

Qui plus est, on peut ajouter qu'il (Cicéron) a été, comme Virgile, l'un des {\em éducateurs} de l'Europe,
après avoir été le maître d'Augustin d'Hippone — lequel l'a lu très jeune, vraisemblablement
avant son séjour à Cassiciacum, entouré de sa famille et de ses amis. En second lieu, la réponse
contemporaine est donnée par Émile Bréhier, historien de la philosophie, protagoniste du débat
qu'il engagea avec Étienne Gilson au début du XX\high{e} siècle.


\blank[2*big]

\subject{Paragraph 2 (notions : hérétiques, stoïciens, théologie ; nom: Thomas d'Aquin)}

… la {\em théologie} chrétienne des premiers siècles a trouvé chez les penseurs aristotéliciens et
stoïciens (stoïciens) le vocabulaire pour penser la grâce, la nature, la vertu. Cela pourrait s'entendre
si, par exemple, Thomas d'Aquin avait été le précurseur de « l'aristotélisme chrétien »…



\page

\startbackmatter

\startchapter[title={\sc Index}]

\startsection[title={{\em Index nominum}}]

\placeregister[nomen][columns=2,balance=yes,criterium=all,title={{Index nominum}}]


\stopsection

\page

\startsection[title={{\em Index rerum}}]

\placeregister[rerum][columns=2,balance=yes,criterium=all,title={{Index rerum}}]


\stopsection

\stopchapter

\stopbackmatter


\stoptext


On 30/09/2025 at 21:24, Jean-Pierre Delange via ntg-context wrote:


Hello everyone,

I am trying to automatically build *two indexes* (nominum & rerum) from two lists (|nomen-list.txt| and |rerum-list.txt|) *without having to* manually tag the text.


  * *ConTeXt LMTX* (ConTeXt Process Management 1.06)
  * mtx-context | current version: 2025.07.27 21:43
  * OS: Windows
  * Compile commands:

|context --purgeall MWE_setregisterentry.tex|

|context MWE_setregisterentry.tex|

Approach:

  * I use *|input.processors.add("autoindex", filter)|* +
    |\enabledirectives[input.processors=autoindex]| (no
    |callbacks.register|).
  * In the processor, I search for *literal* occurrences (accents OK;
    |'|/|’| and |-|/|–|/|—| tolerated).
  * Instead of injecting the classic command |\index[...]|, I call
    *|\setregisterentry|* to properly register the entry (key/display).
  * The registers are defined in the preamble, |state=stop| then
    *|state=start| just after |\starttext|*, and printed with
    |\placeregister[...]| in the backmatter.

Problem:

  * The MWE *compiles perfectly* (PDF produced, pages OK) but *both
    indexes remain empty*.
  * I wonder if the processor injects the command during a *preroll*
    (or “trial typesetting”), which would cause the entry to be
    “lost.” Is there a recommended *flag* (e.g., |\ifprerolling| or
    equivalent) to *neutralize* |\setregisterentry| during prerolls
    and execute it *only at the right time*?

Questions:

 1. Is the approach via |input.processors.add| +
    |\enabledirectives[input.processors=autoindex]| the *right* one
    for LMTX (rather than the old callbacks)?
 2. Is there a *recommended method* for calling |\setregisterentry|
    from an input processor in a *reliable* way?
 3. Is the signature
    |\setregisterentry[reg][keys={...}][entries={...}]| preferable to
    the /n/-level form (|keys:1|, |entries:1|) in this use case?
 4. Would you recommend *anchoring* detection on *short units* (e.g.,
    “Aquin,” “Augustin”) and then forcing the key/display for sorting
    (|Aquin, Thomas|Thomas d'Aquin|)? I know: French diacritical
    marks, apostrophes, and other characters add an additional
    constraint to the indexing of proper names and nouns (see the
    word “aujourd'hui” and others).

Thank you in advance for your insights!


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : [email protected] / https://mailman.ntg.nl/mailman3/lists/ntg-context.ntg.nl
webpage  : https://www.pragma-ade.nl / https://context.aanhet.net (mirror)
archive  : https://github.com/contextgarden/context
wiki     : https://wiki.contextgarden.net
___________________________________________________________________________________

