On Fri, 16 May 2025, Federico Miyara wrote:

I have an alphabet of n symbols and a dictionary of N words of m
symbols each (n on the order of tens, N on the order of tens of
thousands, m = 4, say).

Assuming each symbol has a definite probability, I need to generate a
list of M words (M on the order of 100) taken from the dictionary in
which the proportion of each symbol matches the required probabilities
as closely as possible.

Is this a problem that can be solved using GLPK?

Probably — it depends on your cost function.

Your primary variables will be N bits, each of which is one
if the corresponding word is selected, zero otherwise.
You will likely want auxiliary variables, e.g.,

    count[j] = SUM_k selected[k] * contains[j, k]

where contains[j, k] is the number of times symbol j occurs in word k.
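
As a sketch of that bookkeeping (pure Python, no GLPK; the toy alphabet,
word list, and the particular `selected` assignment are all made up for
illustration — in the actual MIP, `selected` would be the binary
decision variables the solver chooses):

```python
# Toy instance: alphabet of 3 symbols, dictionary of 5 words, m = 4.
alphabet = "abc"
words = ["abca", "bbac", "cabc", "aacc", "bcba"]

# contains[j][k] = number of times symbol j occurs in word k.
contains = [[w.count(s) for w in words] for s in alphabet]

# selected[k] = 1 if word k is in the chosen list, else 0.
# Here a fixed assignment stands in for the solver's binary variables.
selected = [1, 0, 1, 0, 1]

# count[j] = SUM_k selected[k] * contains[j][k]
count = [sum(selected[k] * contains[j][k] for k in range(len(words)))
         for j in range(len(alphabet))]
print(count)
```

With these data the three selected words happen to contain each symbol
four times, so the symbol proportions come out perfectly uniform.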

Note that GLPK's objective must be linear — it does not handle
quadratic cost functions — so you would express the fit as a
piecewise-linear cost, e.g., minimizing the sum of absolute
deviations of the symbol counts from their targets.
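
The absolute-deviation cost linearizes with the standard trick:
introduce dev[j] >= count[j] - target[j] and
dev[j] >= target[j] - count[j], then minimize SUM_j dev[j]. A
brute-force check of that objective on a toy instance (pure Python, no
GLPK; the probabilities, word list, and M are invented for
illustration):

```python
from itertools import combinations

alphabet = "abc"
words = ["abca", "bbac", "cabc", "aacc", "bcba"]
prob = {"a": 0.5, "b": 0.25, "c": 0.25}  # required symbol probabilities
M = 2                                    # number of words to select
m = 4                                    # symbols per word

# target[j]: expected count of symbol j across the M*m symbol slots.
target = {s: prob[s] * M * m for s in alphabet}

def deviation(subset):
    # L1 distance between achieved and target symbol counts;
    # this is exactly the quantity the dev[j] variables linearize.
    counts = {s: sum(w.count(s) for w in subset) for s in alphabet}
    return sum(abs(counts[s] - target[s]) for s in alphabet)

# Exhaustive search stands in for the MIP solver on this tiny case.
best = min(combinations(words, M), key=deviation)
print(best, deviation(best))
```

Brute force is only viable for tiny N; the point of the MIP
formulation is that a solver explores the same objective over tens of
thousands of binary variables without enumerating subsets.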

Whether GLPK can handle a problem of that size,
I do not know.

--
Michael   [email protected]
"Those parts of the system that you can hit with a hammer (not advised)
are called Hardware;  those program instructions that you can only
curse at are called Software."  --  Mi Neaugh Gno
