FWIW Below is my attempt at disambiguating the problem. I have *some*
confidence it is an accurate description, but assert no more. I've not had a
reply from Federico yet, but expect one soon.
Translating a word problem into a mathematical problem is fraught with peril.
Much of my career was spent helping people convert their word problem into a
mathematical problem. Most of the time they immediately knew how to solve the
problem after I asked some questions and gave them the proper mathematical
formulation. My English lit BA degree had broad application to PhD level
mathematics. Go figure. I'd never expected that.
>From a series of emails with Federico, I *think* this is an accurate
>description:
Given phoneme frequencies for all of a language, e.g. Spanish, how does one
select the best combination of repetitions of a subset of the entire lexicon to
best match the phoneme distribution of the entire language? The goal being
evaluating verbal intelligibility in speech communication. It looks to me to be
a straight forward linear error minimization problem.
This seems to me a classic sparse L1 program as described in a paper by Emanuel
Candes which he called it "The Dantzig Selector" in the early 2000s as an
application of sparse L1 pursuits. I am acutely interested in whether that is
correct. Not merely in the solution of Federico's problem, but my own
understanding of sparse L1 pursuits as I learned from Focuart and Rauhut's "A
Mathematical Introduction to Compressive Sensing." Having worked for 3 years
without almost no one with whom to converse, I'm less than confident I have all
the nuances correct. If I am wrong, I should very much appreciate an
explanation. I no longer have the pleasure of doing this as an occupation, but
money was never what motivated me.
My perception is that the solution is straight forward, but tedious to
implement because of size. The obvious solution to me is, write a program to
generate a GMPL file. I *think* it is a mixed integer problem, but not yet
convinced that's the best formulation.
I am very grateful for the assistance provided by this list 8-10 years ago and
should very much enjoy contributing something useful. GLPK is a true "tour de
force". Many thanks to Andrew et al.
Have Fun!
Reg
----- Forwarded Message ----- From: Reginald Beardsley
<[email protected]>To: Federico Miyara <[email protected]>Sent:
Monday, May 19, 2025 at 12:10:37 PM CDTSubject: Re: Question
Federico,
Is this a correct statement of your problem?
You have a dictionary of N words composed of various elements from a set of M
symbols each of which has a certain number of occurrences in the N words. The
number of symbols from M which form an word in the set N varies but is small.
You wish to determine the number of recurrences of a smaller subset of P words
which have the same proportion of the M symbols as the entire set of N words,
but with P << N. Further, you wish to be able to select different subsets of P
words that all have the best matched frequency of occurrence of the M symbols
in P as in N with Q selections from P. The particular sets of P words in each
case being chosen independently based on other criteria unrelated to the
frequency of occurrence of the M symbols in N.
The desired output is a list of the number of occurrences of each word in the
set P which best approximates the number of occurrences of the M symbols in the
set N for Q selections from the list M for the case Q >= P.
The fundamental problem is then constructing Ax=y where y is the vector of
probabilities of each element in M in N. The vector x is a integer valued
number of repetitions of words in P. The hard part is creating the correct A
matrix.
Have Fun!
Reg
On Friday, May 16, 2025 at 04:55:34 PM CDT, Federico Miyara
<[email protected]> wrote:
I need to solve the following problem:
I have an alphabet of n symbols and a dictionary with N words of m
symbols (n in the order of tens, N in the order of tens of thousands, m
= 4, say)
Assuming each symbol has a definite probability, I need to generate a
list of M words (M in the order of 100) taken from the dictionary in
which the proportion of each symbol matches as best as possible the
required probability.
Is this a problem that can be solved using GLPK?
Thanks.
Bes regards,
Federico Miyara
--
Este correo electrónico ha sido analizado en busca de virus por el software
antivirus de Avast.
www.avast.com