FWIW Below is my attempt at disambiguating the problem. I have *some* 
confidence it is an accurate description, but assert no more. I've not had a 
reply from Federico yet, but expect one soon.

Translating a word problem into a mathematical problem is fraught with peril. 
Much of my career was spent helping people convert their word problem into a 
mathematical problem. Most of the time they immediately knew how to solve the 
problem after I asked some questions and gave them the proper mathematical 
formulation. My English lit BA degree had broad application to PhD level 
mathematics. Go figure. I'd never expected that.

>From a series of emails with Federico, I *think* this is an accurate 
>description:

Given phoneme frequencies for all of a language, e.g. Spanish, how does one 
select the best combination of repetitions of a subset of the entire lexicon to 
best match the phoneme distribution of the entire language? The goal being 
evaluating verbal intelligibility in speech communication. It looks to me to be 
a straight forward linear error minimization problem.

This seems to me a classic sparse L1 program as described in a paper by Emanuel 
Candes which he called it "The Dantzig Selector" in the early 2000s as an 
application of sparse L1 pursuits. I am acutely interested in whether that is 
correct. Not merely in the solution of Federico's problem, but my own 
understanding of sparse L1 pursuits as I learned from Focuart and Rauhut's "A 
Mathematical Introduction to Compressive Sensing." Having worked for 3 years 
without almost no one with whom to converse, I'm less than confident I have all 
the nuances correct. If I am wrong, I should very much appreciate an 
explanation. I no longer have the pleasure of doing this as an occupation, but 
money was never what motivated me.

My perception is that the solution is straight forward, but tedious to 
implement because of size. The obvious solution to me is, write a program to 
generate a GMPL file. I *think* it is a mixed integer problem, but not yet 
convinced that's the best formulation.

I am very grateful for the assistance provided by this list 8-10 years ago and 
should very much enjoy contributing something useful. GLPK is a true "tour de 
force". Many thanks to Andrew et al.

Have Fun!
Reg
    ----- Forwarded Message ----- From: Reginald Beardsley 
<[email protected]>To: Federico Miyara <[email protected]>Sent: 
Monday, May 19, 2025 at 12:10:37 PM CDTSubject: Re: Question
  Federico,

Is this a correct statement of your problem?

You have a dictionary of N words composed of various elements from a set of M 
symbols each of which has a certain number of occurrences in the N words. The 
number of symbols from M which form an word in the set N varies but is small.

You wish to determine the number of recurrences of a smaller subset of P words 
which have the same proportion of the M symbols as the entire set of N words, 
but with P << N. Further, you wish to be able to select different subsets of P 
words that all have the best matched frequency of occurrence of the M symbols 
in P as in N with Q selections from P. The particular sets of P words in each 
case being chosen independently based on other criteria unrelated to the 
frequency of occurrence of the M symbols in N.

The desired output is a list of the number of occurrences of each word in the 
set P which best approximates the number of occurrences of the M symbols in the 
set N for Q selections from the list M for the case Q >= P. 

The fundamental problem is then constructing Ax=y where y is the vector of 
probabilities of each element in M in N. The vector x is a integer valued 
number of repetitions of words in P. The hard part is creating the correct A 
matrix.

Have Fun!
Reg
     On Friday, May 16, 2025 at 04:55:34 PM CDT, Federico Miyara 
<[email protected]> wrote:  
 
 
I need to solve the following problem:

I have an alphabet of n symbols and a dictionary with N words of m
symbols (n in the order of tens, N in the order of tens of thousands, m
= 4, say)

Assuming each symbol has a definite probability, I need to generate a
list of M words (M in the order of 100) taken from the dictionary in
which the proportion of each symbol matches as best as possible the
required probability.

Is this a problem that can be solved using GLPK?

Thanks.

Bes regards,

Federico Miyara

--
Este correo electrónico ha sido analizado en busca de virus por el software 
antivirus de Avast.
www.avast.com

    

Reply via email to