On 2022-Dec-23, Alvaro Herrera wrote:

> I wonder why do you have it return the multiple alternative codes as a
> space-separated string.  Maybe an array would be more appropriate.  Even
> on your documented example use, the first thing you do is split it on
> spaces.

I tried downloading a list of surnames from here
https://www.bibliotecadenombres.com/apellidos/apellidos-espanoles/
pasted that in a text file and \copy'ed it into a table.  Then I ran
this query

select string_agg(a, ' ' order by a), daitch_mokotoff(a), count(*)
from apellidos
group by daitch_mokotoff(a)
order by count(*) desc;

so I have a first entry like this

string_agg      │ Balasco Balles Belasco Belles Blas Blasco Fallas Feliz Palos Pelaez Plaza Valles Vallez Velasco Velez Veliz Veloz Villas
daitch_mokotoff │ 784000
count           │ 18

but then I have a bunch of other entries that carry the same code 784000
as one of their alternative codes,

string_agg      │ Velazco
daitch_mokotoff │ 784500 784000
count           │ 1

string_agg      │ Palacio
daitch_mokotoff │ 785000 784000
count           │ 1

I suppose I need to group these together somehow, and it would make more
sense to do that if the values were arrays.
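
To illustrate the grouping I have in mind, here is a minimal sketch that
splits the space-separated string and groups on each individual code.  It
does not call daitch_mokotoff() at all: the "coded" CTE is a hypothetical
stand-in that hard-codes three of the rows shown above, so that it runs on
a plain PostgreSQL without the patch applied.

```sql
-- "coded" is a hypothetical stand-in for (a, daitch_mokotoff(a)),
-- using codes copied from the output above.
with coded(a, codes) as (
    values ('Velasco', '784000'),
           ('Velazco', '784500 784000'),
           ('Palacio', '785000 784000')
)
select code,
       string_agg(a, ' ' order by a) as surnames,
       count(*)
from coded
     -- one output row per (surname, individual code) pair
     cross join lateral unnest(string_to_array(codes, ' ')) as code
group by code
order by count(*) desc, code;
```

With these sample rows, 784000 should come out as a single group holding
all three surnames, while 784500 and 785000 each get a group of one.  If
the function returned an array, the string_to_array() step would simply
go away.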


If I scroll a bit further down and choose, say, 794000 (a relatively
popular one), then I have this

string_agg      │ Barraza Barrios Barros Bras Ferraz Frias Frisco Parras Peraza Peres Perez Porras Varas Veras
daitch_mokotoff │ 794000
count           │ 14

and looking for that code in the result I also get these three

string_agg      │ Barca Barco Parco
daitch_mokotoff │ 795000 794000
count           │ 3

string_agg      │ Borja
daitch_mokotoff │ 790000 794000
count           │ 1

string_agg      │ Borjas
daitch_mokotoff │ 794000 794400
count           │ 1

and then I see that I should also search for possible matches in codes
795000, 790000 and 794400, so that gives me

string_agg      │ Baria Baro Barrio Barro Berra Borra Feria Para Parra Perea Vera
daitch_mokotoff │ 790000
count           │ 11

string_agg      │ Barriga Borge Borrego Burgo Fraga
daitch_mokotoff │ 795000
count           │ 5

string_agg      │ Borjas
daitch_mokotoff │ 794000 794400
count           │ 1

which look closely related: compare "Veras" in the first set to "Vera" in
the later one.  If you ignore that pseudo-match, you're likely to miss
possible family relationships.


I suppose if I were a genealogy researcher, I would be helped by having
each of these codes behave as a separate unit, rather than having to
split the string into its several possible values myself.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."                                      (E. Dijkstra)

