On Tuesday, December 2, 2014, mongoose [via PostgreSQL] <
ml-node+s1045698n5829030...@n5.nabble.com> wrote:

> David,
>
> Thank you for your prompt reply. I believe your answer helped a lot but it
> seems I was not clear enough on my description. Basically I want a counter
> (id) to show if two or more names are similar (i.e. levenshtein distance
> less than 3) So in the previous example:
>
> From this table:
>
> Name, City
> "Booob", "NYC"
> "Alex", "Washington"
> "Alexj2", "Washington"
> "Bob", "NYC"
> "Aleex1", "Washington"
>
> to get this table:
>
> id, Name, City
> 1,"Alex", "Washington"
> 1,"Aleex1", "Washington"
> 1,"Alexj2", "Washington"
> 2,"Bob", "NYC"
> 2,"Booob", "NYC"
>
> So basically the id is a counter that starts from "1" and increments only
> when there is a different name. Please notice that the table has its names
> in a completely random order.
>
>
Write and combine a few subqueries that use window functions (namely lag
and row_number) to identify groups, label them, and assign rows to each
group (using a between condition on a join)

Pondering some (not tested) if you identify the boundary records in a
subquery you can assign them a value of 1 while all others take on null.
In the outer query you should be able to assign groups by simply
applying the sum function over the entire result such that at each boundary
value the presence of the 1 will increment the sum while the null rows will
use the sum value from the prior row.

David J.




--
View this message in context: 
http://postgresql.nabble.com/Merge-rows-based-on-Levenshtein-distance-tp5828841p5829041.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Reply via email to