McBooCzech wrote:
Tim,
do you think Ferbel can parse non-English data sets properly?
The official name for the project is "Febrl" (freely-extensible
biomedical record linkage), but perhaps "Furball" would be a better name,
given its focus on fuzziness (if that is not a contradiction in terms).
My cat could cough up a suitable graphical logo, no doubt.
I mean, do you think it will work properly with data that include
non-English characters as well? As we live in Europe, we have to solve
such problems here :)))) If the software needs some changes, I am ready
to try to make them, following your suggestions.
The underlying method (hidden Markov models) should work with any kind
of data. If the non-English characters can be read as normal Python
strings then it will definitely work. If you need to use Unicode
strings, it may work, but that is not something we have tested,
although it is on the TO-DO list. Peter Christen could comment further
on that (off-line). You would need to supply your own tagging look-up
tables, of course, and bootstrap your own models, as it is unlikely that
the sample models for Australian addresses will work well for Czech
addresses. But none of that should be too hard.
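To illustrate the two steps described above (look-up-table tagging of Unicode input, then HMM decoding), here is a minimal, self-contained Python 3 sketch. It is NOT Febrl's API or file format: the look-up table, the tag names (WT, LN, NU, UN), the hidden states and all probabilities are hypothetical values invented for a single Czech address. It only shows that Unicode words such as "náměstí" can be tagged and then decoded with the standard Viterbi algorithm.

```python
import math
import unicodedata

# Hypothetical tagging look-up table (not Febrl's format): word -> observation tag.
CZECH_LOOKUP = {
    "ulice": "WT",      # wayfare type ("street")
    "náměstí": "WT",    # wayfare type ("square")
    "praha": "LN",      # locality name
    "brno": "LN",
}

def tag_tokens(address):
    """Split an address into words and map each word to a tag."""
    # NFC-normalise so decomposed input (e.g. 'a' + combining acute) matches the table.
    text = unicodedata.normalize("NFC", address).lower()
    tags = []
    for word in text.replace(",", " ").split():
        if word.isdigit():
            tags.append("NU")                          # number
        else:
            tags.append(CZECH_LOOKUP.get(word, "UN"))  # unknown word
    return tags

# Hypothetical toy HMM; a real model would be bootstrapped from training data.
STATES = ["type", "name", "number", "locality"]
START = {"type": 0.5, "name": 0.5}
TRANS = {
    "type": {"name": 0.9, "number": 0.1},
    "name": {"name": 0.2, "type": 0.3, "number": 0.5},
    "number": {"locality": 1.0},
    "locality": {"locality": 1.0},
}
EMIT = {
    "type": {"WT": 0.9, "UN": 0.1},
    "name": {"UN": 0.9, "LN": 0.1},
    "number": {"NU": 1.0},
    "locality": {"LN": 0.9, "UN": 0.1},
}

def log(p):
    """Log-probability, treating zero probability as minus infinity."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, states, start, trans, emit):
    """Return the most probable hidden-state sequence for the observed tags."""
    # Each layer maps state -> (best log-prob ending here, previous state).
    best = [{s: (log(start.get(s, 0)) + log(emit[s].get(obs[0], 0)), None)
             for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            score, back = max(
                (prev[r][0] + log(trans[r].get(s, 0)), r) for r in states
            )
            layer[s] = (score + log(emit[s].get(o, 0)), back)
        best.append(layer)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for layer in reversed(best[1:]):
        state = layer[state][1]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    tags = tag_tokens("Náměstí Míru 12, Praha")
    print(tags)                                        # ['WT', 'UN', 'NU', 'LN']
    print(viterbi(tags, STATES, START, TRANS, EMIT))   # ['type', 'name', 'number', 'locality']
```

With Czech look-up tables and models trained on Czech addresses in place of these toy values, the decoding step itself would not need to change.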
Tim C
--
http://mail.python.org/mailman/listinfo/python-list