McBooCzech wrote:

Tim,

do you think Ferbel can properly parse non-English data sets?

The official name for the project is "Febrl" (freely-extensible biomedical record linkage), but perhaps "Furball" would be a better name, given its focus on fuzziness (if that is not a contradiction in terms). My cat could cough up a suitable graphical logo, no doubt.

I mean, do you think it will work properly with data that include
non-English characters as well? As we live in Europe, we have to solve
such problems here :)))) If the software needs some changes, I am ready
to try to make them, following your suggestions.


The underlying method (hidden Markov models) should work with any kind of data. If the non-English characters can be read as normal Python strings, then it will definitely work. If you need to use Unicode strings, it may work, but that is not something we have tested, although it is on the TO-DO list. Peter Christen could comment further on that (off-line). You would need to supply your own tagging look-up tables, of course, and bootstrap your own models, as it is unlikely that the sample models for Australian addresses will work well for Czech addresses. But none of that should be too hard.
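
To make the idea concrete, here is a rough sketch of look-up-table tagging followed by HMM decoding on a Czech-style address. It assumes Python 3, where every str is already Unicode. The table entries, tag names, state names and probabilities are all invented for illustration; they are not Febrl's tables, models or API, and a real model would be bootstrapped from tagged sample data rather than set by hand.

```python
import unicodedata


def norm(s):
    """Normalise to NFC and lower-case so accented tokens compare equal."""
    return unicodedata.normalize("NFC", s).lower()


# Hypothetical look-up table (token -> tag); not Febrl's shipped tables.
TAG_TABLE = {norm(k): v for k, v in {
    "ulice": "WT",     # wayfare type ("street")
    "náměstí": "WT",   # wayfare type ("square")
    "praha": "LN",     # locality name
    "brno": "LN",
}.items()}


def tag_tokens(text):
    """Split on whitespace/commas and tag each token via the table."""
    tokens = norm(text).replace(",", " ").split()
    tags = []
    for tok in tokens:
        if tok in TAG_TABLE:
            tags.append(TAG_TABLE[tok])
        elif tok.isdigit():
            tags.append("NU")   # number
        else:
            tags.append("UN")   # unknown
    return tokens, tags


# Hypothetical toy HMM with hand-set probabilities (an assumption).
STATES = ["wtype", "wname", "number", "locality"]
START_P = {"wtype": 0.5, "wname": 0.4, "number": 0.05, "locality": 0.05}
TRANS_P = {
    "wtype":    {"wtype": 0.05, "wname": 0.8,  "number": 0.1, "locality": 0.05},
    "wname":    {"wtype": 0.2,  "wname": 0.2,  "number": 0.5, "locality": 0.1},
    "number":   {"wtype": 0.05, "wname": 0.05, "number": 0.1, "locality": 0.8},
    "locality": {"wtype": 0.05, "wname": 0.05, "number": 0.1, "locality": 0.8},
}
EMIT_P = {
    "wtype":    {"WT": 0.9, "UN": 0.1},
    "wname":    {"UN": 0.9, "LN": 0.1},
    "number":   {"NU": 1.0},
    "locality": {"LN": 0.8, "UN": 0.2},
}


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden-state sequence for the observed tags."""
    # best[s] = (probability, path) of the best path ending in state s
    best = {s: (start_p[s] * emit_p[s].get(obs[0], floor), [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda x: x[0],
            )
            new_best[s] = (prob * emit_p[s].get(o, floor), path + [s])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]


tokens, tags = tag_tokens("Náměstí Míru 12, Praha")
print(tags)                                            # ['WT', 'UN', 'NU', 'LN']
print(viterbi(tags, STATES, START_P, TRANS_P, EMIT_P))
# ['wtype', 'wname', 'number', 'locality']
```

The NFC normalisation step is there because the same accented letter can arrive either precomposed or as a base letter plus a combining mark, and the two forms would otherwise miss each other in the look-up table.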

Tim C

--
http://mail.python.org/mailman/listinfo/python-list
