Re: [Pharo-users] Parsing text to discover general data of interest (phone, email, address, ...)

Hernán Morales Durand Thu, 07 Mar 2019 18:35:44 -0800

Hi Cédrick,

Some years ago I wrote an interface to a named-entity recognizer:
https://80738163270632.blogspot.com/2015/02/stner-interface-to-stanford-named.html


I think that was for Pharo 5, so you may want to check whether it still
loads without problems in a current Pharo.

The Blogger post didn't render the output correctly, but for the input:

StSocketNERClient new
 tagText: 'Argentina President Kirchner has been asked to testify in
court on the death of Alberto Nisman the crusading prosecutor who had
accused her of conspiring to cover up involvement of Iran'


output would be:

'<LOCATION>Argentina</LOCATION> President <PERSON>Kirchner</PERSON>
has been asked to testify in court on the death of <PERSON>Alberto
Nisman</PERSON> the crusading prosecutor who had accused her of
conspiring to cover up involvement of <LOCATION>Iran</LOCATION>'
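
To get from a tagged string like that to structured data, something along these lines might work. This is only a minimal sketch and not part of StNER: the variable names are made up for illustration, it relies on the Regex package that ships with Pharo (String>>allRegexMatches:), and it assumes the tags are never nested.

```smalltalk
"Sketch only: collect tag -> text pairs from inline-tagged NER output.
 Assumes flat (non-nested) tags; 'tagged' and 'entities' are illustrative names."
| tagged entities |
tagged := '<LOCATION>Argentina</LOCATION> President <PERSON>Kirchner</PERSON> has been asked to testify'.
entities := OrderedCollection new.
(tagged allRegexMatches: '<[a-zA-Z]+>[^<]+</[a-zA-Z]+>') do: [ :match |
	| tag text |
	"Tag name: everything between the first $< and the first $>"
	tag := (match copyAfter: $<) copyUpTo: $>.
	"Entity text: everything between the first $> and the next $<"
	text := (match copyAfter: $>) copyUpTo: $<.
	entities add: tag asUppercase -> text ].
entities
```

Inspecting entities in a Playground should show one association per tagged span, which you could then group by key (persons, locations, ...).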

Cheers,

Hernán

On Thu, 7 Mar 2019 at 6:53, Cédrick Béler (<cdric...@gmail.com>) wrote:
>
> Hi all,
>
> I often need to analyse random unstructured text (emails, for instance) to
> discover and extract structured information:
> - emails
> - telephone numbers
> - addresses
> - events
> - person names (according to a list of known persons),
> - etc…
>
> Apple does it in email, for instance (strangely, this is not generalized).
>
>
> So my questions are :
> - do we have something equivalent in Smalltalk/Pharo? (I didn’t find anything.)
> - if not, what strategy would you use?
> => I do really stupid text analysis (substrings, finding @, …, parsing
> according to the text structure when there is one… a kind of Soup parsing…)
> => I feel this is a job for PetitParser? And it would be a nice fit for the new
> GToolkit.
>
> All ideas or suggestions are welcome ;-)
>
>
> TIA,
>
> Cédrick
>
>
>
