> 
> 
> Cédrick
> 
> In principle, what you are asking for is to identify 'islands' of structured 
> information in a 'sea' of otherwise unstructured material, which is now a 
> standard pattern in PetitParser.

Exactly :)


> You could imagine a parser spec of the form:
> 
> (sea optional, (email/phone/address/....), sea optional) plus
> 
> Where email etc are parsers for the individual structures. As a parser this 
> would probably lead to lots of backtracking and be hideously inefficient, but 
> for a short text like an e-mail it could be usable.

Yes, this is only for short texts such as an email or, say, a text selection 
plus a shortcut.
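
For illustration, the "islands in a sea" idea can be approximated with plain regular expressions. This is a minimal sketch in Python, not PetitParser and not Pharo; the two patterns and the name `find_islands` are assumptions made up for the example, and they are deliberately crude. Each alternative is an island, and whatever the scanner skips over is the sea.

```python
import re

# Assumed, deliberately simple patterns: real emails and phone numbers
# come in many more shapes than these two alternatives cover.
ISLANDS = re.compile(
    r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<phone>\+?\d[\d .-]{6,}\d)"
)

def find_islands(text):
    """Return (kind, matched_text) pairs; everything unmatched is 'sea'."""
    return [(m.lastgroup, m.group()) for m in ISLANDS.finditer(text)]

sample = "Call me on +33 6 12 34 56 78 or write to jane.doe@example.org, thanks!"
print(find_islands(sample))
```

For a short text the cost of retrying the alternatives at each position should be negligible, which matches the remark above that efficiency matters little at this size.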


> This also assumes that the items of interest are really structured; there 
> could be many ways of writing phone numbers, for instance.

Phone numbers are actually not easy… I see them as a bounded sequence of digits 
(if not well structured), possibly preceded by a +countrycode.
I’d actually like fuzzy structuring, but would be perfectly OK with an 
initial crisp one.
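
A crisp first version of that definition might look like the sketch below (Python again, purely illustrative). The bounds are assumptions chosen for the example: a country code of 1 to 3 digits, then 6 to 12 further digits with optional single separators. `normalize_phone` is a hypothetical helper name.

```python
import re

# Assumed bounds: optional "+" and 1-3 digit country code, then 6-12 digits,
# each optionally preceded by one space, dot or hyphen.
PHONE = re.compile(
    r"(?<![\w+])(?:\+(?P<cc>\d{1,3})[ .-]?)?"
    r"(?P<number>\d(?:[ .-]?\d){5,11})(?!\d)"
)

def normalize_phone(text):
    """Return the first phone-like run as '+cc' plus digits, or None."""
    m = PHONE.search(text)
    if m is None:
        return None
    digits = re.sub(r"\D", "", m.group("number"))  # drop separators
    prefix = "+" + m.group("cc") if m.group("cc") else ""
    return prefix + digits
```

With unseparated digits the split between country code and number is only a guess; the concatenated result is the same either way, which is all this sketch relies on.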

I think this is a nice pet project for diving into PetitParser. When you say 
"unstructured material ... is now a standard pattern in PetitParser", how 
could I begin exploring that? Are there any tutorials?


Thanks Peter,

Cédrick

> 
> HTH
> 
> Peter Kenny
> 
> -----Original Message-----
> From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Cédrick 
> Béler
> Sent: 07 March 2019 09:52
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> Cc: Tudor Girba <tu...@tudorgirba.com>
> Subject: [Pharo-users] Parsing text to discover general data of interest 
> (phone, email, address, ...)
> 
> Hi all,
> 
> I often need to analyse random unstructured text to discover (structured) 
> information (in emails, for instance) and extract:
> - emails
> - telephone numbers
> - addresses
> - events
> - person names (according to a list of known persons),
> - etc… 
> 
> Apple does it in email, for instance (strangely, this is not generalized).
> 
> 
> So my questions are:
> - do we have something equivalent in Smalltalk/Pharo? (I didn’t find anything)
> - if not, what strategy would you use?
> => I do really stupid text analysis (substrings, finding @, …, parsing 
> according to the text structure when there is one… a kind of Soup parsing…) => I 
> feel this is a job for PetitParser? It would also be a nice fit for the new 
> GToolkit.
> 
> All ideas or suggestions are welcome ;-)
> 
> 
> TIA,
> 
> Cédrick 

