> Cédrick
>
> In principle, what you are asking for is to identify 'islands' of structured
> information in a 'sea' of otherwise unstructured material, which is now a
> standard pattern in PetitParser.

Exactly :)

> You could imagine a parser spec of the form:
>
> (sea optional, (email/phone/address/....), sea optional) plus
>
> Where email etc are parsers for the individual structures. As a parser this
> would probably lead to lots of backtracking and be hideously inefficient, but
> for a short text like an e-mail it could be usable.

Yes, this is only for short texts like an email, or say a text selection + shortcut.

> This also assumes that the items of interest are really structured; there
> could be many ways of writing phone numbers, for instance.

Phone numbers are actually not easy… I see them as a bounded sequence of digits (when not well structured), possibly preceded by a +country code. I'd like fuzzy structuring eventually, but would be perfectly OK with an initial crisp one.

I find this a nice pet project to dive into PetitParser.

When you say "unstructured material ... is now a standard pattern in PetitParser", how could I begin exploring that? Any tutorials?

Thanks Peter,

Cédrick

> HTH
>
> Peter Kenny
>
> -----Original Message-----
> From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Cédrick
> Béler
> Sent: 07 March 2019 09:52
> To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
> Cc: Tudor Girba <tu...@tudorgirba.com>
> Subject: [Pharo-users] Parsing text to discover general data of interest
> (phone, email, address, ...)
>
> Hi all,
>
> I've often got the need to analyse some random unstructured text to discover
> (structured) information (in email for instance), to extract:
> - emails
> - telephone numbers
> - addresses
> - events
> - person names (according to a list of known persons),
> - etc…
>
> Apple do it in email for instance (strangely, this is not generalized).
>
> So my questions are:
> - do we have something equivalent in Smalltalk/Pharo? (I didn't find any)
> - if not, what strategy would you use?
> => I do really stupid text analysis (substrings, finding @, …, parsing
> according to the text structure when there is one… kind of Soup parsing…)
> => I feel this is a job for PetitParser? And it would be a nice fit for the
> new GToolkit.
>
> All ideas or suggestions are welcome ;-)
>
> TIA,
>
> Cédrick
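
To make the quoted spec, `(sea optional, (email/phone/address/....), sea optional) plus`, a bit more concrete, here is a minimal sketch of the same island/sea idea. It is written in Python with plain regular expressions rather than PetitParser, so it only illustrates the scanning strategy and says nothing about PetitParser's own API. The names `ISLANDS` and `find_islands` and both patterns are assumptions made up for this example; the patterns are deliberately crisp and will miss many real-world phone and email formats.

```python
import re

# Hypothetical, deliberately simplistic island patterns (assumptions for
# illustration only). They are tried in order, like an ordered choice.
ISLANDS = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")),
    # Optional '+', then 8 to 15 digits, each optionally preceded by one separator.
    ("phone", re.compile(r"\+?\d(?:[ .-]?\d){7,14}")),
]


def find_islands(text):
    """Return (kind, matched_text) pairs found while scanning left to right.

    At each position the island patterns are tried in order. If none of
    them matches there, the character is treated as 'sea' and skipped.
    """
    found = []
    pos = 0
    while pos < len(text):
        for kind, pattern in ISLANDS:
            match = pattern.match(text, pos)
            if match:
                found.append((kind, match.group()))
                pos = match.end()  # continue right after the island
                break
        else:
            pos += 1  # no island starts here: one character of sea
    return found


if __name__ == "__main__":
    sample = "Hi, call me on +33 6 12 34 56 78 or write to cedrick@example.org. Thanks!"
    print(find_islands(sample))
    # [('phone', '+33 6 12 34 56 78'), ('email', 'cedrick@example.org')]
```

Retrying every pattern at every position is the backtracking cost mentioned in the quoted reply: acceptable for a short email or a text selection, but probably too slow for large documents.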