>>> I need to take capped section headings and change them into
>>> initial or title case. I have coding that does this.
>>> However, my logic also changes acronym names such as IBM and
>>> PDF into Ibm and Pdf.
>>> Is there a way to exempt certain words or configurations of letters
>>> without building a dictionary or lookup table or whitelist?
>>
>> Just thinking about other rules that might apply. I would assume that
>> generally the section headings consist of multiple tokens in
>> uppercase, whereas an acronym would be a single uppercase token.
>> Based on that, perhaps a "Section Heading" is two or more words, of
>> two or more letters each, in all caps. That should be a reasonably
>> easy regex to write. I'd need to see some examples to
>> flesh it out further, but you might start here:
>
> I took your code and ran it. Everything went as expected.
> When I added one line to __DATA__, however, it did not.
>
> I added "Introducing PDF SOLUTIONS, INC."
>
> This should result in "Introducing PDF Solutions, Inc."
>
> Can't make it give me that result.
>
> Also, I changed "IBM, International Business Machines, is a
> good place to be." to "IBM, INTERNATIONAL BUSINESS MACHINES,
> is a good place to be."
>
> I got "Ibm, International Business Machines, is a good place to be."
> rather than "IBM, International Business Machines, is a good
> place to be."

Perhaps you could provide us with a snippet of the actual document so
that we can better define the patterns. My last attempt included some
significant assumptions that obviously do not hold for your data. About
50% of writing a good regex is simply knowing what is and isn't safe to
assume about your data.

For example, is it true that uppercase words that are not acronyms will
always also appear elsewhere in your document in mixed case? That would
be an easy one to implement.

Chris

_______________________________________________
Perl-Win32-Users mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
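
To make the mixed-case idea concrete, here is a rough sketch. It only works if the assumption really holds for your document, and I have not seen your data, so treat it as a starting point. The sub name fix_caps and the sample text are made up for illustration. The first pass records every word that shows up with at least one lowercase letter. The second pass title-cases an all-caps word only if that same word was recorded, so IBM and PDF are left alone as long as "Ibm" or "Pdf" never appears in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: title-case an all-caps word only when the same word
# also occurs somewhere in the text with lowercase letters in it.
sub fix_caps {
    my ($text) = @_;

    # Pass 1: remember (lowercased) every word that contains a lowercase letter.
    my %seen;
    for my $word ($text =~ /\b([A-Za-z]+)\b/g) {
        $seen{lc $word} = 1 if $word =~ /[a-z]/;
    }

    # Pass 2: rewrite all-caps words of two or more letters that were seen
    # in pass 1; anything not seen (assumed to be an acronym) is untouched.
    $text =~ s/\b([A-Z]{2,})\b/ $seen{lc $1} ? ucfirst(lc $1) : $1 /ge;

    return $text;
}

# Made-up sample text; the third line supplies the mixed-case forms.
my $doc = <<'END';
Introducing PDF SOLUTIONS, INC.
IBM, INTERNATIONAL BUSINESS MACHINES, is a good place to be.
Solutions, Inc. sells software to International Business Machines.
END

print fix_caps($doc);
# Prints:
# Introducing PDF Solutions, Inc.
# IBM, International Business Machines, is a good place to be.
# Solutions, Inc. sells software to International Business Machines.
```

The obvious weakness is the reverse case: a heading word that never appears in mixed case anywhere else stays in capitals, and an acronym that someone typed as "Pdf" even once gets lowercased everywhere. Whether that is acceptable depends on the real documents.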
