>>> I need to take capped section headings and change them into
>>> initial or title case. I have code that does this.
>>> However, my logic also changes acronym names such as IBM and
>>> PDF into Ibm and Pdf.
>>> Is there a way to exempt certain words or configurations of letters 
>>> without building a dictionary or lookup table or whitelist?
>> 
>> Just thinking about other rules that might apply.  I would assume that 
>> generally the section headings consist of multiple tokens in 
>> uppercase, whereas an acronym would be a single uppercase token.  
>> Based on that, perhaps a "Section Heading" is two or more words, of 
>> two or more letters each, in all caps.  That should be a reasonably 
>> easy regex to write.  I'd need to see some examples to 
>> flesh it out further, but you might start here:
> 
> I took your code and ran it. Everything went as expected. 
> When I added one line to __DATA__, however, it did not.
> 
> I added "Introducing PDF SOLUTIONS, INC."
> 
> This should result in "Introducing PDF Solutions, Inc."
> 
> Can't make it give me that result.
> 
> Also, I changed "IBM, International Business Machines, is a 
> good place to be." to "IBM, INTERNATIONAL BUSINESS MACHINES, 
> is a good place to be."
> 
> I got "Ibm, International Business Machines, is a good place to be." 
> rather than "IBM, International Business Machines, is a good 
> place to be."

Perhaps you could provide us with a snippet of the actual document so that
we can better define the patterns.  My last attempt included some
significant assumptions that were obviously not applicable.  About 50% of
writing a good regex is simply knowing what is and isn't safe to assume
about your data.  For example, is it true that uppercase words that are not
acronyms will always also appear in your document in mixed case?  That would
be an easy one to implement.
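
If that assumption does hold, here is a rough sketch of what I have in mind.
It is only a starting point: the sub name fix_caps and the sample text are
made up for illustration, and it relies entirely on every non-acronym also
showing up somewhere in mixed or lower case.

```perl
use strict;
use warnings;

# ASSUMPTION: any all-caps word that is NOT an acronym also occurs
# somewhere in the same document in mixed or lower case.
# fix_caps is a hypothetical name, not part of any module.
sub fix_caps {
    my ($text) = @_;

    # Pass 1: remember every word containing at least one lowercase letter.
    my %seen;
    $seen{lc $1} = 1 while $text =~ /\b([A-Za-z]*[a-z][A-Za-z]*)\b/g;

    # Pass 2: recase all-caps words of two or more letters, but only
    # when pass 1 saw the same word in mixed case. Anything else is
    # treated as an acronym and left alone.
    $text =~ s/\b([A-Z]{2,})\b/ $seen{lc $1} ? ucfirst(lc $1) : $1 /ge;

    return $text;
}

# Invented sample data, loosely based on the lines from your message.
my $doc = join "\n",
    "IBM, INTERNATIONAL BUSINESS MACHINES, is a good place to be.",
    "International Business Machines sells computers.",
    "Introducing PDF SOLUTIONS, INC.",
    "Our solutions ship from Acme, Inc. as PDF files.";

print fix_caps($doc), "\n";
```

With that sample, IBM and PDF are left alone because they never appear in
mixed case, while SOLUTIONS and INC become Solutions and Inc. The obvious
weakness is an acronym that is also an ordinary word: "IT" would come out
as "It" as soon as "it" appears anywhere in the text. It also capitalizes
every recased word, so small words like "OF" become "Of" rather than "of".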

Chris

