Case tagging and python

Fred Mangusta Thu, 31 Jul 2008 04:06:36 -0700

Hi,

I'm relatively new to programming in general, and totally new to python,
and I've been told that this language is particularly good for what I
need to do. Let me explain.
I have a large corpus of English text, in the form of several files.


First of all I would like to scan each file. Then, for each word I find,
I'd like to examine its case status, and write the (lower case) word back
to another text file, with a tag appended stating the case it had in
the original file.


An example. Suppose we have three possible "case conditions":
-all lowercase
-all uppercase
-initial uppercase only

Three corresponding tags for each of these might be, respectively:
-nocap
-allcaps
-cap

Therefore, given the string

"The Chairman of BP was asleep"

I would like to produce

"the/cap chairman/cap of/nocap bp/allcaps was/nocap asleep/nocap"

and write this into a file.
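The mapping above can be sketched in a few lines of Python using the built-in string methods. This is only an illustration, not a definitive implementation: the names `case_tag` and `tag_line` are made up here, and two choices are assumptions that the three rules do not settle. A one-letter capital such as "I" is tagged cap rather than allcaps, and a mixed-case word is tagged by its first letter alone.

```python
def case_tag(word):
    """Return 'allcaps', 'cap' or 'nocap' for a single word."""
    # Assumption: a one-letter capital such as "I" counts as 'cap', not 'allcaps'.
    if len(word) > 1 and word.isupper():
        return "allcaps"
    # Assumption: mixed-case words are classified by their first character only.
    if word[:1].isupper():
        return "cap"
    return "nocap"


def tag_line(line):
    """Lowercase every whitespace-separated word and append '/<tag>' to it."""
    return " ".join(f"{w.lower()}/{case_tag(w)}" for w in line.split())


print(tag_line("The Chairman of BP was asleep"))
# the/cap chairman/cap of/nocap bp/allcaps was/nocap asleep/nocap
```

Because `split()` only cuts on whitespace, punctuation stays attached to its word ("asleep." would become "asleep./nocap"). Whether that is acceptable depends on the corpus.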


I have the following algorithm in mind:

-open input file
-open output file
-for each line of text
        -split line into words
        -for each word
                -tag = checkCase(word)
                -newword = lowercase(word) + "/" + tag
        -rejoin words into line
        -write line into output file
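The steps above translate fairly directly into Python. The following is a rough sketch under stated assumptions rather than a tuned solution: `tag_file`, `case_tag` and the `report_every` parameter are names invented for this example, the files are assumed to be UTF-8 text, and each output line is rebuilt with single spaces between words.

```python
def case_tag(word):
    # Three-way rule from the tag list; mixed-case handling is an assumption.
    if len(word) > 1 and word.isupper():
        return "allcaps"
    return "cap" if word[:1].isupper() else "nocap"


def tag_file(in_path, out_path, report_every=1000):
    """Write a case-tagged, lowercased copy of in_path to out_path."""
    # Assumption: both files are UTF-8 encoded text.
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        # Iterating over the file reads one line at a time,
        # so a large corpus never has to fit in memory.
        for count, line in enumerate(src, start=1):
            tagged = [w.lower() + "/" + case_tag(w) for w in line.split()]
            dst.write(" ".join(tagged) + "\n")
            if count % report_every == 0:
                print("%d lines" % count)
```

It would be called once per corpus file, for example `tag_file("corpus.txt", "corpus.tagged.txt")`, where both file names are placeholders.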

Now, I managed to write the following initial code

    lines = 0
    for s in file:
        lines += 1
        if lines % 1000 == 0:
            print('%d lines' % lines)  # report progress every 1000 lines
        sent = s.split()  # split the line on whitespace
        #...

But then I don't quite know what would be the fastest/best way to do
this. Could I use the join function to reform the string? And, regarding
the checkCase() function, what do you suggest? Should I test each
character of each word, or are there faster methods?


Thanks very much,

F.



--
http://mail.python.org/mailman/listinfo/python-list
