While analysing a large application (PySol) made up of almost 100 source files, I needed to strip out the comments.
Removing comments that occupy a whole line is straightforward. For comments embedded in a line of code, I used the tokenize module instead. To my surprise, the output differs from the input, even though the last element of each token tuple should replicate the input line exactly. The discrepancy shows up at triple-quoted strings. I don't know whether this has already been fixed (I use Python 2.3) or whether it is a mistake on my part.

Here is the script I use to reproduce the strange behaviour:

    import tokenize

    Input = "pippo1"
    Output = "pippo2"
    f = open(Input)
    fOut = open(Output, "w")
    nLastLine = 0
    for i in tokenize.generate_tokens(f.readline):
        if nLastLine != (i[2])[0]:    # the 3rd element of the tuple is
            nLastLine = (i[2])[0]     # (startingRow, startingCol)
            fOut.write(i[4])
    f.close()
    fOut.close()

The input file (pippo1) contains this extract:

    class SelectDialogTreeData:
        img = None
        def __init__(self):
            self.tree_xview = (0.0, 1.0)
            self.tree_yview = (0.0, 1.0)
            if self.img is None:
                SelectDialogTreeData.img = (makeImage(dither=0, data="""
    R0lGODlhEAAOAPIFAAAAAICAgMDAwP//AP///4AAAAAAAAAAACH5BAEAAAUALAAAAAAQAA4AAAOL
    WLrcGxA6FoYYYoRZwhCDMAhDFCkBoa6sGgBFQAzCIAzCIAzCEACFAEEwEAwEA8FAMBAEAIUAYSAY
    CAaCgWAgGAQAhQBBMBAMBAPBQDAQBACFAGEgGAgGgoFgIBgEAAUBBAIDAgMCAwIDAgMCAQAFAQQD
    AgMCAwIDAgMCAwEABSaiogAKAKeoqakFCQA7"""),
                    makeImage(dither=0, data="""
    R0lGODlhEAAOAPIFAAAAAICAgMDAwP//AP///4AAAAAAAAAAACH5BAEAAAUALAAAAAAQAA4AAAN3
    WLrcHBA6Foi1YZZAxBCDQESREhCDMAiDcFkBUASEMAiDMAiDMAgBAGlIGgQAgZeSEAAIAoAAQTAQ
    DAQDwUAwAEAAhQBBMBAMBAPBQBAABACFAGEgGAgGgoFgIAAEAAoBBAMCAwIDAgMCAwEAAApERI4L
    jpWWlgkAOw=="""),
                    makeImage(dither=0, data="""
    R0lGODdhEAAOAPIAAAAAAAAAgICAgMDAwP///wAAAAAAAAAAACwAAAAAEAAOAAADTii63DowyiiA
    GCHrnQUQAxcQAAEQgAAIg+MCwkDMdD0LgDDUQG8LAMGg1gPYBADBgFbs1QQAwYDWBNQEAMHABrAR
    BADBwOsVAFzoqlqdAAA7"""),
                    makeImage(dither=0, data="""
    R0lGODdhEAAOAPIAAAAAAAAAgICAgMDAwP8AAP///wAAAAAAACwAAAAAEAAOAAADVCi63DowyiiA
    GCHrnQUQAxcUQAEUgAAIg+MCwlDMdD0LgDDQBE3UAoBgUCMUCDYBQDCwEWwFAUAwqBEKBJsAIBjQ
    CDRCTQAQDKBQAcDFBrjf8Lg7AQA7"""))

The output written by the script (pippo2) is instead:

    class SelectDialogTreeData:
        img = None
        def __init__(self):
            self.tree_xview = (0.0, 1.0)
            self.tree_yview = (0.0, 1.0)
            if self.img is None:
                SelectDialogTreeData.img = (makeImage(dither=0, data="""
    AgMCAwIDAgMCAwEABSaiogAKAKeoqakFCQA7"""),
                    makeImage(dither=0, data="""
    jpWWlgkAOw=="""),
                    makeImage(dither=0, data="""
    BADBwOsVAFzoqlqdAAA7"""),
                    makeImage(dither=0, data="""
    CDRCTQAQDKBQAcDFBrjf8Lg7AQA7"""))

... with a big difference: every line inside the triple-quoted strings except the last one is missing. Why?
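A possible explanation, offered as a sketch rather than a confirmed diagnosis for Python 2.3: tokenize reports a triple-quoted string as a single STRING token whose start row is the row of the opening quotes. Since the script writes a line only when the start row changes, and an earlier token on that row has already triggered the write, the interior rows of the string never appear as a start row and are skipped. The snippet below reproduces this on a tiny made-up input using current Python 3 (io.StringIO stands in for the file). SOURCE and the three function names are hypothetical, introduced only for illustration. The last function shows one way to cut comments out by token position instead of rewriting whole lines.

```python
import io
import tokenize

# Hypothetical miniature of the pippo1 extract: one multi-line string,
# a '#' inside the string, and a real trailing comment.
SOURCE = (
    'x = (f(data="""\n'
    'AA#AA\n'
    'BBBB\n'
    'CCCC"""),\n'
    '     g())  # trailing comment\n'
)


def _tokens(source):
    """Tokenize a source string the same way the script tokenizes a file."""
    return tokenize.generate_tokens(io.StringIO(source).readline)


def lines_by_start_row(source):
    """Mimic the script: emit token[4] each time the start row changes."""
    out = []
    last_row = 0
    for tok in _tokens(source):
        if tok[2][0] != last_row:
            last_row = tok[2][0]
            out.append(tok[4])
    return "".join(out)


def string_token_rows(source):
    """Return (start_row, end_row) for every STRING token."""
    return [(tok[2][0], tok[3][0])
            for tok in _tokens(source)
            if tok[0] == tokenize.STRING]


def strip_comments(source):
    """Remove COMMENT tokens by position, leaving all other text intact."""
    lines = source.splitlines(True)
    for tok in _tokens(source):
        if tok[0] == tokenize.COMMENT:
            row, col = tok[2]
            line = lines[row - 1]
            ending = "\n" if line.endswith("\n") else ""
            # A comment always runs to the end of its physical line.
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)


if __name__ == "__main__":
    print(string_token_rows(SOURCE))
    print(lines_by_start_row(SOURCE))
    print(strip_comments(SOURCE))
```

In this sketch the STRING token spans several rows but starts on row 1, so lines_by_start_row drops the interior rows just as pippo2 does, while strip_comments keeps them and leaves the '#' inside the string untouched because the tokenizer does not classify it as a comment.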