On Aug 27, 12:59 pm, RyanL <[EMAIL PROTECTED]> wrote:
> I'm a newbie! I have a non-delimited data file that I'd like to
> convert to delimited.
>
> Example...
> Line in non-delimited file:
> 0139725635999992000010100534+42050-102800FM-15+1198KAIA
>
> Should be:
> 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
>
> What is the best way to go about this? I've looked all over for
> examples, help, suggestions, but have not found much. CSV module
> doesn't seem to do exactly what I want. Maybe I'm just missing
> something or not using the correct terminology in my searches. Any
> assistance is greatly appreciated! Using Python 2.4

I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)

-- Paul

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import *
import time

def convertTimeStamp(t):
    t["date"] = map(int,t.date)
    t["time"] = map(int,t.time)
    return time.strftime("%Y-%m-%dT%H:%M",
                         tuple(t.date)+tuple(t.time)+(0,0,0,0))

yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) + Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)

signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t : int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")

dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
             fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print res.dump()

prints:

['0139', '725635', '99999', '2000-01-01T00:53', '4', 42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53
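
If all that is wanted is the comma-delimited line from the original question, a plain regular expression from the standard library may be enough. This is a different technique from the pyparsing grammar, offered only as a rough sketch: it assumes the same layout guessed from the one sample line (4, 6 and 5 digits, a 12-digit timestamp, then variable-length fields), and the names FIELD_PATTERN and to_delimited are made up for illustration.

```python
import re

# Field layout guessed from the single sample line; adjust to the real spec.
FIELD_PATTERN = re.compile(
    r"(\d{4})(\d{6})(\d{5})"                # three fixed-width numeric fields
    r"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})"  # year, month, day, hour, minute
    r"(\d+)"                                # variable-length numeric field
    r"([+-]\d+)([+-]\d+)"                   # signed latitude and longitude
    r"([A-Za-z]{2}-\d{2})"                  # two letters, dash, two digits (e.g. FM-15)
    r"([+-]\d+)"                            # signed integer
    r"([A-Za-z]+)$"                         # trailing alphabetic code
)

def to_delimited(line, sep=","):
    """Split one raw line into its assumed fields and join them with sep."""
    match = FIELD_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line does not fit the assumed layout: %r" % line)
    return sep.join(match.groups())

print(to_delimited("0139725635999992000010100534+42050-102800FM-15+1198KAIA"))
```

Unlike the pyparsing version, this keeps every field as the original text (no timestamp or latitude/longitude conversion), which is what the requested output shows. Lines that do not fit the guessed layout raise ValueError rather than being split silently.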