Re: problem with strptime and time zone
On Aug 24, 4:16 pm, Alex Willmer wrote:
> On Aug 24, 9:45 pm, m_ahlenius wrote:
> > whereas this fails:
> > myStrA = 'Sun Aug 22 19:03:06 PDT'
> > gTimeA = strptime( myStrA, '%a %b %d %H:%M:%S %Z')
> > print "gTimeA = ", gTimeA
> >
> > ValueError: time data 'Sun Aug 22 19:03:06 PDT' does not match format
> > '%a %b %d %H:%M:%S %Z'
>
> Support for the %Z directive is based on the values contained in
> tzname and whether daylight is true. Because of this, it is platform-
> specific except for recognizing UTC and GMT, which are always known
> (and are considered to be non-daylight savings timezones).
>
> http://docs.python.org/library/time.html
>
> Dateutil has its own timezone database, so should work reliably:
> http://labix.org/python-dateutil

Thanks much, I missed the directive settings.
--
http://mail.python.org/mailman/listinfo/python-list
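For the record, one stdlib-only workaround in the spirit of Alex's answer: map the abbreviations you expect to fixed offsets yourself instead of relying on %Z. The offset table and helper name here are hypothetical, and the year is an assumption since the device string carries none.

```python
from datetime import datetime, timedelta

# Hypothetical offset table -- extend with whatever abbreviations your
# devices actually emit. Abbreviations are ambiguous in general, which
# is part of why %Z cannot portably parse them.
TZ_HOURS = {"GMT": 0, "UTC": 0, "PDT": -7, "PST": -8}

def parse_stamp(stamp, year=2010):
    """Parse e.g. 'Sun Aug 22 19:03:06 PDT' and return a naive UTC datetime.

    The zone abbreviation is split off and applied as a fixed offset,
    so strptime never sees %Z at all."""
    head, _, zone = stamp.rpartition(" ")
    naive = datetime.strptime("%s %d" % (head, year), "%a %b %d %H:%M:%S %Y")
    # Subtracting a negative offset moves local time forward to UTC.
    return naive - timedelta(hours=TZ_HOURS[zone])
```

From there, calendar.timegm() on the result's timetuple() gives the epoch seconds the original poster was after.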
problem with strptime and time zone
Hi,

Perhaps I missed this posted already somewhere. I have a program which
reads time strings from some devices which are providing the time
zones. I have to take this into account when doing some epoch time
calculations. When I run the following code with the time zone string
set to 'GMT' it works ok.

This works:

myStrA = 'Sun Aug 22 19:03:06 GMT'
gTimeA = strptime( myStrA, '%a %b %d %H:%M:%S %Z')
print "gTimeA = ", gTimeA

But when it's set to 'PDT' it fails. Any ideas?

whereas this fails:

myStrA = 'Sun Aug 22 19:03:06 PDT'
gTimeA = strptime( myStrA, '%a %b %d %H:%M:%S %Z')
print "gTimeA = ", gTimeA

ValueError: time data 'Sun Aug 22 19:03:06 PDT' does not match format
'%a %b %d %H:%M:%S %Z'

thank you,
Re: Problem with tarfile module to open *.tar.gz files - unreliable ?
On Aug 20, 12:55 pm, Peter Otten <__pete...@web.de> wrote:
> m_ahlenius wrote:
> > I am using Python 2.6.5.
> >
> > Unfortunately I don't have other versions installed so its hard to
> > test with a different version.
> >
> > As for the log compression, its a bit hard to test. Right now I may
> > process 100+ of these logs per night, and will get maybe 5 which are
> > reported as corrupt (typically a bad CRC) and 2 which it reported as a
> > bad tar archive. This morning I checked each of the 7 reported
> > problem files by manually opening them with "tar -xzvof" and they were
> > all indeed corrupt. Sigh.
>
> So many corrupted files? I'd say you have to address the problem with
> your infrastructure first.
>
> > Unfortunately due to the nature of our business, I can't post the data
> > files online, I hope you can understand. But I really appreciate your
> > suggestions.
> >
> > The thing that gets me is that it seems to work just fine for most
> > files, but then not others. Labeling normal files as corrupt hurts us
> > as we then skip getting any log data from those files.
> >
> > appreciate all your help.
>
> I've written an autocorruption script,
>
> import sys
> import subprocess
> import tarfile
>
> def process(source, dest, data):
>     for pos in range(len(data)):
>         for bit in range(8):
>             new_data = data[:pos] + chr(ord(data[pos]) ^ (1 << bit)) + data[pos+1:]
>             assert len(data) == len(new_data)
>             out = open(dest, "w")
>             out.write(new_data)
>             out.close()
>             try:
>                 t = tarfile.open(dest)
>                 for f in t:
>                     t.extractfile(f)
>             except Exception, e:
>                 if 0 == subprocess.call(["tar", "-xf", dest]):
>                     return pos, bit
>
> if __name__ == "__main__":
>     source, dest = sys.argv[1:]
>     data = open(source).read()
>     print process(source, dest, data)
>
> and I can indeed construct an archive that is rejected by tarfile, but
> not by tar. My working hypothesis is that the python library is a bit
> stricter in what it accepts...
>
> Peter

Thanks - that's cool.
A friend of mine was suggesting that he's seen similar behaviour when
he uses Perl on these types of files, when the OS (Unix) has not
finished writing them. We have an rsync process which syncs these
files from our servers, and they come down somewhat randomly. So I
think it's conceivable that this process could be trying to open a
file as it's being written. I know it sounds like a stretch, but my
guess is that it's a possibility. I could verify that with the
timestamps of the errors in my log and the mod time on the original
file.

'mark
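One way to test that hypothesis in code, a heuristic sketch with a hypothetical helper name: treat an archive as ready for processing only once its size and mtime have stopped changing.

```python
import os
import time

def looks_settled(path, wait=2.0):
    """Heuristic: treat a file as fully written only if its size and
    mtime are unchanged across a short wait. Not a guarantee, but it
    catches archives still being delivered when the check runs."""
    st1 = os.stat(path)
    time.sleep(wait)
    st2 = os.stat(path)
    return (st1.st_size, st1.st_mtime) == (st2.st_size, st2.st_mtime)
```

Worth noting: by default rsync writes to a hidden temporary name and renames the file into place when the transfer completes, so readers normally shouldn't see partial files unless an option like --inplace is in use.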
Re: Problem with tarfile module to open *.tar.gz files - unreliable ?
On Aug 20, 9:25 am, Peter Otten <__pete...@web.de> wrote:
> m_ahlenius wrote:
> > On Aug 20, 6:57 am, m_ahlenius wrote:
> >> On Aug 20, 5:34 am, Dave Angel wrote:
> >> > m_ahlenius wrote:
> >> > > Hi,
> >> > >
> >> > > I am relatively new to doing serious work in python. I am using
> >> > > it to access a large number of log files. Some of the logs get
> >> > > corrupted and I need to detect that when processing them. This
> >> > > code seems to work for quite a few of the logs (all same
> >> > > structure) It also correctly identifies some corrupt logs but
> >> > > then it identifies others as being corrupt when they are not.
> >> > >
> >> > > example error msg from below code:
> >> > >
> >> > > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> >> > > Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
> >> > >
> >> > > When I manually examine the supposed corrupt log file and use
> >> > > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it
> >> > > opens just fine.
> >> > >
> >> > > Is there anything wrong with how I am using this module? (extra
> >> > > code removed for clarity)
> >> > >
> >> > > if tarfile.is_tarfile( file ):
> >> > >     try:
> >> > >         xf = tarfile.open( file, "r:gz" )
> >> > >         for locFile in xf:
> >> > >             logfile = xf.extractfile( locFile )
> >> > >             validFileFlag = True
> >> > >             # iterate through each log file, grab the first and the last lines
> >> > >             lines = iter( logfile )
> >> > >             firstLine = lines.next()
> >> > >             for nextLine in lines:
> >> > >                 continue
> >> > >             logfile.close()
> >> > >             ...
> >> > >         xf.close()
> >> > >     except Exception, e:
> >> > >         validFileFlag = False
> >> > >         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> >> > > else:
> >> > >     validFileFlag = False
> >> > >     lTime = extractFileNameTime( file )
> >> > >     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> >> > > print msg
> >> >
> >> > I haven't used tarfile, but this feels like a problem with the
> >> > Win/Unix line endings. I'm going to assume you're running on
> >> > Windows, which could trigger the problem I'm going to describe.
> >> >
> >> > You use 'file' to hold something, but don't show us what. In fact,
> >> > it's a lousy name, since it's already a Python builtin. But if
> >> > it's holding fileobj, that you've separately opened, then you need
> >> > to change that open to use mode 'rb'
> >> >
> >> > The problem, if I've guessed right, is that occasionally you'll
> >> > accidentally encounter a 0d0a sequence in the middle of the
> >> > (binary) compressed data. If you're on Windows, and use the
> >> > default 'r' mode, it'll be changed into a 0a byte. Thus corrupting
> >> > the checksum, and eventually the contents.
> >> >
> >> > DaveA
> >>
> >> Hi,
> >>
> >> thanks for the comments - I'll change the variable name.
> >>
> >> I am running this on linux so don't think its a Windows issue. So if
> >> that's the case is the 0d0a still an issue?
> >>
> >> 'mark
> >
> > Oh and what's stored currently in the file var is just the unopened
> > pathname to the target file I want to open
>
> Random questions:
>
> What python version are you using?
> If you have other python versions around, do they exhibit the same
> problem?
> If you extract and compress your data using the external tool, does the
> resulting file make problems in Python, too?
> If so, can you reduce data size and put a small demo online for others
> to experiment with?
>
> Peter

Hi,

I am using Python 2.6.5.
Unfortunately I don't have other versions installed, so it's hard to
test with a different version.

As for the log compression, it's a bit hard to test. Right now I may
process 100+ of these logs per night, and will get maybe 5 which are
reported as corrupt (typically a bad CRC) and 2 which are reported as
a bad tar archive. This morning I checked each of the 7 reported
problem files by manually opening them with "tar -xzvof", and they
were all indeed corrupt. Sigh.

Unfortunately, due to the nature of our business, I can't post the
data files online; I hope you can understand. But I really appreciate
your suggestions.

The thing that gets me is that it seems to work just fine for most
files, but then not others. Labeling normal files as corrupt hurts us,
as we then skip getting any log data from those files.

Appreciate all your help.

'mark
Re: Problem with tarfile module to open *.tar.gz files - unreliable ?
On Aug 20, 9:10 am, Dave Angel wrote:
> m_ahlenius wrote:
> > On Aug 20, 6:57 am, m_ahlenius wrote:
> >> On Aug 20, 5:34 am, Dave Angel wrote:
> >>> m_ahlenius wrote:
> >>>> Hi,
> >>>>
> >>>> I am relatively new to doing serious work in python. I am using it
> >>>> to access a large number of log files. Some of the logs get
> >>>> corrupted and I need to detect that when processing them. This
> >>>> code seems to work for quite a few of the logs (all same
> >>>> structure) It also correctly identifies some corrupt logs but then
> >>>> it identifies others as being corrupt when they are not.
> >>>>
> >>>> example error msg from below code:
> >>>>
> >>>> Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> >>>> Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
> >>>>
> >>>> When I manually examine the supposed corrupt log file and use
> >>>> "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it
> >>>> opens just fine.
> >>>>
> >>>> Is there anything wrong with how I am using this module? (extra
> >>>> code removed for clarity)
> >>>>
> >>>> if tarfile.is_tarfile( file ):
> >>>>     try:
> >>>>         xf = tarfile.open( file, "r:gz" )
> >>>>         for locFile in xf:
> >>>>             logfile = xf.extractfile( locFile )
> >>>>             validFileFlag = True
> >>>>             # iterate through each log file, grab the first and the last lines
> >>>>             lines = iter( logfile )
> >>>>             firstLine = lines.next()
> >>>>             for nextLine in lines:
> >>>>                 continue
> >>>>             logfile.close()
> >>>>             ...
> >>>>         xf.close()
> >>>>     except Exception, e:
> >>>>         validFileFlag = False
> >>>>         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> >>>> else:
> >>>>     validFileFlag = False
> >>>>     lTime = extractFileNameTime( file )
> >>>>     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> >>>> print msg
> >>>
> >>> I haven't used tarfile, but this feels like a problem with the
> >>> Win/Unix line endings. I'm going to assume you're running on
> >>> Windows, which could trigger the problem I'm going to describe.
> >>>
> >>> You use 'file' to hold something, but don't show us what. In fact,
> >>> it's a lousy name, since it's already a Python builtin. But if it's
> >>> holding fileobj, that you've separately opened, then you need to
> >>> change that open to use mode 'rb'
> >>>
> >>> The problem, if I've guessed right, is that occasionally you'll
> >>> accidentally encounter a 0d0a sequence in the middle of the
> >>> (binary) compressed data. If you're on Windows, and use the default
> >>> 'r' mode, it'll be changed into a 0a byte. Thus corrupting the
> >>> checksum, and eventually the contents.
> >>>
> >>> DaveA
> >>
> >> Hi,
> >>
> >> thanks for the comments - I'll change the variable name.
> >>
> >> I am running this on linux so don't think its a Windows issue. So if
> >> that's the case is the 0d0a still an issue?
> >>
> >> 'mark
> >
> > Oh and what's stored currently in the file var is just the unopened
> > pathname to the target file I want to open
>
> No, on Linux, there should be no such problem. And I have to assume
> that if you pass the filename as a string, the library would use 'rb'
> anyway. It's just if you pass a fileobj, AND are on Windows.
>
> Sorry I wasted your time, but nobody else had answered, and I hoped it
> might help.
>
> DaveA

Hi Dave,

thanks for responding - you were not wasting my time but helping me to
be aware of other potential issues. Appreciate it much. It's just
weird that it works for most files and even finds corrupt ones, but
some of the ones it marks as corrupt seem to be OK.

thanks

'mark
Re: Problem with tarfile module to open *.tar.gz files - unreliable ?
On Aug 20, 6:57 am, m_ahlenius wrote:
> On Aug 20, 5:34 am, Dave Angel wrote:
> > m_ahlenius wrote:
> > > Hi,
> > >
> > > I am relatively new to doing serious work in python. I am using it
> > > to access a large number of log files. Some of the logs get
> > > corrupted and I need to detect that when processing them. This code
> > > seems to work for quite a few of the logs (all same structure) It
> > > also correctly identifies some corrupt logs but then it identifies
> > > others as being corrupt when they are not.
> > >
> > > example error msg from below code:
> > >
> > > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> > > Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
> > >
> > > When I manually examine the supposed corrupt log file and use
> > > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
> > > just fine.
> > >
> > > Is there anything wrong with how I am using this module? (extra
> > > code removed for clarity)
> > >
> > > if tarfile.is_tarfile( file ):
> > >     try:
> > >         xf = tarfile.open( file, "r:gz" )
> > >         for locFile in xf:
> > >             logfile = xf.extractfile( locFile )
> > >             validFileFlag = True
> > >             # iterate through each log file, grab the first and the last lines
> > >             lines = iter( logfile )
> > >             firstLine = lines.next()
> > >             for nextLine in lines:
> > >                 continue
> > >             logfile.close()
> > >             ...
> > >         xf.close()
> > >     except Exception, e:
> > >         validFileFlag = False
> > >         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> > > else:
> > >     validFileFlag = False
> > >     lTime = extractFileNameTime( file )
> > >     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> > > print msg
> >
> > I haven't used tarfile, but this feels like a problem with the
> > Win/Unix line endings. I'm going to assume you're running on Windows,
> > which could trigger the problem I'm going to describe.
> >
> > You use 'file' to hold something, but don't show us what. In fact,
> > it's a lousy name, since it's already a Python builtin. But if it's
> > holding fileobj, that you've separately opened, then you need to
> > change that open to use mode 'rb'
> >
> > The problem, if I've guessed right, is that occasionally you'll
> > accidentally encounter a 0d0a sequence in the middle of the (binary)
> > compressed data. If you're on Windows, and use the default 'r' mode,
> > it'll be changed into a 0a byte. Thus corrupting the checksum, and
> > eventually the contents.
> >
> > DaveA
>
> Hi,
>
> thanks for the comments - I'll change the variable name.
>
> I am running this on linux so don't think its a Windows issue. So if
> that's the case is the 0d0a still an issue?
>
> 'mark

Oh, and what's stored currently in the file var is just the unopened
pathname to the target file I want to open.
Re: Problem with tarfile module to open *.tar.gz files - unreliable ?
On Aug 20, 5:34 am, Dave Angel wrote:
> m_ahlenius wrote:
> > Hi,
> >
> > I am relatively new to doing serious work in python. I am using it to
> > access a large number of log files. Some of the logs get corrupted
> > and I need to detect that when processing them. This code seems to
> > work for quite a few of the logs (all same structure) It also
> > correctly identifies some corrupt logs but then it identifies others
> > as being corrupt when they are not.
> >
> > example error msg from below code:
> >
> > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> > Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
> >
> > When I manually examine the supposed corrupt log file and use
> > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
> > just fine.
> >
> > Is there anything wrong with how I am using this module? (extra code
> > removed for clarity)
> >
> > if tarfile.is_tarfile( file ):
> >     try:
> >         xf = tarfile.open( file, "r:gz" )
> >         for locFile in xf:
> >             logfile = xf.extractfile( locFile )
> >             validFileFlag = True
> >             # iterate through each log file, grab the first and the last lines
> >             lines = iter( logfile )
> >             firstLine = lines.next()
> >             for nextLine in lines:
> >                 continue
> >             logfile.close()
> >             ...
> >         xf.close()
> >     except Exception, e:
> >         validFileFlag = False
> >         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> > else:
> >     validFileFlag = False
> >     lTime = extractFileNameTime( file )
> >     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> > print msg
>
> I haven't used tarfile, but this feels like a problem with the Win/Unix
> line endings. I'm going to assume you're running on Windows, which
> could trigger the problem I'm going to describe.
>
> You use 'file' to hold something, but don't show us what. In fact,
> it's a lousy name, since it's already a Python builtin. But if it's
> holding fileobj, that you've separately opened, then you need to
> change that open to use mode 'rb'
>
> The problem, if I've guessed right, is that occasionally you'll
> accidentally encounter a 0d0a sequence in the middle of the (binary)
> compressed data. If you're on Windows, and use the default 'r' mode,
> it'll be changed into a 0a byte. Thus corrupting the checksum, and
> eventually the contents.
>
> DaveA

Hi,

thanks for the comments - I'll change the variable name.

I am running this on linux, so I don't think it's a Windows issue. So
if that's the case, is the 0d0a still an issue?

'mark
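Dave's point as a sketch: this only matters if you hand tarfile a file object rather than a filename; in that case the object must be opened in binary mode. The helper name here is hypothetical.

```python
import tarfile

def open_archive(path):
    """Open a .tar.gz via an explicit file object.

    Binary mode ('rb') is essential: on Windows the default text mode
    would translate any 0d0a byte pair in the compressed stream into
    0a, corrupting the data and its CRC. On Linux 'r' and 'rb' behave
    the same, but 'rb' is correct everywhere."""
    fobj = open(path, "rb")
    return tarfile.open(fileobj=fobj, mode="r:gz")
```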
Problem with tarfile module to open *.tar.gz files - unreliable ?
Hi,

I am relatively new to doing serious work in python. I am using it to
access a large number of log files. Some of the logs get corrupted and
I need to detect that when processing them. This code seems to work
for quite a few of the logs (all same structure). It also correctly
identifies some corrupt logs, but then it identifies others as being
corrupt when they are not.

example error msg from below code:

Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
Exception: CRC check failed 0x8967e931 != 0x4e5f1036L

When I manually examine the supposed corrupt log file and use
"tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
just fine.

Is there anything wrong with how I am using this module? (extra code
removed for clarity)

if tarfile.is_tarfile( file ):
    try:
        xf = tarfile.open( file, "r:gz" )
        for locFile in xf:
            logfile = xf.extractfile( locFile )
            validFileFlag = True
            # iterate through each log file, grab the first and the last lines
            lines = iter( logfile )
            firstLine = lines.next()
            for nextLine in lines:
                continue
            logfile.close()
            ...
        xf.close()
    except Exception, e:
        validFileFlag = False
        msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
else:
    validFileFlag = False
    lTime = extractFileNameTime( file )
    msg = ">>> Warning " + file + " is NOT a valid tar archive \n"
print msg
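For comparison, the loop above can be written as a self-contained function in modern with-statement style. This is a sketch of the same logic, not the poster's full code: extractFileNameTime and the message handling are omitted, and which exceptions count as "corrupt" is an assumption.

```python
import tarfile

def check_archive(path):
    """Return {member_name: (first_line, last_line)} for a .tar.gz,
    or None if the archive is unreadable or corrupt."""
    if not tarfile.is_tarfile(path):
        return None
    results = {}
    try:
        with tarfile.open(path, "r:gz") as xf:
            for member in xf:
                logfile = xf.extractfile(member)
                if logfile is None:        # directories, links, etc.
                    continue
                first = last = None
                # Single pass: remember the first line, keep the last.
                for line in logfile:
                    if first is None:
                        first = line
                    last = line
                results[member.name] = (first, last)
    except (tarfile.TarError, OSError, EOFError):
        # CRC failures and truncated gzip streams surface here.
        return None
    return results
```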
question on storing dates in a tuple
Hi,

I have a weird question about tuples. I am getting some dates from a
mysql db, using the mysqldb interface. I am doing a standard query
where several of the fields are "datetime" format in mysql. When I
retrieve and print them in python, they look fine, e.g.

storeddate = 2010-02-07 12:03:41

But when I append these dates into a string to write to an ascii log
file, they come out as:

datetime.datetime(2010, 2, 7, 12, 03, 41) (or something along those
lines).

It appears that I am dealing with some sort of date object here. So I
am just trying to understand what's going on, and what's the best way
to work with such objects.

thanks

'mark
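What's going on, in short: the MySQLdb driver hands back datetime.datetime objects for DATETIME columns. str() gives the readable form, while repr() -- which is what leaks into a log when you write out a whole tuple or list containing the object -- gives the datetime.datetime(...) form. A quick sketch:

```python
from datetime import datetime

# What the DB driver hands back for a DATETIME column (sample value).
stored = datetime(2010, 2, 7, 12, 3, 41)

# str() is the readable form; repr() is what printing a tuple shows.
assert str(stored) == "2010-02-07 12:03:41"

# For full control over the log format, use strftime:
line = stored.strftime("%Y-%m-%d %H:%M:%S")
```

So when building the log line, format each date explicitly (str() or strftime()) instead of stringifying the whole row tuple.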
Re: question on using tarfile to read a *.tar.gzip file
On Feb 7, 5:01 pm, Tim Chase wrote:
> > Is there a way to do this, without decompressing each file to a temp
> > dir? Like is there a method using some tarfile interface adapter to
> > read a compressed file? Otherwise I'll just access each file, extract
> > it, grab the 1st and last lines and then delete the temp file.
>
> I think you're looking for the extractfile() method of the
> TarFile object:
>
> from glob import glob
> from tarfile import TarFile
>
> for fname in glob('*.tgz'):
>     print fname
>     tf = TarFile.gzopen(fname)
>     for ti in tf:
>         print ' %s' % ti.name
>         f = tf.extractfile(ti)
>         if not f: continue
>         fi = iter(f)  # f doesn't natively support next()
>         first_line = fi.next()
>         for line in fi: pass
>         f.close()
>         print " First line: %r" % first_line
>         print " Last line: %r" % line
>     tf.close()
>
> If you just want the first & last lines, it's a little more
> complex if you don't want to scan the entire file (like I do with
> the for-loop), but the file-like object returned by extractfile()
> is documented as supporting seek() so you can skip to the end and
> then read backwards until you have sufficient lines. I wrote a
> "get the last line of a large file using seeks from the EOF"
> function which you can find at [1] which should handle the odd
> edge cases of $BUFFER_SIZE containing more or less than a full
> line and then reading backwards in chunks (if needed) until you
> have one full line, handling a one-line file, and other
> odd/annoying edge-cases. Hope it helps.
>
> -tkc
>
> [1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html

Thanks Tim - this was very helpful. Just learning about tarfile.

'mark
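A sketch of the seek-from-EOF approach Tim describes, simplified rather than his posted function: grow a window backwards from the end until it contains a complete final line.

```python
def last_line(f, bufsize=4096):
    """Return the final line (newline stripped) of a seekable binary
    file-like object, without scanning the whole file.

    Simplified sketch: reads an ever-larger window back from EOF until
    it holds more than one line (so the last line is complete) or the
    whole file. Tim's posted function handles more edge cases."""
    f.seek(0, 2)                       # jump to end of file
    size = f.tell()
    window = 0
    while True:
        window = min(size, window + bufsize)
        f.seek(size - window)
        chunk = f.read(window)
        lines = chunk.splitlines(True)  # keep line endings
        if len(lines) > 1 or window == size:
            return lines[-1].rstrip(b"\r\n") if lines else b""
```

The object returned by extractfile() supports seek(), so this works on tar members too, though seeking backwards inside a gzip stream still costs decompression under the hood.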
question on using tarfile to read a *.tar.gzip file
Hi,

I have a relatively large number of *.tar.gzip files to process. With
the py module tarfile, I see that I can access and extract them, one
at a time, to a temporary dir, but that of course takes time. All that
I need to do is to read the first and last lines of each file and then
move on to the next one. I am not changing anything in these files -
just reading. The file lines are not fixed lengths either, which makes
it a bit more fun.

Is there a way to do this, without decompressing each file to a temp
dir? Like is there a method using some tarfile interface adapter to
read a compressed file? Otherwise I'll just access each file, extract
it, grab the 1st and last lines and then delete the temp file.

thx

'mark