Re: [Tutor] A further question about opening and closing files
On 09/09/15 20:42, Laura Creighton wrote:
> In a message of Wed, 09 Sep 2015 20:25:06 +0100, Alan Gauld writes:
>> On 09/09/15 19:20, Laura Creighton wrote:
>> If you are working on a small platform - think mobile device - and it
>> has a single channel bus to the storage area then one of the worst
>> things you can do is write lots of small chunks of data to it. The
>> overhead (in hardware) of opening and locking the bus is almost as
>> much as the data transit time and so can choke the bus for a
>> significant amount of time (I'm talking milliseconds here but in
>> real-time that's significant).
>
> But if I shoot you with my laser cannon, I want you to get the message
> that you are dead _now_ and not when some buffer fills up ...

There are two things about that:

1) Human reaction time is measured in hundreds of milliseconds, so the
delay is not likely to be meaningful. If you do the flushes every 10 ms
instead of on every write (assuming you are writing frequently), nobody
is likely to notice.

2) Gamers tend not to be doing other things while playing, so you can
pretty much monopolize the bus if you want to. So if you know that
you're the only game in town (sic) then go ahead and flush everything
to disk. It won't do much harm. But...

... if your game engine is running on a server shared by other users
and some of them are running critical apps (think a business's billing
or accounting suite that must complete its run within a one-hour
window, say) then you become very unpopular very quickly. In practice
that means the sysadmin will see who is flattening the bus and nice
that process down till it stops hurting the others. That means your
game now runs at 10% of the CPU power it had a while ago...

As programmers we very rarely have the control over our environment
that we like to think we do.
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
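Alan's suggestion of flushing every 10 ms instead of on every write can be sketched as a small wrapper class. This is a hypothetical helper, not code from the thread; the name and API are made up for illustration:

```python
import time

class ThrottledWriter:
    """Wrap a file-like object and flush it at most once per `interval`
    seconds (a sketch of the 'flush every 10 ms' idea above)."""

    def __init__(self, f, interval=0.01):
        self.f = f
        self.interval = interval
        self.last_flush = time.monotonic()

    def write(self, data):
        self.f.write(data)               # every write reaches the file object
        now = time.monotonic()
        if now - self.last_flush >= self.interval:
            self.f.flush()               # but the expensive flush is rate-limited
            self.last_flush = now
```

Every write still lands in the (buffered) file object immediately; only the flush-to-OS step, the part that contends for the bus, is throttled.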
Re: [Tutor] A further question about opening and closing files
In a message of Wed, 09 Sep 2015 20:25:06 +0100, Alan Gauld writes:
> On 09/09/15 19:20, Laura Creighton wrote:
> If you are working on a small platform - think mobile device - and it
> has a single channel bus to the storage area then one of the worst
> things you can do is write lots of small chunks of data to it. The
> overhead (in hardware) of opening and locking the bus is almost as
> much as the data transit time and so can choke the bus for a
> significant amount of time (I'm talking milliseconds here but in
> real-time that's significant).

But if I shoot you with my laser cannon, I want you to get the message
that you are dead _now_ and not when some buffer fills up ...

Laura
Re: [Tutor] A further question about opening and closing files
On 09/09/15 19:20, Laura Creighton wrote:
> In a message of Wed, 09 Sep 2015 17:42:05 +0100, Alan Gauld writes:
>> You can force the writes (I see Laura has shown how) but mostly you
>> should just let the OS do its thing. Otherwise you risk cluttering up
>> the IO bus and preventing other programs from writing their files.
>
> Is this something we have to worry about these days? I haven't worried
> about it for a long time, and write real time multiplayer games which
> demand unbuffered writes. Of course, things would be different if I
> were sending gigabytes of video down the pipe, but for the sort of
> small writes I am doing, I don't think there is any performance
> problem at all.
>
> Anybody got some benchmarks so we can find out?

If you are working on a small platform - think mobile device - and it
has a single channel bus to the storage area then one of the worst
things you can do is write lots of small chunks of data to it. The
overhead (in hardware) of opening and locking the bus is almost as much
as the data transit time and so can choke the bus for a significant
amount of time (I'm talking milliseconds here but in real-time that's
significant).

But even on a major OS platform bus contention does occasionally rear
its head. I've seen multi-processor web servers "lock up" due to too
many threads dumping data at once.

Managing the data bus is (part of) what the OS is there to do; it's
best to let it do its job, and second-guessing it is rarely the right
thing. Remember, the impact is never on your own program, it's on all
the other processes running on the same platform. There are usually
tools to monitor the IO bus performance though, so it's fairly easy to
diagnose/check.
-- 
Alan G
Re: [Tutor] iterating through a directory
On 09/09/2015 14:32, richard kappler wrote:
> Yes, many questions today. I'm working on a data feed script that
> feeds 'events' into our test environment. In production, we monitor a
> camera that captures an image as product passes by, gathers
> information such as barcodes and package ID from the image, and then
> sends out the data as a line of xml to one place for further
> processing and sends the image to another place for storage. Here is
> where our software takes over, receiving the xml data and images from
> vendor equipment. Our software then processes the xml data and allows
> retrieval of specific images associated with each line of xml data.
> Our test environment must simulate the feed from the vendor equipment,
> so I'm writing a script that feeds the xml from an actual log to one
> place for processing, and pulls images from a pool of 20, which we
> recycle, to associate with each event and sends them to another dir
> for saving and searching.
>
> As my script iterates through each line of xml data (discussed
> yesterday) to simulate the feed from the vendor camera equipment, it
> parses ID information about the event then sends the line of data on.
> As it does so, it needs to pull the next image in line from the image
> pool directory, rename it, send it to a different directory for
> saving. I'm pretty solid on all of this except iterating through the
> image pool. My idea is to just keep looping through the image pool:
> as each line of xml data is parsed, the next image in line gets pulled
> out, renamed with the identifying information from the xml data, and
> both are sent on to different places.
>
> I only have one idea for doing this iterating through the image pool,
> and frankly I think the idea is pretty weak.
> The only thing I can come up with is to change the name of each image
> in the pool from something like 0219PS01CT1_2029_04_00044979.jpg to
> 1.jpg, 2.jpg, 3.jpg etc., then doing something like:
>
>     i = 0
>
>     here begins my for line in file loop:
>         if i == 20:
>             i = 1
>         else:
>             i += 1
>         do stuff with the xml file including get ID info
>         rename i.jpg to IDinfo.jpg
>         send it all on
>
> That seems pretty barbaric, any thoughts on where to look for better
> ideas? I'm presuming there are modules out there that I am unaware of
> or capabilities I am unaware of in modules I do know a little about
> that I am missing.

I'm not sure what you're trying to do, so maybe
https://docs.python.org/3/library/collections.html#collections.deque
or
https://docs.python.org/3/library/itertools.html#itertools.cycle

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
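The itertools.cycle suggestion replaces the manual 1..20 counter entirely. A minimal sketch of the idea (the imgNN.jpg names are stand-ins for the real pool files):

```python
import itertools

# Stand-in for os.listdir("image_pool"): a pool of 20 recycled images.
image_pool = ["img%02d.jpg" % i for i in range(1, 21)]
images = itertools.cycle(image_pool)   # endless round-robin, no counter needed

# Pull one image per XML event; after img20.jpg the iterator wraps around.
first_25 = [next(images) for _ in range(25)]
print(first_25[0], first_25[19], first_25[20])  # img01.jpg img20.jpg img01.jpg
```

Inside the real `for line in file` loop you would simply call `next(images)` once per event.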
Re: [Tutor] More Pythonic?
Timo wrote:
> On 09-09-15 at 15:41, Steven D'Aprano wrote:
>> On Wed, Sep 09, 2015 at 09:05:23AM -0400, richard kappler wrote:
>>> Would either or both of these work, if both, which is the better or
>>> more Pythonic way to do it, and why?
>>
>> The first works, but isn't really the best way to do it:
>>
>>> ###
>>>
>>> import whatIsNeeded
>>>
>>> writefile = open("writefile", 'a')
>>>
>>> with open(readfile, 'r') as f:
>>>     for line in f:
>>>         if keyword in line:
>>>             do stuff
>>>             f1.write(line)
>>>         else:
>>>             f1.write(line)
>>>
>>> writefile.close()
>>>
>>> ##
>>
>> Better would be this:
>>
>> with open("writefile", 'a') as outfile:
>>     with open("readfile", 'r') as infile:
>>         for line in infile:
>>             if keyword in line:
>>                 do stuff
>>             outfile.write(line)
>
> It's also possible to use multiple with statements on the same line.
> Can someone with more expert Python knowledge than me comment on
> whether it's different from using them separate as mentioned by
> Steven?
>
> This is what I had in mind:
>
> with open("writefile", 'a') as outfile, open("readfile", 'r') as infile:
>     pass # Rest of the code here

This requires Python 2.7 or higher. Other than that the choice is
merely a matter of taste. Both versions even produce the same bytecode:

$ cat nested_with.py
def f():
    with open("writefile", 'a') as outfile, open("readfile", 'r') as infile:
        pass # Rest of the code here

def g():
    with open("writefile", 'a') as outfile:
        with open("readfile", 'r') as infile:
            pass # Rest of the code here

print(f.__code__.co_code == g.__code__.co_code)
$ python nested_with.py
True

Personally I find one item per with statement more readable and don't
care about the extra indentation level.
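A third option the thread doesn't mention: for more than two or three files, contextlib.ExitStack (Python 3.3+) avoids both the deep nesting and the long single line. A small sketch with made-up file names (tempfile keeps it self-contained):

```python
from contextlib import ExitStack
import os
import tempfile

# Hypothetical batch of files to hold open together.
tmp = tempfile.mkdtemp()
names = [os.path.join(tmp, n) for n in ("a.txt", "b.txt", "c.txt")]

with ExitStack() as stack:
    files = [stack.enter_context(open(n, "w")) for n in names]
    for f in files:
        f.write("hello\n")
# All three files are closed here, even if one of the open() calls failed.
```

The stack unwinds in reverse order on exit, exactly as nested with blocks would.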
Re: [Tutor] A further question about opening and closing files
On Wed, Sep 09, 2015 at 08:20:44PM +0200, Laura Creighton wrote:
> In a message of Wed, 09 Sep 2015 17:42:05 +0100, Alan Gauld writes:
>> You can force the writes (I see Laura has shown how) but mostly you
>> should just let the OS do its thing. Otherwise you risk cluttering up
>> the IO bus and preventing other programs from writing their files.
>
> Is this something we have to worry about these days? I haven't worried
> about it for a long time, and write real time multiplayer games which
> demand unbuffered writes. Of course, things would be different if I
> were sending gigabytes of video down the pipe, but for the sort of
> small writes I am doing, I don't think there is any performance
> problem at all.
>
> Anybody got some benchmarks so we can find out?

Good question! There's definitely a performance hit, but it's not as
big as I expected:

py> with Stopwatch():
...     with open("/tmp/junk", "w") as f:
...         for i in range(10):
...             f.write("a")
...
time taken: 0.129952 seconds

py> with Stopwatch():
...     with open("/tmp/junk", "w") as f:
...         for i in range(10):
...             f.write("a")
...             f.flush()
...
time taken: 0.579273 seconds

What really gets expensive is doing a sync.

py> with Stopwatch():
...     with open("/tmp/junk", "w") as f:
...         fid = f.fileno()
...         for i in range(10):
...             f.write("a")
...             f.flush()
...             os.fsync(fid)
...
time taken: 123.283973 seconds

Yes, that's right. From half a second to two minutes.

-- 
Steve
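The Stopwatch context manager used in the transcript above is not shown anywhere in the thread. A minimal version that behaves like it might look as follows (an assumption, not Steven's actual code):

```python
import time

class Stopwatch:
    """Minimal guess at the Stopwatch context manager used above:
    measure and print the elapsed wall-clock time of a with block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        elapsed = time.perf_counter() - self.start
        print("time taken: %f seconds" % elapsed)

with Stopwatch():
    sum(range(100000))   # any workload to be timed
```

time.perf_counter() (Python 3.3+) is used here because it is the recommended clock for short benchmarks; the original may have used time.time().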
Re: [Tutor] A further question about opening and closing files
On Wed, Sep 09, 2015 at 10:24:57AM -0400, richard kappler wrote:
> Under a different subject line (More Pythonic?) Steven D'Aprano
> commented:
>
>> And this will repeatedly open the file, append one line, then close
>> it again. Almost certainly not what you want -- it's wasteful and
>> potentially expensive.
>
> And I get that. It does bring up another question though. When using
>
> with open(somefile, 'r') as f:
>     with open(filename, 'a') as f1:
>         for line in f:
>
> the file being appended is opened and stays open while the loop
> iterates, then the file closes when exiting the loop, yes?

The file closes when exiting the *with block*, not necessarily the
loop. Consider:

with open(blah blah blah) as f:
    for line in f:
        pass
    time.sleep(120)
    # file isn't closed until we get here

Even if the file is empty, and there are no lines, it will be held open
for two minutes.

> Does this not have the potential to be expensive as well if you are
> writing a lot of data to the file?

Er, expensive in what way? Yes, I suppose it is more expensive to write
1 gigabyte of data to a file than to write 1 byte. What's your point?
If you want to write 1 GB, then you have to write 1 GB, and it will
take as long as it takes.

Look at it this way: suppose you have to hammer 1000 nails into a
fence. You can grab your hammer out of your tool box, hammer one nail,
put the hammer back in the tool box and close the lid, open the lid,
take the hammer out again, hammer one nail, put the hammer back in the
tool box, close the lid, open the lid again, take out the hammer... Or
you can take the hammer out, hammer 1000 nails, then put the hammer
away.

Sure, while you are hammering those 1000 nails, you're not mowing the
lawn, painting the porch, walking the dog or any of the dozen other
jobs you have to do, but you have to hammer those nails eventually.
> I did a little experiment:
>
> >>> f1 = open("output/test.log", 'a')
> >>> f1.write("this is a test")
> >>> f1.write("this is a test")
> >>> f1.write('why isn\'t this writing')
> >>> f1.close()
>
> monitoring test.log as I went. Nothing was written to the file until I
> closed it, or at least that's the way it appeared to the text editor
> in which I had test.log open (gedit). In gedit, when a file changes it
> tells you and gives you the option to reload the file. This didn't
> happen until I closed the file. So I'm presuming all the writes sat in
> a buffer in memory until the file was closed, at which time they were
> written to the file.

Correct. All modern operating systems do that. Writing to disk is slow,
*hundreds of thousands of times slower* than writing to memory, so the
operating system will queue up a reasonable amount of data before
actually forcing it to the disk drive.

> Is that actually how it happens, and if so does that not also have the
> potential to cause problems if memory is a concern?

No. The operating system is not stupid enough to queue up gigabytes of
data. Typically the buffer is something like 128 KB of data (I think),
or maybe a MB or so. Writing a couple of short lines of text won't fill
it, which is why you don't see any change until you actually close the
file. Try writing a million lines, and you'll see something different.

The OS will flush the buffer when it is full, or when you close the
file, whichever happens first. If you know that you're going to take a
long time to fill the buffer, say you're performing a really slow
calculation and your data is trickling in really slowly, then you might
do a file.flush() every few seconds or so. Or if you're writing an ACID
database. But for normal use, don't try to out-smart the OS, because
you will fail. This is really specialised know-how.

Have you noticed how slow gedit is to save files?
That's because the gedit programmers thought they were smarter than the
OS, so every time they write a file, they call flush() and sync().
Possibly multiple times. All that happens is that they slow the writing
down greatly. Other text editors let the OS manage this process, and
saving is effectively instantaneous. With gedit, there's a visible
pause when it saves. (At least in all the versions of gedit I've used.)

And the data is not any safer than in the other text editors, because
when the OS has written to the hard drive, there is no guarantee that
the data has hit the platter yet. Hard drives themselves contain
buffers, and they won't actually write data to the platter until they
are good and ready.

-- 
Steve
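The buffering behaviour described above is easy to demonstrate: a short write sits in Python's user-space buffer, invisible to other programs (like gedit in Richard's experiment) until flush() hands it to the OS. A small self-contained sketch:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.log")
f = open(path, "w")                    # default: block-buffered
f.write("short line\n")                # lands in the buffer, not the file
size_before = os.path.getsize(path)    # still 0 bytes on disk
f.flush()                              # hand the buffer to the OS
size_after = os.path.getsize(path)     # now 11 bytes
f.close()
print(size_before, size_after)         # 0 11
```

Note that even after flush() the data may only be in the OS page cache; forcing it to the platter is what os.fsync() is for, and as Steven's benchmark shows, that is dramatically more expensive.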
Re: [Tutor] More Pythonic?
On 09-09-15 at 15:41, Steven D'Aprano wrote:
> On Wed, Sep 09, 2015 at 09:05:23AM -0400, richard kappler wrote:
>> Would either or both of these work, if both, which is the better or
>> more Pythonic way to do it, and why?
>
> The first works, but isn't really the best way to do it:
>
>> ###
>>
>> import whatIsNeeded
>>
>> writefile = open("writefile", 'a')
>>
>> with open(readfile, 'r') as f:
>>     for line in f:
>>         if keyword in line:
>>             do stuff
>>             f1.write(line)
>>         else:
>>             f1.write(line)
>>
>> writefile.close()
>>
>> ##
>
> Better would be this:
>
> with open("writefile", 'a') as outfile:
>     with open("readfile", 'r') as infile:
>         for line in infile:
>             if keyword in line:
>                 do stuff
>             outfile.write(line)

It's also possible to use multiple with statements on the same line.
Can someone with more expert Python knowledge than me comment on
whether it's different from using them separate as mentioned by Steven?

This is what I had in mind:

with open("writefile", 'a') as outfile, open("readfile", 'r') as infile:
    pass # Rest of the code here

Timo
Re: [Tutor] A further question about opening and closing files
In a message of Wed, 09 Sep 2015 17:42:05 +0100, Alan Gauld writes:
> You can force the writes (I see Laura has shown how) but mostly you
> should just let the OS do its thing. Otherwise you risk cluttering up
> the IO bus and preventing other programs from writing their files.

Is this something we have to worry about these days? I haven't worried
about it for a long time, and write real time multiplayer games which
demand unbuffered writes. Of course, things would be different if I
were sending gigabytes of video down the pipe, but for the sort of
small writes I am doing, I don't think there is any performance problem
at all.

Anybody got some benchmarks so we can find out?

Laura
Re: [Tutor] Creating lists with definite (n) items without repetitions
On 09/09/2015 18:59, Oscar Benjamin wrote:
> On 9 September 2015 at 12:05, Francesco Loffredo via Tutor wrote:
>> A quick solution is to add one "dummy" letter to the pool of the OP's
>> golfers. I used "!" as the dummy one. This way, you end up with 101
>> triples, 11 of which contain the dummy player. But this is better
>> than the 25-item pool, that resulted in an incomplete set of triples
>> (for example, A would never play with Z). So, in your real-world
>> problem, you will have 11 groups of 2 people instead of 3. Is this a
>> problem?
>>
>> import pprint, itertools
>> pool = "abcdefghijklmnopqrstuvwxyz!"
>>
>> def maketriples(tuplelist):
>>     final = []
>>     used = set()
>>     for a, b, c in tuplelist:
>>         if ( ((a,b) in used) or ((b,c) in used) or ((a,c) in used) or
>>              ((b,a) in used) or ((c,b) in used) or ((c,a) in used) ):
>>             continue
>>         else:
>>             final.append((a, b, c))
>>             used.add((a,b))
>>             used.add((a,c))
>>             used.add((b,c))
>>             used.add((b,a))
>>             used.add((c,a))
>>             used.add((c,b))
>>     return final
>>
>> combos = list(itertools.combinations(pool, 3))
>> print("combos contains %s triples." % len(combos))
>>
>> triples = maketriples(combos)
>>
>> print("maketriples(combos) contains %s triples." % len(triples))
>> pprint.pprint(triples)
>
> I don't think the code above works. For n=27 it should count 117
> (according to the formula I showed) but instead it comes up with 101.
>
> I tried it with a smaller n by setting pool to range(1, 9+1) meaning
> that n=9. The output is:
>
> combos contains 84 triples.
> maketriples(combos) contains 8 triples.
> [(1, 2, 3), (1, 4, 5), (1, 6, 7), (1, 8, 9), (2, 4, 6), (2, 5, 7),
>  (3, 4, 7), (3, 5, 6)]
>
> However I can construct a set of 12 triples containing each number
> exactly 4 times which is the exact Steiner triple system:
>
> 1 6 8
> 1 2 3
> 1 5 9
> 1 4 7
> 2 6 7
> 2 4 9
> 2 5 8
> 3 5 7
> 3 6 9
> 3 8 4
> 4 5 6
> 7 8 9
>
> This is the number of triples predicted by the formula: 9*(9-1)/6 = 12

That's very interesting! This takes me to my question to the Tutors:
what's wrong with the above code?
Francesco
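One possible answer, offered as a sketch rather than a verdict from the thread: the code is not buggy line by line, it is greedy. It accepts triples in lexicographic order, and an early acceptance can lock out pairings that a globally optimal schedule needs, so it stalls below the Steiner bound. A condensed rewrite of the same logic reproduces Oscar's n=9 result:

```python
from itertools import combinations

def greedy_triples(pool):
    """Condensed maketriples(): scan all combinations in lexicographic
    order and keep a triple only if none of its three pairs appeared
    in an earlier kept triple (frozensets replace the 6 ordered pairs)."""
    used = set()
    final = []
    for a, b, c in combinations(pool, 3):
        pairs = {frozenset(p) for p in ((a, b), (a, c), (b, c))}
        if pairs & used:
            continue
        final.append((a, b, c))
        used |= pairs
    return final

triples = greedy_triples(range(1, 10))
print(len(triples))  # 8, matching Oscar's run; the optimal S(2,3,9) has 12
```

Finding the optimal arrangement is the Steiner triple system / social golfer problem, which generally needs backtracking or a known construction rather than a single greedy pass.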
Re: [Tutor] Creating lists with definite (n) items without repetitions
On 9 September 2015 at 12:05, Francesco Loffredo via Tutor wrote:
> Oscar Benjamin wrote:
>> The problem is that there are 26 people and they are divided into
>> groups of 3 each day. We would like to know if it is possible to
>> arrange it so that each player plays each other player exactly once
>> over some period of days.
>>
>> It is not exactly possible to do this with 26 people in groups of 3.
>> Think about it from the perspective of 1 person. They must play
>> against all 25 other people in pairs with neither of the other people
>> repeated: the set of pairs they play against must partition the set
>> of other players. Clearly it can only work if the number of other
>> players is even.
>>
>> I'm not sure but I think that maybe for an exact solution you need to
>> have n=1(mod6) or n=3(mod6) which gives:
>> n = 1, 3, 7, 9, 13, 15, 19, 21, 25, 27, ...
>>
>> The formula for the number of triples if the exact solution exists is
>> n*(n-1)/6 which comes out as 26*25/6 = 108.3 (the formula doesn't
>> give an integer because the exact solution doesn't exist).
>
> A quick solution is to add one "dummy" letter to the pool of the OP's
> golfers. I used "!" as the dummy one. This way, you end up with 101
> triples, 11 of which contain the dummy player. But this is better than
> the 25-item pool, that resulted in an incomplete set of triples (for
> example, A would never play with Z). So, in your real-world problem,
> you will have 11 groups of 2 people instead of 3. Is this a problem?
>
> import pprint, itertools
> pool = "abcdefghijklmnopqrstuvwxyz!"
> def maketriples(tuplelist):
>     final = []
>     used = set()
>     for a, b, c in tuplelist:
>         if ( ((a,b) in used) or ((b,c) in used) or ((a,c) in used) or
>              ((b,a) in used) or ((c,b) in used) or ((c,a) in used) ):
>             continue
>         else:
>             final.append((a, b, c))
>             used.add((a,b))
>             used.add((a,c))
>             used.add((b,c))
>             used.add((b,a))
>             used.add((c,a))
>             used.add((c,b))
>     return final
>
> combos = list(itertools.combinations(pool, 3))
> print("combos contains %s triples." % len(combos))
>
> triples = maketriples(combos)
>
> print("maketriples(combos) contains %s triples." % len(triples))
> pprint.pprint(triples)

I don't think the code above works. For n=27 it should count 117
(according to the formula I showed) but instead it comes up with 101.

I tried it with a smaller n by setting pool to range(1, 9+1) meaning
that n=9. The output is:

combos contains 84 triples.
maketriples(combos) contains 8 triples.
[(1, 2, 3), (1, 4, 5), (1, 6, 7), (1, 8, 9), (2, 4, 6), (2, 5, 7),
 (3, 4, 7), (3, 5, 6)]

However I can construct a set of 12 triples containing each number
exactly 4 times which is the exact Steiner triple system:

1 6 8
1 2 3
1 5 9
1 4 7
2 6 7
2 4 9
2 5 8
3 5 7
3 6 9
3 8 4
4 5 6
7 8 9

This is the number of triples predicted by the formula: 9*(9-1)/6 = 12

-- 
Oscar
Re: [Tutor] A further question about opening and closing files
On 09/09/15 15:24, richard kappler wrote:
> f1 = open("output/test.log", 'a')
> f1.write("this is a test")
> f1.write("this is a test")
> f1.write('why isn\'t this writing')
> f1.close()
>
> monitoring test.log as I went. Nothing was written to the file until I
> closed it, or at least that's the way it appeared to the text editor

For a short example like this it's true; for a bigger example the
buffer will be flushed periodically, as it fills up. This is not a
Python thing, it's an OS feature; the same is true for any program.
It's a much more efficient use of the IO bus. (It's also why you should
always explicitly close a file opened for writing - unless you use
with, which does it for you.)

You can force the writes (I see Laura has shown how) but mostly you
should just let the OS do its thing. Otherwise you risk cluttering up
the IO bus and preventing other programs from writing their files.

HTH

-- 
Alan G
Re: [Tutor] iterating through a directory
On 09/09/15 15:29, richard kappler wrote:
> Still not sure how to efficiently get the script to keep moving to the
> next file in the directory though, in other words, for each iteration
> in the loop, I want it to fetch, rename and send/save the next image
> in line. Hope that brings better understanding.

Sounds like you want a circular list. The traditional way to generate a
circular index into a list is to use the modulo (%) operator. But the
itertools module gives you a better option with the cycle function:

import itertools as it

for img in it.cycle(os.listdir(my_img_path)):
    process(img)

HTH

-- 
Alan G
Re: [Tutor] A further question about opening and closing files
Thanks, tried them both, both work great on Linux. Now I understand
better.

regards, Richard

On Wed, Sep 9, 2015 at 11:28 AM, Laura Creighton wrote:
>> I did a little experiment:
>>
>> f1 = open("output/test.log", 'a')
>> f1.write("this is a test")
>> f1.write("this is a test")
>> f1.write('why isn\'t this writing')
>> f1.close()
>
> If you want the thing written out, use f1.flush() whenever you want to
> make sure this happens.
>
> If you want completely unbuffered writing, then you can open your file
> this way, with f1 = open("output/test.log", 'a', 0). I think if you
> are on windows you can only get unbuffered writing if you open your
> file in binary mode.
>
> Laura

-- 
All internal models of the world are approximate. ~ Sebastian Thrun
Re: [Tutor] A further question about opening and closing files
> I did a little experiment:
>
> f1 = open("output/test.log", 'a')
> f1.write("this is a test")
> f1.write("this is a test")
> f1.write('why isn\'t this writing')
> f1.close()

If you want the thing written out, use f1.flush() whenever you want to
make sure this happens.

If you want completely unbuffered writing, then you can open your file
this way, with f1 = open("output/test.log", 'a', 0). I think if you are
on windows you can only get unbuffered writing if you open your file in
binary mode.

Laura
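One caveat worth adding: open(..., 'a', 0) works in Python 2, but in Python 3 buffering=0 is only accepted for binary mode (text mode raises ValueError); the closest text-mode option is line buffering. A small sketch (the file path is made up, via tempfile):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.log")

# Python 3: unbuffered writes require binary mode.
# open(path, 'a', 0) would raise ValueError("can't have unbuffered text I/O").
with open(path, "ab", buffering=0) as f:
    f.write(b"written straight to the OS\n")

# Text mode can at best be line-buffered: flushed at each newline.
with open(path, "a", buffering=1) as f:
    f.write("flushed when the line ends\n")
```

Even "unbuffered" here only means no user-space buffer; the OS and the drive may still cache the data, as discussed elsewhere in the thread.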
Re: [Tutor] Fwd: find second occurance of string in line
Peter Otten wrote:
> Those who regularly need different configurations probably use
> virtualenv, or virtual machines when the differences are not limited
> to Python.

Use tox for this. https://testrun.org/tox/latest/

However, for development purposes it often helps to have a --force
the_one_that_I_want option (for command lines), or a global variable,
or a config file for modules. How badly you want this depends on your
own personal development style, and how happy you are popping in and
out of virtualenvs.

Many people prefer to write their whole new thing for one library (say
elementtree) and then test it/port it against the other 2, one at a
time, making a complete set of patches for one adaptation at a time.
Other people prefer to write their code so that, feature by feature,
they first get it to work with one library, and then with another, and
then with the third, and then they write the next new bit of code, so
that they never have to do a real port.

Life is messy enough that you often do a bit of this and a bit of the
other thing, even if you would prefer not to, especially if hungry
customers are demanding exactly what they need (and we don't care about
the other ways it will eventually work for other people).

Laura
Re: [Tutor] iterating through a directory
Albert-Jan, thanks for the response. shutil.copyfile does seem to be
one of the tools I need to make the copying, renaming the copy and
saving it elsewhere in one line instead of three or more. Still not
sure how to efficiently get the script to keep moving to the next file
in the directory though; in other words, for each iteration in the
loop, I want it to fetch, rename and send/save the next image in line.
Hope that brings better understanding. Thanks for the tip!

regards, Richard

On Wed, Sep 9, 2015 at 9:46 AM, Albert-Jan Roskam wrote:
>> Date: Wed, 9 Sep 2015 09:32:34 -0400
>> From: richkapp...@gmail.com
>> To: tutor@python.org
>> Subject: [Tutor] iterating through a directory
>>
>> Yes, many questions today. I'm working on a data feed script that
>> feeds 'events' into our test environment. In production, we monitor
>> a camera that captures an image as product passes by, gathers
>> information such as barcodes and package ID from the image, and then
>> sends out the data as a line of xml to one place for further
>> processing and sends the image to another place for storage. Here is
>> where our software takes over, receiving the xml data and images
>> from vendor equipment. Our software then processes the xml data and
>> allows retrieval of specific images associated with each line of xml
>> data. Our test environment must simulate the feed from the vendor
>> equipment, so I'm writing a script that feeds the xml from an actual
>> log to one place for processing, and pulls images from a pool of 20,
>> which we recycle, to associate with each event and sends them to
>> another dir for saving and searching.
>>
>> As my script iterates through each line of xml data (discussed
>> yesterday) to simulate the feed from the vendor camera equipment, it
>> parses ID information about the event then sends the line of data
>> on.
>> As it does so, it needs to pull the next image in line from the
>> image pool directory, rename it, send it to a different directory
>> for saving. I'm pretty solid on all of this except iterating through
>> the image pool. My idea is to just keep looping through the image
>> pool, as each line of xmldata is parsed, the next image in line gets
>> pulled out, renamed with the identifying information from the xml
>> data, and both are sent on to different places.
>>
>> I only have one idea for doing this iterating through the image
>> pool, and frankly I think the idea is pretty weak. The only thing I
>> can come up with is to change the name of each image in the pool
>> from something like 0219PS01CT1_2029_04_00044979.jpg to 1.jpg,
>> 2.jpg, 3.jpg etc., then doing something like:
>>
>>     i = 0
>>
>>     here begins my for line in file loop:
>>         if i == 20:
>>             i = 1
>>         else:
>>             i += 1
>>         do stuff with the xml file including get ID info
>>         rename i.jpg to IDinfo.jpg
>>         send it all on
>>
>> That seems pretty barbaric, any thoughts on where to look for better
>> ideas? I'm presuming there are modules out there that I am unaware
>> of or capabilities I am unaware of in modules I do know a little
>> about that I am missing.
>
> I do not really understand what you intend to do, but the following
> modules might come in handy:
> - os (os.rename, os.listdir)
> - glob (glob.iglob or glob.glob)
> - shutil (shutil.copy)
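Putting the shutil tip and the pool-cycling idea together: copy the next pooled image under the event's ID in one step. This is a sketch with made-up directory names and IDs, not the OP's actual script:

```python
import itertools
import os
import shutil
import tempfile

# Hypothetical stand-ins for the OP's directories.
pool_dir = tempfile.mkdtemp()   # the pool of recycled images
out_dir = tempfile.mkdtemp()    # where renamed copies are saved
for i in range(1, 4):           # tiny 3-image pool for the sketch
    open(os.path.join(pool_dir, "img%d.jpg" % i), "wb").close()

images = itertools.cycle(sorted(os.listdir(pool_dir)))

# One event ID per XML line; the fourth event reuses img1.jpg.
for event_id in ["ID001", "ID002", "ID003", "ID004"]:
    src = os.path.join(pool_dir, next(images))
    dst = os.path.join(out_dir, event_id + ".jpg")  # rename happens on copy
    shutil.copyfile(src, dst)
```

Because copyfile takes the destination name directly, there is no separate rename step, and the pool files themselves are never modified.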
[Tutor] A further question about opening and closing files
Under a different subject line (More Pythonic?) Steven D'Aprano commented:

> And this will repeatedly open the file, append one line, then close it
> again. Almost certainly not what you want -- it's wasteful and
> potentially expensive.

And I get that. It does bring up another question though. When using

with open(somefile, 'r') as f:
    with open(filename, 'a') as f1:
        for line in f:

the file being appended is opened and stays open while the loop iterates, then the file closes when exiting the loop, yes? Does this not have the potential to be expensive as well if you are writing a lot of data to the file?

I did a little experiment:

>>> f1 = open("output/test.log", 'a')
>>> f1.write("this is a test")
>>> f1.write("this is a test")
>>> f1.write('why isn\'t this writing')
>>> f1.close()

monitoring test.log as I went. Nothing was written to the file until I closed it, or at least that's the way it appeared in the text editor in which I had test.log open (gedit). In gedit, when a file changes it tells you and gives you the option to reload the file. This didn't happen until I closed the file. So I'm presuming all the writes sat in a buffer in memory until the file was closed, at which time they were written to the file. Is that actually how it happens, and if so, does that not also have the potential to cause problems if memory is a concern?

regards, Richard

-- All internal models of the world are approximate. ~ Sebastian Thrun
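What Richard observed is write buffering: Python (and the OS below it) holds written data in memory and flushes it to disk on close, on an explicit `flush()`, or when the buffer fills — so memory use is bounded by the buffer size, not by the total amount written. A small sketch of the two ways to force data out earlier; the file name here is arbitrary.

```python
# Explicit flush: push the buffer to the OS now, without closing the file.
f1 = open("test.log", "a")
f1.write("this is a test")
f1.flush()
f1.close()

# Line buffering (text mode only, buffering=1): flushes automatically
# every time a newline is written.
f2 = open("test.log", "a", buffering=1)
f2.write("visible as soon as the newline is written\n")
f2.close()
```

With line buffering, gedit would have offered to reload the file after every completed line rather than only at close.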
Re: [Tutor] Fwd: find second occurance of string in line
Albert-Jan Roskam wrote:

>> To: tutor@python.org
>> From: __pete...@web.de
>> Date: Tue, 8 Sep 2015 21:37:07 +0200
>> Subject: Re: [Tutor] Fwd: find second occurance of string in line
>>
>> Albert-Jan Roskam wrote:
>>
>> >> import lxml.etree
>> >>
>> >> tree = lxml.etree.parse("example.xml")
>> >> print tree.xpath("//objectdata/general/timestamp/text()")
>> >
>> > Nice. I do need to try lxml some time. Is the "text()" part xpath as
>> > well?
>>
>> Yes. I think ElementTree supports a subset of XPath.
>
> aha, I see. I studied lxml.de a bit last night and it seems to be better
> in many ways. Writing appears to be much faster than cElementTree while
> many other situations are comparable to cElementTree. I love the objectify
> part of the package. Would you say that (given iterparse) lxml is also the
> module to process giant (i.e. larger than RAM) xml files?

Yes; but that would not be backed by experience ;)

> The webpage recommends a cascade of try-except ImportError statements:
> first lxml, then cElementTree, then ElementTree. But given that there are
> slight API differences, is that really a good idea?

I tend to shy away from such complications, just as I write Python-3-only code now.

> How would you test whether the code runs under both lxml and under the
> alternatives? Would you uninstall lxml to force that one of the
> alternatives is used?

Again, I don't have much experience with code that must cope with different environments. I have simulated a missing module (not lxml) with the following trick:

$ mkdir missing_lxml
$ echo 'raise ImportError' > missing_lxml/lxml.py
$ python3 -c 'import lxml'
$ PYTHONPATH=missing_lxml python3 -c 'import lxml'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/peter/missing_lxml/lxml.py", line 1, in <module>
    raise ImportError
ImportError

Those who regularly need different configurations probably use virtualenv, or virtual machines when the differences are not limited to Python.
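The cascade being discussed looks like this; it is a sketch of the usual idiom, and each fallback only runs if the preceding import failed. Whichever module ends up bound to `etree`, the rest of the code must restrict itself to the API subset the three implementations share.

```python
try:
    from lxml import etree
except ImportError:
    try:
        import xml.etree.cElementTree as etree  # C accelerator; removed in Python 3.9
    except ImportError:
        import xml.etree.ElementTree as etree   # always available in the stdlib
```

On a machine without lxml, the first import raises ImportError and the stdlib parser is used instead, which is exactly what the PYTHONPATH trick above lets you test.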
Re: [Tutor] iterating through a directory
> Date: Wed, 9 Sep 2015 09:32:34 -0400 > From: richkapp...@gmail.com > To: tutor@python.org > Subject: [Tutor] iterating through a directory > > Yes, many questions today. I'm working on a data feed script that feeds > 'events' into our test environment. In production, we monitor a camera that > captures an image as product passes by, gathers information such as > barcodes and package ID from the image, and then sends out the data as a > line of xml to one place for further processing and sends the image to > another place for storage. Here is where our software takes over, receiving > the xml data and images from vendor equipment. Our software then processes > the xml data and allows retrieval of specific images associated with each > line of xml data. Our test environment must simulate the feed from the > vendor equipment, so I'm writing a script that feeds the xml from an actual > log to one place for processing, and pulls images from a pool of 20, which > we recycle, to associate with each event and sends them to another dir for > saving and searching. > > As my script iterates through each line of xml data (discussed yesterday) > to simulate the feed from the vendor camera equipment, it parses ID > information about the event then sends the line of data on. As it does so, > it needs to pull the next image in line from the image pool directory, > rename it, send it to a different directory for saving. I'm pretty solid on > all of this except iterating through the image pool. My idea is to just > keep looping through the image pool, as each line of xmldata is parsed, the > next image in line gets pulled out, renamed with the identifying > information from the xml data, and both are sent on to different places. > > I only have one idea for doing this iterating through the image pool, and > frankly I think the idea is pretty weak. 
The only thing I can come up with > is to change the name of each image in the pool from something > like 0219PS01CT1_2029_04_00044979.jpg to 1.jpg, 2.jpg, 3.jpg etc., > then doing something like: > > i = 0 > > here begins my for line in file loop: > if i == 20: > i = 1 > else: > i += 1 > do stuff with the xml file including get ID info > rename i.jpg to IDinfo.jpg > send it all on > > That seems pretty barbaric, any thoughts on where to look for better ideas? > I'm presuming there are modules out there that I am unaware of or > capabilities I am unaware of in modules I do know a little about that I am > missing. I do not really understand what you intend to do, but the following modules might come in handy -os (os.rename, os.listdir) -glob (glob.iglob or glob.glob) -shutil (shutil.copy) ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] More Pythonic?
> It's not clear why you need the try...except: pass. Please provide some more background information. I don't need the try, this was more of a "are there different ways to do this, which is better and why?" experiment. I am learning, so tend to write script that is more brute force than elegant and pythonic, wish to write better code. I do okay, but there are many nuances to Python that I just haven't run across. For example: > with open(sourcefile) as instream: >with open(destfile, "a") as outstream: >outstream.writelines(process_lines(instream)) I had no idea I could nest with statements like that. It seems obvious now, but I didn't know. For the record, I have made a couple other posts this morning that explain the script constraints far better than I did here. For the sake of brevity I shant repeat the info here other than to say it's not reading from stdin, but from a log file to simulate stdin in a test environment. regards, Richard On Wed, Sep 9, 2015 at 9:37 AM, Peter Otten <__pete...@web.de> wrote: > richard kappler wrote: > > > Would either or both of these work, if both, which is the better or more > > Pythonic way to do it, and why? > > > > ### > > > > import whatIsNeeded > > > > writefile = open("writefile", 'a') > > > > with open(readfile, 'r') as f: > > for line in f: > > if keyword in line: > > do stuff > > f1.write(line) > > else: > > f1.write(line) > > Why do you invoke f1.write() twice? > > > writefile.close() > > > > ## > > > > import whatIsNeeded > > > > with open(readfile, 'r') as f: > > for line in f: > > try: > > if keyword in line: > > do stuff > > except: > > What exceptions are you expecting here? Be explicit. You probably don't > want > to swallow a KeyboardInterrupt. And if something unexpected goes wrong a > noisy complaint gives you the chance to either fix an underlying bug or > explicitly handle the exception in future runs of the script. 
> > > do nothing > > That's spelt > pass > > > with open(writefile, 'a') as f1: > > f1.write(line) > > Opening the file once per line written seems over-the-top to me. > > > ## > > > > or something else altogether? > > I tend to put the processing into into a generator. That makes it easy to > replace the source or the consumer: > > def process_lines(instream): > for line in instream: > if keyword in line: > do stuff > yield line > > with open(sourcefile) as instream: > with open(destfile, "a") as outstream: > outstream.writelines(process_lines(instream)) > > Now if you want to read from stdin and print to stdout: > > sys.stdout.writelines(process_lines(sys.stdin)) > > > I'm thinking the first way is better as it only opens the files once > > whereas it seems to me the second script would open and close the > > writefile once per iteration, and the do nothing in the except seems just > > wrong to me. > > It's not clear why you need the try...except: pass. > Please provide some more background information. > > > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > https://mail.python.org/mailman/listinfo/tutor > -- All internal models of the world are approximate. ~ Sebastian Thrun ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] More Pythonic?
On Wed, Sep 09, 2015 at 09:05:23AM -0400, richard kappler wrote:

> Would either or both of these work, if both, which is the better or more
> Pythonic way to do it, and why?

The first works, but isn't really the best way to do it:

> ###
>
> import whatIsNeeded
>
> writefile = open("writefile", 'a')
>
> with open(readfile, 'r') as f:
>     for line in f:
>         if keyword in line:
>             do stuff
>             f1.write(line)
>         else:
>             f1.write(line)
>
> writefile.close()
>
> ##

Better would be this:

with open("writefile", 'a') as outfile:
    with open("readfile", 'r') as infile:
        for line in infile:
            if keyword in line:
                do stuff
            outfile.write(line)

(I think your intention is to always write the lines into the output file, but there are enough typos in your version that I can't be completely sure.)

This, on the other hand, is certainly not what you want:

> import whatIsNeeded
>
> with open(readfile, 'r') as f:
>     for line in f:
>         try:
>             if keyword in line:
>                 do stuff
>         except:
>             do nothing

Why are you ignoring *all errors*? That will make it impossible (or at least very hard) to cancel the script with Ctrl-C, and it will cover up programming errors. Apart from a very few expert uses, you should never use a bare except. If you really want to "ignore all errors", use `except Exception`, but even that is not good practice. You should list and catch only the precise errors that you know you wish to ignore and can safely handle. Everything else indicates a bug that needs fixing. By the way, "do nothing" in Python is spelled "pass".

>         with open(writefile, 'a') as f1:
>             f1.write(line)

And this will repeatedly open the file, append one line, then close it again. Almost certainly not what you want -- it's wasteful and potentially expensive.

> ##
>
> or something else altogether?
>
> I'm thinking the first way is better as it only opens the files once
> whereas it seems to me the second script would open and close the
> writefile once per iteration, and the do nothing in the except seems just
> wrong to me.
> Is my thinking on target here?

Spot on target.

-- Steve
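Steven's advice — catch only the narrow errors you can actually handle, never a bare `except` — can be made concrete with a small sketch; the filename, keyword, and the counting task itself are arbitrary examples.

```python
def count_keyword(path, keyword):
    """Count lines in a file containing keyword, tolerating file errors."""
    try:
        with open(path) as f:
            return sum(1 for line in f if keyword in line)
    except OSError as err:  # FileNotFoundError, PermissionError, ... but not Ctrl-C
        print("could not read %s: %s" % (path, err))
        return 0
```

`OSError` covers the file-system failures worth tolerating here, while KeyboardInterrupt and genuine programming bugs still propagate and get reported.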
[Tutor] Fwd: Fwd: find second occurance of string in line
> It looks like I was not clear enough: XML doesn't have the concept of
> lines. When you process XML "by line" you have buggy code.

No Peter, I'm pretty sure it was I who was less than clear. The xml data is generated by events, one line in a log for each event, so while xml doesn't have the concept of lines, each event creates a new line of xml data in the log from which I will be reading. Sorry about the confusion. I should have phrased it better, perhaps calling them events instead of lines?

> Just for fun take the five minutes to install lxml and compare the output
> of the two scripts. If it's the same now there's no harm switching to
> lxml, and you are making future failures less likely.

I'll take a look at it, thanks.

regards, Richard
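The two positions can be reconciled: since each log line is a complete, self-contained XML record for one event, each line can be handed individually to a real parser instead of being searched with string indexing. A sketch; the element names are assumed from the xpath quoted earlier in the thread.

```python
import xml.etree.ElementTree as ET

def timestamps(lines):
    """Yield the text of every <timestamp> element, one log line at a time."""
    for line in lines:
        root = ET.fromstring(line)        # parse this one event record
        for ts in root.iter("timestamp"):
            yield ts.text
```

This reads the log line by line, as Richard needs, but lets the parser handle the XML inside each line.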
Re: [Tutor] More Pythonic?
richard kappler wrote:

> Would either or both of these work, if both, which is the better or more
> Pythonic way to do it, and why?
>
> ###
>
> import whatIsNeeded
>
> writefile = open("writefile", 'a')
>
> with open(readfile, 'r') as f:
>     for line in f:
>         if keyword in line:
>             do stuff
>             f1.write(line)
>         else:
>             f1.write(line)

Why do you invoke f1.write() twice?

> writefile.close()
>
> ##
>
> import whatIsNeeded
>
> with open(readfile, 'r') as f:
>     for line in f:
>         try:
>             if keyword in line:
>                 do stuff
>         except:

What exceptions are you expecting here? Be explicit. You probably don't want to swallow a KeyboardInterrupt. And if something unexpected goes wrong a noisy complaint gives you the chance to either fix an underlying bug or explicitly handle the exception in future runs of the script.

>             do nothing

That's spelt

    pass

>         with open(writefile, 'a') as f1:
>             f1.write(line)

Opening the file once per line written seems over-the-top to me.

> ##
>
> or something else altogether?

I tend to put the processing into a generator. That makes it easy to replace the source or the consumer:

def process_lines(instream):
    for line in instream:
        if keyword in line:
            do stuff
        yield line

with open(sourcefile) as instream:
    with open(destfile, "a") as outstream:
        outstream.writelines(process_lines(instream))

Now if you want to read from stdin and print to stdout:

sys.stdout.writelines(process_lines(sys.stdin))

> I'm thinking the first way is better as it only opens the files once
> whereas it seems to me the second script would open and close the
> writefile once per iteration, and the do nothing in the except seems just
> wrong to me.

It's not clear why you need the try...except: pass. Please provide some more background information.

___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
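Peter's generator pattern becomes runnable once the "do stuff" placeholder is filled in; here the keyword test and the upper-casing are arbitrary stand-ins for the real processing.

```python
KEYWORD = "error"  # stand-in for the real keyword

def process_lines(instream):
    """Yield every line, transforming the ones that contain KEYWORD."""
    for line in instream:
        if KEYWORD in line:
            line = line.upper()  # placeholder for the real "do stuff"
        yield line

# The same generator works on an open file, on sys.stdin, or on a plain
# list of strings -- which is what makes the pattern easy to test:
result = list(process_lines(["ok\n", "an error here\n"]))
```

Swapping the consumer is just as easy: `outstream.writelines(process_lines(instream))` for files, or `sys.stdout.writelines(...)` for stdout, without touching the processing code.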
[Tutor] iterating through a directory
Yes, many questions today. I'm working on a data feed script that feeds 'events' into our test environment. In production, we monitor a camera that captures an image as product passes by, gathers information such as barcodes and package ID from the image, and then sends out the data as a line of xml to one place for further processing and sends the image to another place for storage. Here is where our software takes over, receiving the xml data and images from vendor equipment. Our software then processes the xml data and allows retrieval of specific images associated with each line of xml data. Our test environment must simulate the feed from the vendor equipment, so I'm writing a script that feeds the xml from an actual log to one place for processing, and pulls images from a pool of 20, which we recycle, to associate with each event and sends them to another dir for saving and searching. As my script iterates through each line of xml data (discussed yesterday) to simulate the feed from the vendor camera equipment, it parses ID information about the event then sends the line of data on. As it does so, it needs to pull the next image in line from the image pool directory, rename it, send it to a different directory for saving. I'm pretty solid on all of this except iterating through the image pool. My idea is to just keep looping through the image pool, as each line of xmldata is parsed, the next image in line gets pulled out, renamed with the identifying information from the xml data, and both are sent on to different places. I only have one idea for doing this iterating through the image pool, and frankly I think the idea is pretty weak. 
The only thing I can come up with is to change the name of each image in the pool from something like 0219PS01CT1_2029_04_00044979.jpg to 1.jpg, 2.jpg, 3.jpg etc., then doing something like:

i = 0

here begins my for line in file loop:
    if i == 20:
        i = 1
    else:
        i += 1
    do stuff with the xml file including get ID info
    rename i.jpg to IDinfo.jpg
    send it all on

That seems pretty barbaric, any thoughts on where to look for better ideas? I'm presuming there are modules out there that I am unaware of or capabilities I am unaware of in modules I do know a little about that I am missing.

-- All internal models of the world are approximate. ~ Sebastian Thrun
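The reset-at-20 counter in the sketch above can be collapsed into one line of modular arithmetic, so the index wraps from 20 back to 1 automatically; this is a sketch, with the pool size of 20 taken from the post.

```python
def pool_index(event_number, pool_size=20):
    """Map event numbers 0, 1, 2, ... onto pool slots 1, 2, ..., 20, 1, 2, ..."""
    return event_number % pool_size + 1
```

Used inside the for-line-in-file loop, `pool_index(n)` names the image to pull (`"%d.jpg" % pool_index(n)`) without any if/else reset logic.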
[Tutor] More Pythonic?
Would either or both of these work, if both, which is the better or more Pythonic way to do it, and why? ### import whatIsNeeded writefile = open("writefile", 'a') with open(readfile, 'r') as f: for line in f: if keyword in line: do stuff f1.write(line) else: f1.write(line) writefile.close() ## import whatIsNeeded with open(readfile, 'r') as f: for line in f: try: if keyword in line: do stuff except: do nothing with open(writefile, 'a') as f1: f1.write(line) ## or something else altogether? I'm thinking the first way is better as it only opens the files once whereas it seems to me the second script would open and close the writefile once per iteration, and the do nothing in the except seems just wrong to me. Is my thinking on target here? regards, Richard -- All internal models of the world are approximate. ~ Sebastian Thrun ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Creating lists with definite (n) items without repetitions
Oscar Benjamin wrote:

The problem is that there are 26 people and they are divided into groups of 3 each day. We would like to know if it is possible to arrange it so that each player plays each other player exactly once over some period of days.

It is not exactly possible to do this with 26 people in groups of 3. Think about it from the perspective of 1 person. They must play against all 25 other people in pairs with neither of the other people repeated: the set of pairs they play against must partition the set of other players. Clearly it can only work if the number of other players is even.

I'm not sure, but I think that maybe for an exact solution you need to have n=1(mod 6) or n=3(mod 6), which gives: n = 1, 3, 7, 9, 13, 15, 19, 21, 25, 27, ... The formula for the number of triples, if the exact solution exists, is n*(n-1)/6, which comes out as 26*25/6 = 108.3 (the formula doesn't give an integer because the exact solution doesn't exist).

A quick solution is to add one "dummy" letter to the pool of the OP's golfers. I used "!" as the dummy one. This way, you end up with 101 triples, 11 of which contain the dummy player. But this is better than the 25-item pool, which resulted in an incomplete set of triples (for example, A would never play with Z). So, in your real-world problem, you will have 11 groups of 2 people instead of 3. Is this a problem?

import pprint, itertools

pool = "abcdefghijklmnopqrstuvwxyz!"

def maketriples(tuplelist):
    final = []
    used = set()
    for a, b, c in tuplelist:
        if (((a,b) in used) or ((b,c) in used) or ((a,c) in used) or
                ((b,a) in used) or ((c,b) in used) or ((c,a) in used)):
            continue
        else:
            final.append((a, b, c))
            used.add((a,b))
            used.add((a,c))
            used.add((b,c))
            used.add((b,a))
            used.add((c,a))
            used.add((c,b))
    return final

combos = list(itertools.combinations(pool, 3))
print("combos contains %s triples." % len(combos))
triples = maketriples(combos)
print("maketriples(combos) contains %s triples." % len(triples))
pprint.pprint(triples)
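Oscar's divisibility argument can be checked numerically: an exact solution (a Steiner triple system) exists only for n = 1 or 3 (mod 6), and when it does, it contains n*(n-1)/6 triples. A small sketch of both conditions:

```python
def steiner_possible(n):
    """True when an exact triple system can exist: n = 1 or 3 (mod 6)."""
    return n % 6 in (1, 3)

def triple_count(n):
    """Number of triples in an exact solution, n*(n-1)/6."""
    return n * (n - 1) / 6
```

For the 27-player pool with the dummy "!", an exact solution would have 27*26/6 = 117 triples — so the greedy maketriples() above, which finds 101, is close but not optimal.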
Re: [Tutor] find second occurance of string in line
richard kappler wrote:

> On Tue, Sep 8, 2015 at 1:40 PM, Peter Otten <__pete...@web.de> wrote:
>> I'm inferring from the above that you do not really want the "second"
>> timestamp in the line -- there is no line left intact anyway ;) -- but
>> rather the one in the ... part.
>>
>> Here's a way to get these (all of them) with lxml:
>>
>> import lxml.etree
>>
>> tree = lxml.etree.parse("example.xml")
>> print tree.xpath("//objectdata/general/timestamp/text()")
>
> No no, I'm not sure how I can be much more clear, that is one (1) line of
> xml that I provided, not many, and I really do want what I said in the
> very beginning, the second instance of <timestamp> for each of those
> lines.

It looks like I was not clear enough: XML doesn't have the concept of lines. When you process XML "by line" you have buggy code.

> Got it figured out with guidance from Alan's response though:
>
> #!/usr/bin/env python
>
> with open("example.xml", 'r') as f:
>     for line in f:
>         if "objectdata" in line:
>             if "<timestamp>" in line:
>                 x = "<timestamp>"
>                 a = "</timestamp>"
>                 first = line.index(x)
>                 second = line.index(x, first+1)
>                 b = line.index(a)
>                 c = line.index(a, b+1)
>                 y = second + 11
>                 timestamp = line[y:c]
>                 print timestamp

Just for fun take the five minutes to install lxml and compare the output of the two scripts. If it's the same now there's no harm switching to lxml, and you are making future failures less likely.
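The comparison Peter suggests can be sketched side by side on a single event record. The sample line below is hypothetical (it has only one timestamp, where Richard's real lines have two, so the index-based code here finds the first rather than the second); the stdlib's ElementTree stands in for lxml since it supports the same XPath subset used here.

```python
import xml.etree.ElementTree as ET

line = ("<objectdata><general>"
        "<timestamp>2015-09-08T13:40:00</timestamp>"
        "</general></objectdata>")

# String-slicing approach, in the spirit of the thread's script:
start = line.index("<timestamp>") + len("<timestamp>")  # len is the 11 from the script
end = line.index("</timestamp>")
by_slicing = line[start:end]

# Parser approach; ElementTree supports this subset of XPath.
by_xpath = ET.fromstring(line).findtext(".//timestamp")
```

When both approaches agree on the real log, switching to the parser removes the fragile hard-coded offsets.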