Re: [Tutor] scratching my head
In a message of Wed, 05 Aug 2015 08:43:45 +0200, Peter Otten writes:
>Laura Creighton wrote:
>but I don't think that's simpler. Can you enlighten me?

When I got here, I landed in the middle of a discussion on how to use
regexps for solving this, plus a slew of string handling functions, none
of which included endswith, which I think is a fine idea as well.

The nice thing about fnmatch is that it handles all the 'are your file
names case insensitive' stuff for you, which can be a problem.

Laura
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
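A small sketch of the case-handling point, under the assumption (taken from the standard library documentation) that fnmatch.fnmatch() normalizes case with os.path.normcase(), which lower-cases on Windows but leaves names unchanged on POSIX, while fnmatch.fnmatchcase() is always case-sensitive. The helper name matches_any and the sample names are made up for illustration. Lower-casing the name explicitly gives the same answer on every platform:

```python
import fnmatch

def matches_any(filename, patterns):
    """Return True if filename matches any glob pattern, ignoring case."""
    lowered = filename.lower()
    # fnmatchcase() never touches case, so the result does not depend
    # on the operating system the script runs on.
    return any(fnmatch.fnmatchcase(lowered, pattern) for pattern in patterns)

PATTERNS = ("*.jpg", "*.png")
names = ["foo.jpg", "bar.PNG", "baz.txt"]
matching = [name for name in names if matches_any(name, PATTERNS)]
print(matching)
```

Without the explicit lower(), "bar.PNG" would only match "*.png" where the platform's normcase() folds case.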
Re: [Tutor] scratching my head - still
Cameron Simpson wrote:

> On 05Aug2015 12:46, Steven D'Aprano wrote:
>>On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
>>> As seen below (closely), some filenames are not being removed while
>>> others are, such as in the first stanza, some pdfs are removed, some
>>> aren't. In the second stanza, Thumbs.db makes it through, but was caught
>>> in the first stanza. (Thanks for those who have proffered solutions to
>>> date!) I see no logic in the results. What am I missing???
>>
>>You are modifying the list of files while iterating over it, which plays
>>all sorts of hell with the process. Watch this:
> [... detailed explanation ...]
>>The lesson here is that you should never modify a list while iterating
>>over it. Instead, make a copy, and modify the copy.
>
> What Steven said. Yes indeed.
>
> Untested example suggestion:
>
>   all_filenames = set(filenames)
>   for filename in filenames:
>     if .. test here ...:
>       all_filenames.remove(filename)
>   print(all_filenames)
>
> You could use a list instead of a set and for small numbers of files be
> fine. With large numbers of files a set is far faster to remove things
> from.

If the list size is manageable, usually the case for the names of files in
one directory, you should not bother about removing items. Just build a new
list:

    all_filenames = [...]
    matching_filenames = [name for name in all_filenames if test(name)]

If the list is huge and you expect that most items will be kept you might
try reverse iteration:

    for i in reversed(range(len(all_filenames))):
        name = all_filenames[i]
        if test(name):
            del all_filenames[i]

This avoids both copying the list and the linear search performed by
list.remove().
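Both suggestions can be tried side by side. This is only a sketch: is_picture() is a hypothetical stand-in for the unspecified test(), and here it means "keep this name" in both approaches.

```python
def is_picture(name):
    """Hypothetical predicate: True for names with a picture extension."""
    return name.lower().endswith((".jpg", ".png"))

all_filenames = ["a.jpg", "notes.txt", "b.PNG", "Thumbs.db", "c.pdf"]

# Approach 1: build a new list holding only the names to keep.
kept = [name for name in all_filenames if is_picture(name)]

# Approach 2: delete unwanted names in place, walking backwards so that
# deleting index i never shifts an item that has not been visited yet.
in_place = list(all_filenames)
for i in reversed(range(len(in_place))):
    if not is_picture(in_place[i]):
        del in_place[i]

print(kept)
print(in_place)
```

Both lists end up with the same contents; the second approach simply avoids allocating a new list.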
Re: [Tutor] scratching my head
Laura Creighton wrote:

> In a message of Mon, 03 Aug 2015 18:22:32 +1000, Cameron Simpson writes:
>
>>That depends. This is the tutor list; we're helping Clayton debug his code
>>as an aid to learning. While it's good to know about the facilities in the
>>standard library, pointing him directly at fnmatch (which I'd entirely
>>forgotten) is the "give a man a fish" approach to help; a magic black box
>>to do the job for him.
>>
>>Besides, I'm not sure fnmatch is much better for his task than the more
>>direct methods being discussed.
>
> And I am certain. It works exactly as he said he wanted -- a less
> cumbersome way to solve this problem, which he thought would be done
> some way with a for loop, looping over extensions, instead of the
> cumbersome way he is doing things.

I suppose you have some way in mind to simplify

    # version 1, splitext()
    import os

    filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
    EXTENSIONS = {".jpg", ".png"}
    matching_filenames = [
        name for name in filenames
        if os.path.splitext(name)[1].lower() in EXTENSIONS]
    print(matching_filenames)

with fnmatch. I can only come up with

    # version 2, fnmatch()
    import fnmatch

    filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
    GLOBS = ["*.jpg", "*.png"]
    matching_filenames = [
        name for name in filenames
        if any(fnmatch.fnmatch(name.lower(), pat) for pat in GLOBS)]
    print(matching_filenames)

but I don't think that's simpler. Can you enlighten me?

Digression: I don't know if str.endswith() was already suggested. I think
that is a (small) improvement over the first version

    # version 3, endswith()
    filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
    EXTENSIONS = (".jpg", ".png")
    matching_filenames = [
        name for name in filenames
        if name.lower().endswith(EXTENSIONS)]
    print(matching_filenames)
Re: [Tutor] scratching my head - still
On 05Aug2015 12:46, Steven D'Aprano wrote:
>On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
>> As seen below (closely), some filenames are not being removed while
>> others are, such as in the first stanza, some pdfs are removed, some
>> aren't. In the second stanza, Thumbs.db makes it through, but was caught
>> in the first stanza. (Thanks for those who have proffered solutions to
>> date!) I see no logic in the results. What am I missing???
>
>You are modifying the list of files while iterating over it, which plays
>all sorts of hell with the process. Watch this:
[... detailed explanation ...]
>The lesson here is that you should never modify a list while iterating
>over it. Instead, make a copy, and modify the copy.

What Steven said. Yes indeed.

Untested example suggestion:

  all_filenames = set(filenames)
  for filename in filenames:
    if .. test here ...:
      all_filenames.remove(filename)
  print(all_filenames)

You could use a list instead of a set and for small numbers of files be
fine. With large numbers of files a set is far faster to remove things
from.

Cheers,
Cameron Simpson

In the desert, you can remember your name,
'cause there ain't no one for to give you no pain.
        - America
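A runnable version of the untested suggestion, as a sketch only: unwanted() is a hypothetical placeholder for ".. test here ..." and the sample names are invented. Note that a set does not preserve the original order, so the result is sorted before display.

```python
filenames = ["a.jpg", "notes.txt", "b.png", "Thumbs.db"]

def unwanted(filename):
    """Hypothetical test: True for names that are not pictures."""
    return not filename.lower().endswith((".jpg", ".png"))

# Iterate over the original list, but remove from a separate set,
# so the sequence being looped over never changes.
all_filenames = set(filenames)
for filename in filenames:
    if unwanted(filename):
        all_filenames.remove(filename)

# A set has no defined order, so sort it for stable display.
print(sorted(all_filenames))
```

If the order of the names matters, the list-comprehension approach keeps it without the extra sort.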
Re: [Tutor] scratching my head - still
On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:

> As seen below (closely), some filenames are not being removed while others
> are, such as in the first stanza, some pdfs are removed, some aren't. In the
> second stanza, Thumbs.db makes it through, but was caught in the first
> stanza. (Thanks for those who have proffered solutions to date!)
> I see no logic in the results. What am I missing???

You are modifying the list of files while iterating over it, which plays
all sorts of hell with the process. Watch this:

py> alist = [1, 2, 3, 4, 5, 6, 7, 8]
py> for n in alist:
...     if n%2 == 0:  # even number
...         alist.remove(n)
...     print(n)
...
1
2
4
6
8
py> print(alist)
[1, 3, 5, 7]

If you pretend to be the Python interpreter, and simulate the process
yourself, you'll see the same thing. Imagine that there is a pointer to
the current item in the list. First time through the loop, it points to
the first item, and you print the value and move on:

    [>1, 2, 3, 4, 5, 6, 7, 8]
    print 1

The second time through the loop:

    [1, >2, 3, 4, 5, 6, 7, 8]
    remove the 2, leaves the pointer pointing at three:
    [1, >3, 4, 5, 6, 7, 8]
    print 2

Third time through the loop, we move on to the next value:

    [1, 3, >4, 5, 6, 7, 8]
    remove the 4, leaves the pointer pointing at five:
    [1, 3, >5, 6, 7, 8]
    print 4

and so on.

The lesson here is that you should never modify a list while iterating
over it. Instead, make a copy, and modify the copy.

--
Steve
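The same effect with filenames, as a minimal sketch using invented sample names: two unwanted files sit next to each other, so removing the first lets the second slide into the slot the loop has already passed. Iterating over a copy (here the slice fixed[:]) while modifying the original avoids the skip.

```python
names = ["a.jpg", "Guide ENG.pdf", "Guide FRE.pdf", "b.jpg"]

# Buggy: removing from the list being iterated skips the item that
# slides into the freed slot.
buggy = list(names)
for name in buggy:
    if name.endswith(".pdf"):
        buggy.remove(name)

# Fixed: iterate over a copy (fixed[:]) and modify the original.
fixed = list(names)
for name in fixed[:]:
    if name.endswith(".pdf"):
        fixed.remove(name)

print(buggy)   # one .pdf survives
print(fixed)   # both .pdf names are gone
```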
Re: [Tutor] scratching my head - still
As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???
TIA, Clayton

import os
from os.path import join, getsize, splitext

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, filenames in os.walk(main_dir):
    print( "filenames = ", filenames, "\n" )
    for filename in filenames:
        prefix, ext = splitext(filename)
        if not (ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4', 'mov', 'bmp') ):
            print( "deleting filename ", filename, " because ", ext[1:].lower(), " doesn't contain .jpg or .png or .avi or .mp4 or .bmp" )
            filenames.remove(filename)
    print("\nfilenames - bad exts:\n", filenames )

produces:

filenames = ['.picasa.ini', '2010-11-02 15.58.30.jpg',
'2010-11-02 15.58.45.jpg', '2010-11-25 09.42.59.jpg',
'2011-03-19 19.32.09.jpg', '2011-05-28 17.13.38.jpg',
'2011-05-28 17.26.37.jpg', '2012-02-02 20.16.46.jpg', '218.JPG',
'desktop.ini', 'Guide ENG.pdf', 'Guide FRE.pdf', 'Guide GER.pdf',
'Guide Ita.pdf', 'Guide Spa.pdf', 'honda accident 001.jpg',
'honda accident 002.jpg', 'honda accident 003.jpg',
'honda accident 004.jpg', 'honda accident 005.jpg',
'honda accident 006.jpg', 'honda accident 007.jpg', 'Image (1).jpg',
'Image.jpg', 'IMG.jpg', 'IMG3.jpg', 'IMG00040.jpg', 'IMG00058.jpg',
'IMG_0003.jpg', 'IMG_0004.jpg', 'IMG_0005.jpg', 'IMG_0007.jpg',
'IMG_0008.jpg', 'IMG_0009.jpg', 'IMG_0010.jpg',
'Mak diploma handshake.jpg', 'New Picture.bmp',
'OneNote Table Of Contents (2).onetoc2', 'temp 121.jpg', 'temp 122.jpg',
'temp 220.jpg', 'temp 320.jpg', 'temp 321.jpg', 'temp 322.jpg',
'temp 323.jpg', 'temp 324.jpg', 'temp 325.jpg', 'temp 326.jpg',
'temp 327.jpg', 'temp 328.jpg', 'temp 329.jpg', 'temp 330.jpg',
'temp 331.jpg', 'temp 332.jpg', 'temp 333.jpg', 'temp 334.jpg',
'temp 335.jpg', 'temp 336.jpg', 'temp 337.jpg', 'temp 338.jpg',
'temp 339.jpg', 'temp 340.jpg', 'temp 341.jpg', 'temp 342.jpg',
'temp 343.jpg', 'Thumbs.db']

deleting filename .picasa.ini because ini doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename desktop.ini because ini doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename Guide FRE.pdf because pdf doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename Guide Ita.pdf because pdf doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename OneNote Table Of Contents (2).onetoc2 because onetoc2 doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename Thumbs.db because db doesn't contain .jpg or .png or .avi or .mp4 or .bmp

filenames - bad exts:
['2010-11-02 15.58.30.jpg', '2010-11-02 15.58.45.jpg',
'2010-11-25 09.42.59.jpg', '2011-03-19 19.32.09.jpg',
'2011-05-28 17.13.38.jpg', '2011-05-28 17.26.37.jpg',
'2012-02-02 20.16.46.jpg', '218.JPG', 'Guide ENG.pdf', 'Guide GER.pdf',
'Guide Spa.pdf', 'honda accident 001.jpg', 'honda accident 002.jpg',
'honda accident 003.jpg', 'honda accident 004.jpg',
'honda accident 005.jpg', 'honda accident 006.jpg',
'honda accident 007.jpg', 'Image (1).jpg', 'Image.jpg', 'IMG.jpg',
'IMG3.jpg', 'IMG00040.jpg', 'IMG00058.jpg', 'IMG_0003.jpg',
'IMG_0004.jpg', 'IMG_0005.jpg', 'IMG_0007.jpg', 'IMG_0008.jpg',
'IMG_0009.jpg', 'IMG_0010.jpg', 'Mak diploma handshake.jpg',
'New Picture.bmp', 'temp 121.jpg', 'temp 122.jpg', 'temp 220.jpg',
'temp 320.jpg', 'temp 321.jpg', 'temp 322.jpg', 'temp 323.jpg',
'temp 324.jpg', 'temp 325.jpg', 'temp 326.jpg', 'temp 327.jpg',
'temp 328.jpg', 'temp 329.jpg', 'temp 330.jpg', 'temp 331.jpg',
'temp 332.jpg', 'temp 333.jpg', 'temp 334.jpg', 'temp 335.jpg',
'temp 336.jpg', 'temp 337.jpg', 'temp 338.jpg', 'temp 339.jpg',
'temp 340.jpg', 'temp 341.jpg', 'temp 342.jpg', 'temp 343.jpg']

filenames = ['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG',
'IMG_0035.JPG', 'IMG_0037.JPG', 'IMG_0039.JPG',
'OneNote Table Of Contents.onetoc2', 'Thumbs.db', 'ZbThumbnail.info']

deleting filename OneNote Table Of Contents.onetoc2 because onetoc2 doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename ZbThumbnail.info because info doesn't contain .jpg or .png or .avi or .mp4 or .bmp

filenames - bad exts:
['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG', 'IMG_0035.JPG',
'IMG_0037.JPG', 'IMG_0039.JPG', 'Thumbs.db']
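One way the loop above could be restructured so that nothing is skipped, offered as a sketch rather than the definitive fix: instead of calling filenames.remove() inside the loop, build a filtered list per directory. The names PICTURE_EXTS, keep_pictures and walk_pictures are invented for this illustration, and the sample list is a short excerpt of the output shown above.

```python
import os
from os.path import splitext

PICTURE_EXTS = {".jpg", ".png", ".avi", ".mp4", ".mov", ".bmp"}

def keep_pictures(filenames):
    """Return a new list holding only names with a wanted extension."""
    return [name for name in filenames
            if splitext(name)[1].lower() in PICTURE_EXTS]

def walk_pictures(main_dir):
    """Yield (dir_path, picture_filenames) for each directory under main_dir."""
    for dir_path, directories, filenames in os.walk(main_dir):
        # The list from os.walk is only read here, never modified.
        yield dir_path, keep_pictures(filenames)

sample = ['.picasa.ini', '218.JPG', 'desktop.ini', 'Guide ENG.pdf',
          'Guide FRE.pdf', 'New Picture.bmp', 'Thumbs.db']
print(keep_pictures(sample))
```

If the original list object must be updated in place, the slice assignment filenames[:] = keep_pictures(filenames) after the inner loop is another option.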
Re: [Tutor] scratching my head
On 8/3/2015 1:22 AM, Cameron Simpson wrote:
> That depends. This is the tutor list; we're helping Clayton debug his
> code as an aid to learning. While it's good to know about the facilities
> in the standard library, pointing him directly at fnmatch (which I'd
> entirely forgotten) is the "give a man a fish" approach to help; a magic
> black box to do the job for him.

Sometimes a fish of three or four lines that replaces a 20 line effort
might be better considered as a solution to be teased apart and understood.

Emile
Re: [Tutor] scratching my head
In a message of Mon, 03 Aug 2015 18:22:32 +1000, Cameron Simpson writes:
>That depends. This is the tutor list; we're helping Clayton debug his code as
>an aid to learning. While it's good to know about the facilities in the
>standard library, pointing him directly at fnmatch (which I'd entirely
>forgotten) is the "give a man a fish" approach to help; a magic black box to
>do the job for him.
>
>Besides, I'm not sure fnmatch is much better for his task than the more direct
>methods being discussed.

And I am certain. It works exactly as he said he wanted -- a less
cumbersome way to solve this problem, which he thought would be done
some way with a for loop, looping over extensions, instead of the
cumbersome way he is doing things.

His design sense was perfectly fine; there is an elegant way to
solve the problem precisely along the lines he imagined -- he just
wasn't aware of this bit of the standard library.

There is no particular virtue in teaching somebody how to build
a pneumatic drill in order to crack walnuts.

Laura
Re: [Tutor] scratching my head
On 03Aug2015 08:12, Laura Creighton wrote:
>I think people are giving you sub-optimal advice.
>
>Python has a module in the standard library for doing exactly what
>you want to do -- match files with certain extensions.
>
>See: https://docs.python.org/2/library/fnmatch.html
>
>It's unix style file matching, but I am fairly certain this works
>on windows also. I don't have a windows machine to test and make sure.

That depends. This is the tutor list; we're helping Clayton debug his code
as an aid to learning. While it's good to know about the facilities in the
standard library, pointing him directly at fnmatch (which I'd entirely
forgotten) is the "give a man a fish" approach to help; a magic black box
to do the job for him.

Besides, I'm not sure fnmatch is much better for his task than the more
direct methods being discussed.

Cheers,
Cameron Simpson
Re: [Tutor] scratching my head
2015-08-02 23:44 GMT+02:00 Clayton Kirkwood :
> for dir_path, directories, files in os.walk(main_dir):
>     for file in files:
> #print( " file = ", file)
> # if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):

I suppose you want to use regular expressions here and are somewhat
familiar with them, but you forgot to tell Python to handle your string as
a regex. This kind of expression must be matched against the filenames with
the re module instead of being tested with the "in" operator.

In this case, https://docs.python.org/3/library/re.html and
https://docs.python.org/3/howto/regex.html are your friends.
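A minimal sketch of what handling the string as a regex could look like, using the pattern from the quoted code. The name PICTURE_RE and the sample filenames are assumptions made for illustration.

```python
import re

# A raw string keeps the backslashes intact for the regex engine;
# re.IGNORECASE removes the need to lower-case each filename first.
PICTURE_RE = re.compile(r"\.(jpg|png|avi|mp4)$", re.IGNORECASE)

names = ["holiday.JPG", "clip.mp4", "notes.txt", "jpg.txt"]

# search() looks for the pattern anywhere in the name; the trailing "$"
# anchors it to the end, so only real extensions match.
matching = [name for name in names if PICTURE_RE.search(name)]
print(matching)
```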
Re: [Tutor] scratching my head
I think people are giving you sub-optimal advice.

Python has a module in the standard library for doing exactly what
you want to do -- match files with certain extensions.

See: https://docs.python.org/2/library/fnmatch.html

It's unix style file matching, but I am fairly certain this works
on windows also. I don't have a windows machine to test and make sure.

Laura
Re: [Tutor] scratching my head
> -----Original Message-----
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Steven D'Aprano
> Sent: Sunday, August 02, 2015 5:49 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
>
> On Sun, Aug 02, 2015 at 02:44:15PM -0700, Clayton Kirkwood wrote:
>
> > for dir_path, directories, files in os.walk(main_dir):
> >     for file in files:
> > #print( " file = ", file)
> > # if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> > #if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
>
> name, ext = os.path.splitext(filename)
> if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
>     ...
>
>
> > #del files[file]
> > #
> > #I get an error on int expected here. If I'm able to access by string,
> > why wouldn't I be able to #access in the del?
>
> What are you attempting to do here? files is a list of file names:
>
> files = ['this.jpg', 'that.txt', 'other.pdf']
> filename = 'that.txt'
>
> What do you expect files['that.txt'] to do?
>
> The problem has nothing to do with del, the problem is that you are trying to
> access the 'that.txt'-th item of a list, and that is meaningless.

Well, I was expecting that the list entry would be deleted. In other parts
of my code I am using filenames as the index of lists: list[filenames], in
for loops and some ifs, where it appears to work. I am able to look at
directories and the files in them by doing this. Check the rest of my
original code. One if at the bottom of my code complained that the index
was supposed to be an int, not the list element value. So I get that the
index is supposed to be an int, and I think what is happening in much of
the code is the filename somehow becomes an int and then the list is
accessed that way. It's very confusing. Basically, I was using filenames as
indexes into the list.

>
>
> > print( "looking at file ", file, " in
> > top_directory_file_list ", top_directory_file_list )
>
> What does this print? In particular, what does the last part,
> top_directory_file_list, print? Because the next error:
>
> > if file in top_directory_file_list:
> > #error: arg of type int not iterable
>
> is clear that it is an int.
>
> > #yet it works for the for loops
>
> I think you are confusing:
>
> top_directory_file_list
>
> directory_file_list

I don't know. If you look at the code that is going through the directory
filename by filename, the prints kick out filenames and directories, and the
list elements are addressed by "strings", the actual filenames. What is
happening in most of the code looks like what one would expect if the lists
could be indexed by words, not ints. As a programmer, I would expect lists
to be addressed via a name or a number. It seems kind of like dictionaries.
Am I mixing dictionaries and lists?

Clayton
Re: [Tutor] scratching my head
> -----Original Message-----
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Cameron Simpson
> Sent: Sunday, August 02, 2015 6:03 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
>
> On 02Aug2015 16:15, Clayton Kirkwood wrote:
> >> Behalf Of Cameron Simpson
> >> Sent: Sunday, August 02, 2015 3:35 PM
> [...]
> >> Personally I'd be reaching for os.path.splitext. Untested example below:
> >>
> >>   from os.path import splitext
> >>
> >>   for dir_path, directories, files in os.walk(main_dir):
> >>     for file in files:
> >>       prefix, ext = splitext(file)
> >>       if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
> >>
> >>
> >> which I think is much easier to read.
> >>
> >> BTW, I'd be using the variable names "filename" and "filenames"
> >> instead of "file" and "files": in python 2 "file" is a builtin
> >> function (though long deprecated by "open()") and in any case I'd
> >> (personally) expect such a name to be an _open_ file. As opposed to
> >> "filename", which is clearer.
> >
> >Thanks, that should also help a lot. Now time to look at splitext, and
> >the ext and ext[1:.
>
> The "[1:]" is because "ext" will include the dot.

Yeah, after looking it up, it became clear, but thanks!

>
> >I appreciate your comments also about the variable names.
> >Any comments on the problems lower in the file?
>
> Maybe you'd better reraise these problems again explicitly.

Point taken.

>
> Cheers,
> Cameron Simpson
Re: [Tutor] scratching my head
On 02Aug2015 16:15, Clayton Kirkwood wrote:
>> Behalf Of Cameron Simpson
>> Sent: Sunday, August 02, 2015 3:35 PM
[...]
>> Personally I'd be reaching for os.path.splitext. Untested example below:
>>
>>   from os.path import splitext
>>
>>   for dir_path, directories, files in os.walk(main_dir):
>>     for file in files:
>>       prefix, ext = splitext(file)
>>       if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
>>
>> which I think is much easier to read.
>>
>> BTW, I'd be using the variable names "filename" and "filenames"
>> instead of "file" and "files": in python 2 "file" is a builtin
>> function (though long deprecated by "open()") and in any case I'd
>> (personally) expect such a name to be an _open_ file. As opposed to
>> "filename", which is clearer.
>
>Thanks, that should also help a lot. Now time to look at splitext, and
>the ext and ext[1:.

The "[1:]" is because "ext" will include the dot.

>I appreciate your comments also about the variable names.
>Any comments on the problems lower in the file?

Maybe you'd better reraise these problems again explicitly.

Cheers,
Cameron Simpson
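To see why the "[1:]" is there, a short sketch that prints what os.path.splitext() returns for a few names (some taken from the listings elsewhere in this thread, "README" added as an assumption to show the no-extension case):

```python
from os.path import splitext

for filename in ["IMG_0003.jpg", "218.JPG", "2010-11-02 15.58.30.jpg",
                 "Thumbs.db", "README", ".picasa.ini"]:
    prefix, ext = splitext(filename)
    # ext keeps its leading dot (or is empty when there is no extension),
    # so ext[1:] drops the dot before the comparison.
    print(repr(prefix), repr(ext), repr(ext[1:].lower()))
```

Only the last dot counts, so a name with several dots still yields a single extension, and a name without one yields an empty string (which is why the suggested test starts with "if ext and ...").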
Re: [Tutor] scratching my head
On Sun, Aug 02, 2015 at 11:46:31PM +0100, Alan Gauld wrote:
> On 02/08/15 23:01, Alan Gauld wrote:
>
> >found = False
> >for s in (".jpg",".png",".avi",".mp4"):
> >    found = test or (s in file.lower())
>
> Oops, that should be:
>
> found = found or (s in file.lower())

extensions = (".jpg",".png",".avi",".mp4")
found = any(s in filename.lower() for s in extensions)

but that's still wrong, because it will find files like

History.of.Avis.PA.pdf

as if it were an AVI file. Instead, use os.path.splitext.

--
Steve
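The false positive can be checked directly. A small sketch comparing the substring test with an extension test on the filename mentioned above (the variable names substring_hit and extension_hit are invented here):

```python
import os.path

extensions = (".jpg", ".png", ".avi", ".mp4")
filename = "History.of.Avis.PA.pdf"

# Substring test: ".avi" occurs inside ".avis", so this wrongly matches.
substring_hit = any(s in filename.lower() for s in extensions)

# Extension test: only the final ".pdf" is compared.
ext = os.path.splitext(filename)[1].lower()
extension_hit = ext in extensions

print(substring_hit, extension_hit)
```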
Re: [Tutor] scratching my head
On Sun, Aug 02, 2015 at 02:44:15PM -0700, Clayton Kirkwood wrote:

> for dir_path, directories, files in os.walk(main_dir):
>     for file in files:
> #print( " file = ", file)
> # if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> #if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()

name, ext = os.path.splitext(filename)
if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
    ...

> #del files[file]
> #
> #I get an error on int expected here. If I'm able to access by string, why
> wouldn't I be able to
> #acess in the del?

What are you attempting to do here? files is a list of file names:

files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

What do you expect files['that.txt'] to do?

The problem has nothing to do with del, the problem is that you are
trying to access the 'that.txt'-th item of a list, and that is
meaningless.

> print( "looking at file ", file, " in top_directory_file_list ",
> top_directory_file_list )

What does this print? In particular, what does the last part,
top_directory_file_list, print? Because the next error:

> if file in top_directory_file_list:
> #error: arg of type int not iterable

is clear that it is an int.

> #yet it works for the for loops

I think you are confusing:

top_directory_file_list

directory_file_list
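A short sketch of the point about indexing, reusing the files and filename values from the message above. A list position must be an integer, so deleting by name needs either a lookup of the position or removal by value:

```python
files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

try:
    del files[filename]          # a list index must be an int (or slice)
except TypeError as err:
    print("TypeError:", err)

# Either look up the integer position first...
position = files.index(filename)
print(position)

# ...or remove by value directly.
files.remove(filename)
print(files)
```

A dict, by contrast, is indexed by key, which is why directory_file_list[dir_path] works with a string.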
Re: [Tutor] scratching my head
> -----Original Message-----
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Cameron Simpson
> Sent: Sunday, August 02, 2015 3:35 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
>
> On 02Aug2015 23:01, ALAN GAULD wrote:
> >On 02/08/15 22:44, Clayton Kirkwood wrote:
> >>for dir_path, directories, files in os.walk(main_dir):
> >>    for file in files:
> >>#print( " file = ", file)
> >># if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> >
> >Python sees that as a single string. That string is not in your filename.
> >
> >>#if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
> [...]
> >But you could use a loop:
> >
> >found = False
> >for s in (".jpg",".png",".avi",".mp4"):
> >    found = test or (s in file.lower())
> >if not found: ...
> >
> >> if( ".jpg" not in file.lower() and
> >> ".png" not in file.lower() and
> >> ".avi" not in file.lower() and
> >> ".mp4" not in file.lower() ):
> >
> >Whether that's any better than your combined test is a moot point.
>
> Alan has commented extensively on the logic/implementation errors. I have
> a suggestion.
>
> Personally I'd be reaching for os.path.splitext. Untested example below:
>
>   from os.path import splitext
>
>   for dir_path, directories, files in os.walk(main_dir):
>     for file in files:
>       prefix, ext = splitext(file)
>       if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
>
>
> which I think is much easier to read.
>
> BTW, I'd be using the variable names "filename" and "filenames" instead of
> "file" and "files": in python 2 "file" is a builtin function (though long
> deprecated by "open()") and in any case I'd (personally) expect such a name
> to be an _open_ file. As opposed to "filename", which is clearer.

Thanks, that should also help a lot. Now time to look at splitext, and the
ext and ext[1:. I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?

Clayton

>
> Cheers,
> Cameron Simpson
>
> Rudin's Law:
> If there is a wrong way to do something, most people will do it every time.
> Rudin's Second Law:
> In a crisis that forces a choice to be made among alternative courses of
> action, people tend to choose the worst possible course.
Re: [Tutor] scratching my head
On 02/08/15 23:01, Alan Gauld wrote:

> found = False
> for s in (".jpg",".png",".avi",".mp4"):
>     found = test or (s in file.lower())

Oops, that should be:

found = found or (s in file.lower())

Sorry, 'test' was my first choice of name
but I changed it to found later.
But not everywhere :-(

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
Re: [Tutor] scratching my head
On 02Aug2015 23:01, ALAN GAULD wrote:
>On 02/08/15 22:44, Clayton Kirkwood wrote:
>>for dir_path, directories, files in os.walk(main_dir):
>>    for file in files:
>>#print( " file = ", file)
>># if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
>
>Python sees that as a single string. That string is not in your filename.
>
>>#if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
[...]
>But you could use a loop:
>
>found = False
>for s in (".jpg",".png",".avi",".mp4"):
>    found = test or (s in file.lower())
>if not found: ...
>
>> if( ".jpg" not in file.lower() and
>> ".png" not in file.lower() and
>> ".avi" not in file.lower() and
>> ".mp4" not in file.lower() ):
>
>Whether that's any better than your combined test is a moot point.

Alan has commented extensively on the logic/implementation errors. I have a
suggestion.

Personally I'd be reaching for os.path.splitext. Untested example below:

  from os.path import splitext

  for dir_path, directories, files in os.walk(main_dir):
    for file in files:
      prefix, ext = splitext(file)
      if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):

which I think is much easier to read.

BTW, I'd be using the variable names "filename" and "filenames" instead of
"file" and "files": in python 2 "file" is a builtin function (though long
deprecated by "open()") and in any case I'd (personally) expect such a name
to be an _open_ file. As opposed to "filename", which is clearer.

Cheers,
Cameron Simpson

Rudin's Law:
  If there is a wrong way to do something, most people will do it every time.
Rudin's Second Law:
  In a crisis that forces a choice to be made among alternative courses of
  action, people tend to choose the worst possible course.
Re: [Tutor] scratching my head
> -----Original Message-----
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Alan Gauld
> Sent: Sunday, August 02, 2015 3:01 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
>
> On 02/08/15 22:44, Clayton Kirkwood wrote:
>
> > for dir_path, directories, files in os.walk(main_dir):
> >     for file in files:
> > #print( " file = ", file)
> > # if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
>
> Python sees that as a single string. That string is not in your filename.
>
> > #if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()
>
> Python sees that as a boolean expression so will try to work it out as a
> True/False value. Since a non empty string is considered True and the first
> True expression makes an OR operation True overall it returns ".jpg" and tests
> if it is not in the filename.
>
> > #except by the drudgery below. I should be able to just have a list,
> > maybe from a file, that lists all
>
> You might think so but that's not how 'in' works.
>
> But you could use a loop:
>
> found = False
> for s in (".jpg",".png",".avi",".mp4"):
>     found = test or (s in file.lower())
> if not found: ...

The for is much better and it's able to get input from a file. I would think
Python more sensible if something like my commented one would work. That
would make more sense to me.

Thanks

>
> > if( ".jpg" not in file.lower() and
> > ".png" not in file.lower() and
> > ".avi" not in file.lower() and
> > ".mp4" not in file.lower() ):
>
> Whether that's any better than your combined test is a moot point.
>
> HTH
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
Re: [Tutor] scratching my head
On 02/08/15 22:44, Clayton Kirkwood wrote:

> for dir_path, directories, files in os.walk(main_dir):
>     for file in files:
> #print( " file = ", file)
> # if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):

Python sees that as a single string. That string is not in your filename.

> #if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower()

Python sees that as a boolean expression so will try to
work it out as a True/False value. Since a non empty
string is considered True and the first True expression
makes an OR operation True overall it returns ".jpg" and
tests if it is not in the filename.

> #except by the drudgery below. I should be able to just have a list,
> maybe from a file, that lists all

You might think so but that's not how 'in' works.

But you could use a loop:

found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = test or (s in file.lower())
if not found: ...

> if( ".jpg" not in file.lower() and
> ".png" not in file.lower() and
> ".avi" not in file.lower() and
> ".mp4" not in file.lower() ):

Whether that's any better than your combined test is a moot point.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
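The behaviour of the commented-out test can be observed directly. A minimal sketch, with "movie.AVI" as an invented sample filename and chosen as an invented variable name:

```python
filename = "movie.AVI"

# `or` returns its first truthy operand, so the parenthesised
# expression is just ".jpg"; the other strings are never consulted.
chosen = (".jpg" or ".png" or ".avi" or ".mp4")
print(chosen)

# So the commented-out test only ever checks for ".jpg" and wrongly
# reports this video file as unwanted:
only_jpg_checked = chosen not in filename.lower()
print(only_jpg_checked)

# Testing each extension separately gives the intended answer.
really_unwanted = not any(s in filename.lower()
                          for s in (".jpg", ".png", ".avi", ".mp4"))
print(really_unwanted)
```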
[Tutor] scratching my head
Hey, been awhile, but I ran into os.walk and it fit what I needed to do for
an issue I've had for a long time: I have tons of pictures in my top
directory of pictures which are duplicated into properly named
subdirectories. Please see issues above my questions with large gaps below.
TIA, Clayton

#Program to find duplicated pictures in my picture directory tree
#Presumably, if the file exists in a subdirectory I can remove it from the
#parent picture directory
#
#Clayton Kirkwood
#01Aug15

import os
from os.path import join, getsize

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, files in os.walk(main_dir):
    for file in files:
#        print( " file = ", file)
#       if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
#        if( (".jpg" or ".png" or ".avi" or ".mp4" ) not in file.lower() ):
#
#why don't these work?, especially the last one. How am I to capture all
#camera and video types except by the drudgery below. I should be able to
#just have a list, maybe from a file, that lists all of the types and do
#something like if master_list not in file.lower()

        if( ".jpg" not in file.lower() and
            ".png" not in file.lower() and
            ".avi" not in file.lower() and
            ".mp4" not in file.lower() ):

            print( "file ", file, "doesn't contain .jpg or .png or .avi or .mp4" )
#            del files[file]
#
#I get an error on int expected here. If I'm able to access by string, why
#wouldn't I be able to access in the del?

    directory_file_list[dir_path] = files   #this is a list
#    print(dir_path, directory_file_list[dir_path])
#print( main_dir )
for directory_path in directory_file_list.keys():
    if( directory_path == main_dir ):
        top_directory_file_list = directory_file_list[directory_path]
        continue
#    print( directory_path, ":", directory_file_list[directory_path])
    file_list = directory_file_list[directory_path]
#    print(file_list)
    for file in file_list:
#        pass
        print( "looking at file ", file, " in top_directory_file_list ", top_directory_file_list )
        if file in top_directory_file_list:
#error: arg of type int not iterable
#yet it works for the for loops

            print( "file ", file, " found in both directory_path ", directory_path, " and ", main_dir)
            duplicate_files =+ 1
            pass
            break
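One possible shape for the duplicate search, offered as a sketch under stated assumptions rather than as the program's intended design. Two details of the code above appear to matter: "duplicate_files =+ 1" assigns +1 instead of incrementing (the increment operator is "+="), and top_directory_file_list starts out as the int 0, so the "in" test fails with the reported error if a subdirectory is examined before the entry for main_dir has replaced it. The function name find_duplicates, the PICTURE_EXTS set, and the temporary demonstration tree are all invented for this example.

```python
import os
import tempfile

PICTURE_EXTS = {".jpg", ".png", ".avi", ".mp4"}

def find_duplicates(main_dir):
    """Return sorted (subdirectory, filename) pairs whose filename also
    exists directly in main_dir."""
    top_files = set()          # starts as an empty set, never an int
    per_directory = {}
    for dir_path, directories, files in os.walk(main_dir):
        pictures = [f for f in files
                    if os.path.splitext(f)[1].lower() in PICTURE_EXTS]
        if dir_path == main_dir:
            top_files = set(pictures)
        else:
            per_directory[dir_path] = pictures
    # Compare only after the whole tree has been collected, so the
    # order in which directories are visited does not matter.
    return sorted((d, f) for d, names in per_directory.items()
                  for f in names if f in top_files)

# Tiny demonstration tree built in a temporary directory.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "trip"))
    for rel in ("a.jpg", "b.jpg", "notes.txt", os.path.join("trip", "a.jpg")):
        open(os.path.join(top, rel), "w").close()
    duplicates = find_duplicates(top)
    duplicate_count = len(duplicates)
    print(duplicate_count, [name for _, name in duplicates])
```

This compares file names only; two different pictures that happen to share a name would still be reported, so checking sizes or contents before deleting anything would be a sensible extra step.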