Re: [Tutor] scratching my head

2015-08-05 Thread Laura Creighton
In a message of Wed, 05 Aug 2015 08:43:45 +0200, Peter Otten writes:
>Laura Creighton wrote:
>but I don't think that's simpler. Can you enlighten me?

When I got here, I landed in the middle of a discussion on how to
use regexps for solving this.  Plus a slew of string handling
functions, none of which included endswith, which I think is a
fine idea as well.

The nice thing about fnmatch is that it handles all the 'are your
file names case insensitive' stuff for you, which can be a problem.
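
A rough sketch of that case handling, with made-up file names. As far as
the documentation goes, fnmatch.fnmatch() normalises both name and pattern
with os.path.normcase(), so it ignores case only on platforms whose file
names do (Windows), while fnmatch.fnmatchcase() always compares case
exactly:

```python
import fnmatch

names = ["foo.jpg", "bar.JPG", "baz.txt"]  # made-up names for illustration

# fnmatchcase() never normalises case, so it behaves the same everywhere.
strict = [n for n in names if fnmatch.fnmatchcase(n, "*.jpg")]
print(strict)

# fnmatch() applies os.path.normcase() to name and pattern first, so
# 'bar.JPG' is expected to match on Windows but not on POSIX systems.
loose = [n for n in names if fnmatch.fnmatch(n, "*.jpg")]
print(loose)

# Lowercasing the name yourself gives a case-insensitive match everywhere.
portable = [n for n in names if fnmatch.fnmatchcase(n.lower(), "*.jpg")]
print(portable)
```

So if you need the same answer on every platform, lowercasing the name
first (as in the last list) is probably the safer habit.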

Laura

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] scratching my head - still

2015-08-05 Thread Peter Otten
Cameron Simpson wrote:

> On 05Aug2015 12:46, Steven D'Aprano  wrote:
>>On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
>>> As seen below (closely), some filenames are not being removed while
>>> others are, such as in the first stanza, some pdfs are removed, some
>>> aren't. In the second stanza, Thumbs.db makes it through, but was caught
>>> in the first stanza. (Thanks for those who have proffered solutions to
>>> date!) I see no logic in the results. What am I missing???
>>
>>You are modifying the list of files while iterating over it, which plays
>>all sorts of hell with the process. Watch this:
> [... detailed explanation ...]
>>The lesson here is that you should never modify a list while iterating
>>over it. Instead, make a copy, and modify the copy.
> 
> What Steven said. Yes indeed.
> 
> Untested example suggestion:
> 
>   all_filenames = set(filenames)
>   for filename in filenames:
>     if .. test here ...:
>       all_filenames.remove(filename)
>   print(all_filenames)
> 
> You could use a list instead of a set and for small numbers of files be
> fine. With large numbers of files a set is far faster to remove things
> from.

If the list size is manageable, usually the case for the names of files in 
one directory, you should not bother about removing items. Just build a new 
list:

all_filenames = [...]
matching_filenames = [name for name in all_filenames if test(name)]

If the list is huge and you expect that most items will be kept you might 
try reverse iteration:

for i in reversed(range(len(all_filenames))):
    name = all_filenames[i]
    if test(name):
        del all_filenames[i]

This avoids both copying the list and the linear search performed by 
list.remove().
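
A minimal side-by-side sketch of the two approaches. The is_image() helper
and the file names are made up for the illustration; here it marks the
names to keep, so the in-place loop deletes the ones that fail it:

```python
def is_image(name):
    # Hypothetical test: keep names ending in .jpg or .png, ignoring case.
    return name.lower().endswith((".jpg", ".png"))

all_filenames = ["a.jpg", "b.txt", "c.PNG", "d.pdf", "e.jpg"]

# Approach 1: build a new list holding only the names to keep.
kept = [name for name in all_filenames if is_image(name)]

# Approach 2: delete unwanted names in place, walking backwards so that
# deleting index i never shifts an index that is still to be visited.
in_place = list(all_filenames)
for i in reversed(range(len(in_place))):
    if not is_image(in_place[i]):
        del in_place[i]

print(kept)
print(in_place)
```

Both lists should come out the same; the second form only pays off when
the list is big enough that a copy hurts.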



Re: [Tutor] scratching my head

2015-08-04 Thread Peter Otten
Laura Creighton wrote:

> In a message of Mon, 03 Aug 2015 18:22:32 +1000, Cameron Simpson writes:
> 
>>That depends. This is the tutor list; we're helping Clayton debug his code
>>as an aid to learning. While it's good to know about the facilities in the
>>standard library, pointing him directly at fnmatch (which I'd entirely
>>forgotten) is the "give a man a fish" approach to help; a magic black box
>>to do the job for him.
>>
>>Besides, I'm not sure fnmatch is much better for his task than the more
>>direct methods being discussed.
> 
> And I am certain.  It works exactly as he said he wanted -- a less
> cumbersome way to solve this problem, which he thought would be done
> some way with a for loop, looping over extensions, instead of the
> cumbersome way he is doing things.

I suppose you have some way in mind to simplify

# version 1, splitext()
import os

filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
EXTENSIONS = {".jpg", ".png"}
matching_filenames = [
    name for name in filenames
    if os.path.splitext(name)[1].lower() in EXTENSIONS]
print(matching_filenames)

with fnmatch. I can only come up with

# version 2, fnmatch()
import fnmatch
filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
GLOBS = ["*.jpg", "*.png"]
matching_filenames = [
    name for name in filenames
    if any(fnmatch.fnmatch(name.lower(), pat) for pat in GLOBS)]
print(matching_filenames)

but I don't think that's simpler. Can you enlighten me?

Digression: I don't know if str.endswith() was already suggested. I think 
that is a (small) improvement over the first version

# version 3, endswith()
filenames = ["foo.jpg", "bar.PNG", "baz.txt"]
EXTENSIONS = (".jpg", ".png")
matching_filenames = [
    name for name in filenames
    if name.lower().endswith(EXTENSIONS)]
print(matching_filenames)




Re: [Tutor] scratching my head - still

2015-08-04 Thread Cameron Simpson

On 05Aug2015 12:46, Steven D'Aprano  wrote:

On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:

As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???


You are modifying the list of files while iterating over it, which plays
all sorts of hell with the process. Watch this:

[... detailed explanation ...]

The lesson here is that you should never modify a list while iterating
over it. Instead, make a copy, and modify the copy.


What Steven said. Yes indeed.

Untested example suggestion:

 all_filenames = set(filenames)
 for filename in filenames:
   if .. test here ...:
     all_filenames.remove(filename)
 print(all_filenames)

You could use a list instead of a set and for small numbers of files be fine.  
With large numbers of files a set is far faster to remove things from.
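
A runnable version of the same idea, as a sketch only: unwanted() is a
hypothetical stand-in for "test here" and the names are invented. Note that
a set does not keep the original order, hence the sorted() at the end:

```python
filenames = ["a.jpg", "notes.txt", "b.png", "Thumbs.db"]  # invented names

def unwanted(name):
    # Hypothetical test standing in for ".. test here ...".
    return not name.lower().endswith((".jpg", ".png"))

all_filenames = set(filenames)
for filename in filenames:              # iterate over the untouched list
    if unwanted(filename):
        all_filenames.remove(filename)  # set removal does not scan the set

remaining = sorted(all_filenames)       # sets are unordered, so sort for display
print(remaining)
```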


Cheers,
Cameron Simpson 

In the desert, you can remember your name,
'cause there ain't no one for to give you no pain.  - America


Re: [Tutor] scratching my head - still

2015-08-04 Thread Steven D'Aprano
On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
> As seen below (closely), some filenames are not being removed while others
> are, such as in the first stanza, some pdfs are removed, some aren't. In the
> second stanza, Thumbs.db makes it through, but was caught in the first
> stanza. (Thanks for those who have proffered solutions to date!)
> I see no logic in the results. What am I missing???

You are modifying the list of files while iterating over it, which plays 
all sorts of hell with the process. Watch this:

py> alist = [1, 2, 3, 4, 5, 6, 7, 8]
py> for n in alist:
...     if n%2 == 0:  # even number
...         alist.remove(n)
...     print(n)
...
1
2
4
6
8
py> print(alist)
[1, 3, 5, 7]



If you pretend to be the Python interpreter, and simulate the process 
yourself, you'll see the same thing. Imagine that there is a pointer to 
the current item in the list. First time through the loop, it points to 
the first item, and you print the value and move on:

[>1, 2, 3, 4, 5, 6, 7, 8]
print 1

The second time through the loop:

[1, >2, 3, 4, 5, 6, 7, 8]
remove the 2, leaves the pointer pointing at three: 
[1, >3, 4, 5, 6, 7, 8]
print 2

Third time through the loop, we move on to the next value:

[1, 3, >4, 5, 6, 7, 8]
remove the 4, leaves the pointer pointing at five: 
[1, 3, >5, 6, 7, 8]
print 4

and so on.

The lesson here is that you should never modify a list while iterating 
over it. Instead, make a copy, and modify the copy.
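
One way to read that advice, sketched under the assumption that you still
want the original list changed: loop over a copy and remove from the
original. The second half shows slice assignment, which replaces a list's
contents in place and so may suit the filenames list that os.walk hands
you; the names there are made up:

```python
alist = [1, 2, 3, 4, 5, 6, 7, 8]
seen = []
for n in alist[:]:        # alist[:] is a shallow copy; the loop walks the copy
    if n % 2 == 0:        # even number
        alist.remove(n)   # the original shrinks, the copy does not
    seen.append(n)

print(seen)
print(alist)

# Slice assignment keeps the same list object but swaps its contents.
filenames = ["a.jpg", "Thumbs.db", "b.pdf", "c.pdf"]  # made-up names
same_object = filenames
filenames[:] = [f for f in filenames if f.lower().endswith(".jpg")]
print(filenames)
```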



-- 
Steve


Re: [Tutor] scratching my head - still

2015-08-04 Thread Clayton Kirkwood
As seen below (closely), some filenames are not being removed while others
are, such as in the first stanza, some pdfs are removed, some aren't. In the
second stanza, Thumbs.db makes it through, but was caught in the first
stanza. (Thanks for those who have proffered solutions to date!)
I see no logic in the results. What am I missing???
TIA, Clayton


import os
from os.path import join, getsize, splitext

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, filenames in os.walk(main_dir):
    print( "filenames = ", filenames, "\n" )
    for filename in filenames:
        prefix, ext = splitext(filename)
        if not (ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4',
                                            'mov', 'bmp') ):
            print( "deleting filename ", filename, "  because ",
                   ext[1:].lower(), " doesn't contain .jpg or .png or .avi or .mp4 or .bmp" )
            filenames.remove(filename)

    print("\nfilenames - bad exts:\n", filenames )


produces:

filenames =  ['.picasa.ini', '2010-11-02 15.58.30.jpg', '2010-11-02
15.58.45.jpg', '2010-11-25 09.42.59.jpg', '2011-03-19 19.32.09.jpg',
'2011-05-28 17.13.38.jpg', '2011-05-28 17.26.37.jpg', '2012-02-02
20.16.46.jpg', '218.JPG', 'desktop.ini', 'Guide ENG.pdf', 'Guide FRE.pdf',
'Guide GER.pdf', 'Guide Ita.pdf', 'Guide Spa.pdf', 'honda accident 001.jpg',
'honda accident 002.jpg', 'honda accident 003.jpg', 'honda accident
004.jpg', 'honda accident 005.jpg', 'honda accident 006.jpg', 'honda
accident 007.jpg', 'Image (1).jpg', 'Image.jpg', 'IMG.jpg', 'IMG3.jpg',
'IMG00040.jpg', 'IMG00058.jpg', 'IMG_0003.jpg', 'IMG_0004.jpg',
'IMG_0005.jpg', 'IMG_0007.jpg', 'IMG_0008.jpg', 'IMG_0009.jpg',
'IMG_0010.jpg', 'Mak diploma handshake.jpg', 'New Picture.bmp', 'OneNote
Table Of Contents (2).onetoc2', 'temp 121.jpg', 'temp 122.jpg', 'temp
220.jpg', 'temp 320.jpg', 'temp 321.jpg', 'temp 322.jpg', 'temp 323.jpg',
'temp 324.jpg', 'temp 325.jpg', 'temp 326.jpg', 'temp 327.jpg', 'temp
328.jpg', 'temp 329.jpg', 'temp 330.jpg', 'temp 331.jpg', 'temp 332.jpg',
'temp 333.jpg', 'temp 334.jpg', 'temp 335.jpg', 'temp 336.jpg', 'temp
337.jpg', 'temp 338.jpg', 'temp 339.jpg', 'temp 340.jpg', 'temp 341.jpg',
'temp 342.jpg', 'temp 343.jpg', 'Thumbs.db'] 

deleting filename  .picasa.ini   because  ini  doesn't contain .jpg or .png
or .avi or .mp4 or .bmp
deleting filename  desktop.ini   because  ini  doesn't contain .jpg or .png
or .avi or .mp4 or .bmp
deleting filename  Guide FRE.pdf   because  pdf  doesn't contain .jpg or
.png or .avi or .mp4 or .bmp
deleting filename  Guide Ita.pdf   because  pdf  doesn't contain .jpg or
.png or .avi or .mp4 or .bmp
deleting filename  OneNote Table Of Contents (2).onetoc2   because  onetoc2
doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename  Thumbs.db   because  db  doesn't contain .jpg or .png or
.avi or .mp4 or .bmp

filenames - bad exts:
 ['2010-11-02 15.58.30.jpg', '2010-11-02 15.58.45.jpg', '2010-11-25
09.42.59.jpg', '2011-03-19 19.32.09.jpg', '2011-05-28 17.13.38.jpg',
'2011-05-28 17.26.37.jpg', '2012-02-02 20.16.46.jpg', '218.JPG', 'Guide
ENG.pdf', 'Guide GER.pdf', 'Guide Spa.pdf', 'honda accident 001.jpg', 'honda
accident 002.jpg', 'honda accident 003.jpg', 'honda accident 004.jpg',
'honda accident 005.jpg', 'honda accident 006.jpg', 'honda accident
007.jpg', 'Image (1).jpg', 'Image.jpg', 'IMG.jpg', 'IMG3.jpg',
'IMG00040.jpg', 'IMG00058.jpg', 'IMG_0003.jpg', 'IMG_0004.jpg',
'IMG_0005.jpg', 'IMG_0007.jpg', 'IMG_0008.jpg', 'IMG_0009.jpg',
'IMG_0010.jpg', 'Mak diploma handshake.jpg', 'New Picture.bmp', 'temp
121.jpg', 'temp 122.jpg', 'temp 220.jpg', 'temp 320.jpg', 'temp 321.jpg',
'temp 322.jpg', 'temp 323.jpg', 'temp 324.jpg', 'temp 325.jpg', 'temp
326.jpg', 'temp 327.jpg', 'temp 328.jpg', 'temp 329.jpg', 'temp 330.jpg',
'temp 331.jpg', 'temp 332.jpg', 'temp 333.jpg', 'temp 334.jpg', 'temp
335.jpg', 'temp 336.jpg', 'temp 337.jpg', 'temp 338.jpg', 'temp 339.jpg',
'temp 340.jpg', 'temp 341.jpg', 'temp 342.jpg', 'temp 343.jpg'] 

filenames =  ['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG',
'IMG_0035.JPG', 'IMG_0037.JPG', 'IMG_0039.JPG', 'OneNote Table Of
Contents.onetoc2', 'Thumbs.db', 'ZbThumbnail.info'] 

deleting filename  OneNote Table Of Contents.onetoc2   because  onetoc2
doesn't contain .jpg or .png or .avi or .mp4 or .bmp
deleting filename  ZbThumbnail.info   because  info  doesn't contain .jpg or
.png or .avi or .mp4 or .bmp

filenames - bad exts:
 ['IMG_0028.JPG', 'IMG_0031.JPG', 'IMG_0032.JPG', 'IMG_0035.JPG',
'IMG_0037.JPG', 'IMG_0039.JPG', 'Thumbs.db'] 




Re: [Tutor] scratching my head

2015-08-04 Thread Emile van Sebille

On 8/3/2015 1:22 AM, Cameron Simpson wrote:

That depends. This is the tutor list; we're helping Clayton debug his
code as an aid to learning. While it's good to know about the facilities
in the standard library, pointing him directly at fnmatch (which I'd
entirely forgotten) is the "give a man a fish" approach to help; a magic
black box to do the job for him.


Sometimes a fish of three or four lines that replaces a 20 line effort 
might be better considered as a solution to be teased apart and 
understood.


Emile





Re: [Tutor] scratching my head

2015-08-03 Thread Laura Creighton
In a message of Mon, 03 Aug 2015 18:22:32 +1000, Cameron Simpson writes:

>That depends. This is the tutor list; we're helping Clayton debug his code as 
>an aid to learning. While it's good to know about the facilities in the 
>standard library, pointing him directly at fnmatch (which I'd entirely 
>forgotten) is the "give a man a fish" approach to help; a magic black box to 
>do 
>the job for him.
>
>Besides, I'm not sure fnmatch is much better for his task than the more direct 
>methods being discussed.

And I am certain.  It works exactly as he said he wanted -- a less
cumbersome way to solve this problem, which he thought would be done
some way with a for loop, looping over extensions, instead of the
cumbersome way he is doing things.

His design sense was perfectly fine; there is an elegant way to solve
the problem precisely along the lines he imagined -- he just wasn't
aware of this bit of the standard library.

There is no particular virtue in teaching somebody how to build a
pneumatic drill in order to crack walnuts.

Laura



Re: [Tutor] scratching my head

2015-08-03 Thread Cameron Simpson

On 03Aug2015 08:12, Laura Creighton  wrote:

I think people are giving you sub-optimal advice.

Python has a module in the standard library for doing exactly what
you want to do -- match files with certain extensions.

See: https://docs.python.org/2/library/fnmatch.html

It's unix style file matching, but I am fairly certain this works
on windows also.  I don't have a windows machine to test and make sure.


That depends. This is the tutor list; we're helping Clayton debug his code as 
an aid to learning. While it's good to know about the facilities in the 
standard library, pointing him directly at fnmatch (which I'd entirely 
forgotten) is the "give a man a fish" approach to help; a magic black box to do 
the job for him.


Besides, I'm not sure fnmatch is much better for his task than the more direct 
methods being discussed.


Cheers,
Cameron Simpson 


Re: [Tutor] scratching my head

2015-08-02 Thread Válas Péter
2015-08-02 23:44 GMT+02:00 Clayton Kirkwood :

>
>
> for dir_path, directories, files in os.walk(main_dir):
> for file in files:
> #print( " file = ", file)
> #   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
>
>
I suppose you want to use regular expressions here and you are somewhat
familiar with them, but you forgot to tell Python to handle your string as a
regex. This kind of expression must be matched against filenames instead of
using the "in" operator.

In this case, https://docs.python.org/3/library/re.html and
https://docs.python.org/3/howto/regex.html are your friends.
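
A minimal sketch of what that could look like with the re module. The
pattern is adapted from the commented-out line above, and the file names
are invented for the example:

```python
import re

# Anchored at the end of the name; re.IGNORECASE also accepts '.JPG' etc.
MEDIA_RE = re.compile(r"\.(jpg|png|avi|mp4)$", re.IGNORECASE)

files = ["holiday.JPG", "clip.mp4", "History.of.Avis.PA.pdf", "notes.txt"]

# search() scans the whole name; the '$' pins the match to the extension.
media = [name for name in files if MEDIA_RE.search(name)]
print(media)
```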


Re: [Tutor] scratching my head

2015-08-02 Thread Laura Creighton
I think people are giving you sub-optimal advice.

Python has a module in the standard library for doing exactly what
you want to do -- match files with certain extensions.

See: https://docs.python.org/2/library/fnmatch.html

It's unix style file matching, but I am fairly certain this works
on windows also.  I don't have a windows machine to test and make sure.

Laura



Re: [Tutor] scratching my head

2015-08-02 Thread Clayton Kirkwood


> -Original Message-
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Steven D'Aprano
> Sent: Sunday, August 02, 2015 5:49 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
> 
> On Sun, Aug 02, 2015 at 02:44:15PM -0700, Clayton Kirkwood wrote:
> 
> > for dir_path, directories, files in os.walk(main_dir):
> > for file in files:
> > #print( " file = ", file)
> > #   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> > #if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in file.lower()
> 
> name, ext = os.path.splitext(filename)
> if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
>     ...
> 
> 
> > #del files[file]
> > #
> > #I get an error on int expected here. If I'm able to access by string,
> > why wouldn't I be able to #access in the del?
> 
> What are you attempting to do here? files is a list of file names:
> 
> files = ['this.jpg', 'that.txt', 'other.pdf']
> filename = 'that.txt'
> 
> What do you expect files['that.txt'] to do?
> 
> The problem has nothing to do with del, the problem is that you are trying
> to access the 'that.txt'-th item of a list, and that is meaningless.


Well, I was expecting that the list entry would be deleted. In other parts
of my code I am using filenames as the index of lists: list[filenames] for
for loops and some ifs where it appears to work. I am able to look at
directories and the files in them by doing this. Check the rest of my
original code. I had one if at the bottom of my code that
complained that the index was supposed to be an int, not the list element
value. So I get that the index is supposed to be an int, and I think what is
happening in much of the code is the filename somehow becomes an int and
then the list accesses that way. It's very confusing. Basically, I was using
filenames as indexes into the list.


> 
> 
> > print( "looking at file  ", file, "  in
> > top_directory_file_list  ", top_directory_file_list )
> 
> What does this print? In particular, what does the last part,
> top_directory_file_list, print? Because the next error:
> 
> > if file in top_directory_file_list:
> > #error: arg of type int not iterable
> 
> is clear that it is an int.
> 
> > #yet it works for the for loops
> 
> I think you are confusing:
> 
> top_directory_file_list
> 
> directory_file_list


I don't know. If you look at the code that is going thru the directory
filename by filename the prints kick out filename and directories and the
list elements are addressed by "strings", the actual filenames.

What is happening in most of the code looks like what one would expect if
the lists could be indexed by words not ints. As a programmer, I would
expect lists to be addressed via a name or a number. It seems kind of like
dictionaries. Am I mixing dictionaries and lists?


Clayton
> 
> 



Re: [Tutor] scratching my head

2015-08-02 Thread Clayton Kirkwood


> -Original Message-
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Cameron Simpson
> Sent: Sunday, August 02, 2015 6:03 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
> 
> On 02Aug2015 16:15, Clayton Kirkwood  wrote:
> >> Behalf Of Cameron Simpson
> >> Sent: Sunday, August 02, 2015 3:35 PM
> [...]
> >> Personally I'd be reaching for os.path.splitext. Untested example
below:
> >>
> >>   from os.path import splitext
> >>   
> >>   for dir_path, directories, files in os.walk(main_dir):
> >>     for file in files:
> >>       prefix, ext = splitext(file)
> >>       if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
> >> 
> >>
> >> which I think is much easier to read.
> >>
> >> BTW, I'd be using the variable names "filename" and "filenames"
> >> instead of "file" and "files": in python 2 "file" is a builtin
> >> function (though long deprecated by "open()") and in any case I'd
> >> (personally) expect such a
> >name
> >> to be an _open_ file. As opposed to "filename", which is clearer.
> >
> >Thanks, that should also help a lot. Now time to look at splitext, and
> >the ext and ext[1:].
> 
> The "[1:]" is because "ext" will include the dot.

Yeah, after looking it up, it became clear, but thanks!

> 
> >I appreciate your comments also about the variable names.
> >Any comments on the problems lower in the file?
> 
> Maybe you'd better reraise these problems again explicitly.

Point taken.

> 
> Cheers,
> Cameron Simpson 



Re: [Tutor] scratching my head

2015-08-02 Thread Cameron Simpson

On 02Aug2015 16:15, Clayton Kirkwood  wrote:

Behalf Of Cameron Simpson
Sent: Sunday, August 02, 2015 3:35 PM

[...]

Personally I'd be reaching for os.path.splitext. Untested example below:

  from os.path import splitext
  
  for dir_path, directories, files in os.walk(main_dir):
    for file in files:
      prefix, ext = splitext(file)
      if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):


which I think is much easier to read.

BTW, I'd be using the variable names "filename" and "filenames" instead of
"file" and "files": in python 2 "file" is a builtin function (though long
deprecated by "open()") and in any case I'd (personally) expect such a

name

to be an _open_ file. As opposed to "filename", which is clearer.


Thanks, that should also help a lot. Now time to look at splitext, and the
ext and ext[1:].


The "[1:]" is because "ext" will include the dot.
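
A quick illustration of what splitext returns, using a few sample names.
The extension keeps its leading dot, a name without a dot yields an empty
extension (which is why the "ext and ..." guard is there), and a leading
dot is not treated as the start of an extension:

```python
from os.path import splitext

with_ext = splitext("IMG_0028.JPG")    # ('IMG_0028', '.JPG')
no_ext = splitext("README")            # ('README', '')
dotfile = splitext(".picasa.ini")      # ('.picasa', '.ini')
print(with_ext, no_ext, dotfile)

prefix, ext = with_ext
bare = ext[1:].lower()                 # drop the dot, then lowercase
print(bare)
```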


I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?


Maybe you'd better reraise these problems again explicitly.

Cheers,
Cameron Simpson 


Re: [Tutor] scratching my head

2015-08-02 Thread Steven D'Aprano
On Sun, Aug 02, 2015 at 11:46:31PM +0100, Alan Gauld wrote:
> On 02/08/15 23:01, Alan Gauld wrote:
> 
> >found = False
> >for s in (".jpg",".png",".avi",".mp4"):
> > found = test or (s in file.lower())
> 
> Oops, that should be:
> 
> found = found or (s in file.lower())

extensions = (".jpg",".png",".avi",".mp4")
found = any(s in filename.lower() for s in extensions)

but that's still wrong, because it will find files like

History.of.Avis.PA.pdf

as if it were an AVI file. Instead, use os.path.splitext.
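
A short sketch of the difference, using that same file name:

```python
import os.path

extensions = (".jpg", ".png", ".avi", ".mp4")
filename = "History.of.Avis.PA.pdf"

# Substring test: '.avi' occurs inside '.avis', so this is a false positive.
substring_hit = any(s in filename.lower() for s in extensions)

# Extension test: only the text after the last dot is compared.
ext = os.path.splitext(filename)[1].lower()
extension_hit = ext in extensions

print(substring_hit)
print(ext, extension_hit)
```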



-- 
Steve


Re: [Tutor] scratching my head

2015-08-02 Thread Steven D'Aprano
On Sun, Aug 02, 2015 at 02:44:15PM -0700, Clayton Kirkwood wrote:

> for dir_path, directories, files in os.walk(main_dir):
> for file in files:
> #print( " file = ", file)
> #   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> #if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in file.lower()

name, ext = os.path.splitext(filename)
if ext.lower() in ('.jpg', '.png', '.avi', '.mp4'):
    ...


> #del files[file]
> #
> #I get an error on int expected here. If I'm able to access by string, why
> wouldn't I be able to
> #access in the del?

What are you attempting to do here? files is a list of file names:

files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

What do you expect files['that.txt'] to do?

The problem has nothing to do with del, the problem is that you are 
trying to access the 'that.txt'-th item of a list, and that is 
meaningless.
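
To make that concrete, here is a small sketch. Lists are indexed by
position, so to delete by value you either look the position up or use
list.remove(); indexing by name is what dictionaries are for. The sizes
dict and its numbers are invented for the example:

```python
files = ['this.jpg', 'that.txt', 'other.pdf']
filename = 'that.txt'

try:
    files[filename]                  # a string is not a valid list index
except TypeError as err:
    error_message = str(err)
    print(error_message)

position = files.index(filename)     # find where the name sits
del files[position]                  # files.remove(filename) does the same
print(files)

# A dict, by contrast, is indexed by key, so deleting by name works.
sizes = {'this.jpg': 1024, 'that.txt': 12}   # made-up sizes
del sizes['that.txt']
print(sizes)
```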


> print( "looking at file  ", file, "  in top_directory_file_list  ",
> top_directory_file_list )

What does this print? In particular, what does the last part, 
top_directory_file_list, print? Because the next error:

> if file in top_directory_file_list:
> #error: arg of type int not iterable

is clear that it is an int.

> #yet it works for the for loops

I think you are confusing:

top_directory_file_list

directory_file_list




Re: [Tutor] scratching my head

2015-08-02 Thread Clayton Kirkwood


> -Original Message-
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Cameron Simpson
> Sent: Sunday, August 02, 2015 3:35 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
> 
> On 02Aug2015 23:01, ALAN GAULD  wrote:
> >On 02/08/15 22:44, Clayton Kirkwood wrote:
> >>for dir_path, directories, files in os.walk(main_dir):
> >> for file in files:
> >>#print( " file = ", file)
> >>#   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> >
> >Python sees that as a single string. That string is not in your filename.
> >
> >>#if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in
file.lower()
> [...]
> >But you could use a loop:
> >
> >found = False
> >for s in (".jpg",".png",".avi",".mp4"):
> >    found = test or (s in file.lower())
> >if not found: ...
> >
> >> if(  ".jpg" not in file.lower() and
> >>  ".png" not in file.lower() and
> >>  ".avi" not in file.lower() and
> >>  ".mp4" not in file.lower() ):
> >
> >Whether that's any better than your combined test is a moot point.
> 
> Alan has commented extensively on the logic/implementation errors. I have
> a suggestion.
> 
> Personally I'd be reaching for os.path.splitext. Untested example below:
> 
>   from os.path import splitext
>   
>   for dir_path, directories, files in os.walk(main_dir):
>     for file in files:
>       prefix, ext = splitext(file)
>       if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
> 
> 
> which I think is much easier to read.
> 
> BTW, I'd be using the variable names "filename" and "filenames" instead of
> "file" and "files": in python 2 "file" is a builtin function (though long
> deprecated by "open()") and in any case I'd (personally) expect such a
name
> to be an _open_ file. As opposed to "filename", which is clearer.


Thanks, that should also help a lot. Now time to look at splitext, and the
ext and ext[1:]. I appreciate your comments also about the variable names.
Any comments on the problems lower in the file?

Clayton


> 
> Cheers,
> Cameron Simpson 
> 
> Rudin's Law:
>   If there is a wrong way to do something, most people will do it every
time.
> Rudin's Second Law:
>   In a crisis that forces a choice to be made among alternative courses of
>   action, people tend to choose the worst possible  course.



Re: [Tutor] scratching my head

2015-08-02 Thread Alan Gauld

On 02/08/15 23:01, Alan Gauld wrote:


found = False
for s in (".jpg",".png",".avi",".mp4"):
 found = test or (s in file.lower())


Oops, that should be:

found = found or (s in file.lower())

Sorry, 'test' was my first  choice of name
but I changed it to found later.
But not everywhere :-(


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] scratching my head

2015-08-02 Thread Cameron Simpson

On 02Aug2015 23:01, ALAN GAULD  wrote:

On 02/08/15 22:44, Clayton Kirkwood wrote:

for dir_path, directories, files in os.walk(main_dir):
for file in files:
#print( " file = ", file)
#   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):


Python sees that as a single string. That string is not in your filename.


#if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in file.lower()

[...]

But you could use a loop:

found = False
for s in (".jpg",".png",".avi",".mp4"):
   found = test or (s in file.lower())
if not found: ...


if(  ".jpg" not in file.lower() and
 ".png" not in file.lower() and
 ".avi" not in file.lower() and
 ".mp4" not in file.lower() ):


Whether that's any better than your combined test is a moot point.


Alan has commented extensively on the logic/implementation errors. I have a 
suggestion.


Personally I'd be reaching for os.path.splitext. Untested example below:

 from os.path import splitext
 
 for dir_path, directories, files in os.walk(main_dir):
   for file in files:
     prefix, ext = splitext(file)
     if ext and ext[1:].lower() in ('jpg', 'png', 'avi', 'mp4'):
   

which I think is much easier to read.

BTW, I'd be using the variable names "filename" and "filenames" instead of 
"file" and "files": in python 2 "file" is a builtin function (though long 
deprecated by "open()") and in any case I'd (personally) expect such a name to 
be an _open_ file. As opposed to "filename", which is clearer.


Cheers,
Cameron Simpson 

Rudin's Law:
 If there is a wrong way to do something, most people will do it every time.
Rudin's Second Law:
 In a crisis that forces a choice to be made among alternative courses of
 action, people tend to choose the worst possible  course.


Re: [Tutor] scratching my head

2015-08-02 Thread Clayton Kirkwood


> -Original Message-
> From: Tutor [mailto:tutor-bounces+crk=godblessthe...@python.org] On
> Behalf Of Alan Gauld
> Sent: Sunday, August 02, 2015 3:01 PM
> To: tutor@python.org
> Subject: Re: [Tutor] scratching my head
> 
> On 02/08/15 22:44, Clayton Kirkwood wrote:
> 
> > for dir_path, directories, files in os.walk(main_dir):
> >  for file in files:
> > #print( " file = ", file)
> > #   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
> 
> Python sees that as a single string. That string is not in your filename.
> 
> > #if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in
file.lower()
> 
> Python sees that as a boolean expression so will try to work it out as a
> True/False value. Since a non empty string is considered True and the first
> True expression makes an OR operation True overall, it returns ".jpg" and
> tests if it is not in the filename.
> 
> > #except by the drudgery below. I should be able to just have a list,
> > maybe from a file, that lists all
> 
> You might think so but that's not how 'in' works.
> 
> But you could use a loop:
> 
> found = False
> for s in (".jpg",".png",".avi",".mp4"):
>     found = test or (s in file.lower())
> if not found: ...


The for loop is much better, and it can take its input from a file. I would
think Python more sensible if something like my commented-out test worked.
That would make more sense to me.

Thanks


> 
> >  if(  ".jpg" not in file.lower() and
> >   ".png" not in file.lower() and
> >   ".avi" not in file.lower() and
> >   ".mp4" not in file.lower() ):
> 
> Whether that's any better than your combined test is a moot point.
> 
> HTH
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
> 
> 


Re: [Tutor] scratching my head

2015-08-02 Thread Alan Gauld

On 02/08/15 22:44, Clayton Kirkwood wrote:


for dir_path, directories, files in os.walk(main_dir):
 for file in files:
#print( " file = ", file)
#   if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):


Python sees that as a single string. That string is not in your filename.
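
The pattern in that commented-out line only means something when it is
handed to the re module; "in" just looks for the literal text. A minimal
sketch, assuming the goal is to match on the file extension (the names
MEDIA_RE and is_media and the sample filenames are made up for illustration):

```python
import re

# Hypothetical pattern name; the extensions come from the original post.
MEDIA_RE = re.compile(r"\.(jpg|png|avi|mp4)$")

def is_media(filename):
    """Return True if filename ends in one of the media extensions."""
    return MEDIA_RE.search(filename.lower()) is not None

# The literal pattern text is not a substring of any ordinary filename,
# so an 'in' test against it never finds anything.
literal_hit = r"(\.jpg|\.png|\.avi|\.mp4)$" in "holiday.jpg"
```

Whether a regexp is worth it here is a matter of taste; plain string
methods can do the same job.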


#if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in file.lower()


Python sees that as a boolean expression so will try to
work it out as a True/False value. Since a non-empty
string is considered True and the first True expression
makes an OR operation True overall, it returns ".jpg" and
tests if it is not in the filename.
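
This is easy to check at the interactive prompt. A small sketch (the
filename is a made-up example):

```python
# 'or' returns its first truthy operand, not some combined value.
extensions = ".jpg" or ".png" or ".avi" or ".mp4"
print(repr(extensions))

# So the commented-out test boils down to a single ".jpg" check.
filename = "clip.mp4"   # made-up example name
only_jpg_checked = extensions not in filename.lower()
print(only_jpg_checked)
```

With that filename the test reports the .mp4 file as "not a media file",
because only ".jpg" was ever looked for.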


#except by the drudgery below. I should be able to just have a list, maybe
from a file, that lists all


You might think so but that's not how 'in' works.

But you could use a loop:

found = False
for s in (".jpg",".png",".avi",".mp4"):
    found = found or (s in file.lower())
if not found: ...
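
The same loop can be written with the builtin any(), which stops at the
first match. A sketch only; EXTENSIONS and has_media_extension are names
invented for this example:

```python
# Hypothetical names; the extensions are the ones from the original post.
EXTENSIONS = (".jpg", ".png", ".avi", ".mp4")

def has_media_extension(filename, extensions=EXTENSIONS):
    """True if any of the extensions occurs anywhere in the lower-cased name."""
    name = filename.lower()
    return any(ext in name for ext in extensions)
```

Note that this keeps the substring test from the loop above, so a name
like "notes.jpg.txt" still counts as a match.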


 if(  ".jpg" not in file.lower() and
  ".png" not in file.lower() and
  ".avi" not in file.lower() and
  ".mp4" not in file.lower() ):


Whether that's any better than your combined test is a moot point.
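
Another option, assuming what is really wanted is "the name ends with one
of these extensions": str.endswith accepts a tuple of suffixes, and the
tuple can be built from the lines of a file. In this sketch load_extensions
and is_wanted are hypothetical helpers, and a plain list stands in for the
file:

```python
def load_extensions(lines):
    """Build a tuple of lower-cased extensions from an iterable of lines."""
    return tuple(line.strip().lower() for line in lines if line.strip())

def is_wanted(filename, extensions):
    # str.endswith accepts a tuple and is True if any suffix matches.
    return filename.lower().endswith(extensions)

# In real use the lines might come from open("extensions.txt"), a
# hypothetical file with one extension per line.
extensions = load_extensions([".jpg\n", ".png\n", ".avi\n", ".mp4\n", "\n"])
```

Unlike the substring tests, this does not match "photo.jpg.bak".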

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] scratching my head

2015-08-02 Thread Clayton Kirkwood
Hey, it's been a while, but I ran into os.walk and it fit what I needed to do
for an issue I've had for a long time: I have tons of pictures in my top
directory of pictures which are duplicated into properly named
subdirectories. My questions are in the comments below; each one sits just
under the code it is about and is followed by a large gap.
TIA,
Clayton


#Program to find duplicated pictures in my picture directory tree
#Presumably, if the file exists in a subdirectory I can remove it from the
#parent picture directory
#
#Clayton Kirkwood
#01Aug15

import os
from os.path import join,  getsize

main_dir = "/users/Clayton/Pictures"
directory_file_list = {}
duplicate_files = 0
top_directory_file_list = 0

for dir_path, directories, files in os.walk(main_dir):
    for file in files:
#        print( " file = ", file)
#        if( ("(\.jpg|\.png|\.avi|\.mp4)$") not in file.lower() ):
#        if(  (".jpg" or ".png" or ".avi" or ".mp4" )  not in file.lower() ):
#
#why don't these work?, especially the last one. How am I to capture all
#camera and video types
#except by the drudgery below. I should be able to just have a list, maybe
#from a file, that lists all
#of the types and do something like if master_list not in file.lower()





        if(  ".jpg" not in file.lower() and
             ".png" not in file.lower() and
             ".avi" not in file.lower() and
             ".mp4" not in file.lower() ):

            print( "file ", file, "doesn't contain .jpg or .png or .avi or .mp4" )
#            del files[file]
#
#I get an error on int expected here. If I'm able to access by string, why
#wouldn't I be able to
#access in the del?





    directory_file_list[dir_path] = files   #this is a list
#    print(dir_path, directory_file_list[dir_path])
#print( main_dir )
for directory_path in directory_file_list.keys():
    if( directory_path == main_dir ):
        top_directory_file_list = directory_file_list[directory_path]
        continue
#    print( directory_path, ":", directory_file_list[directory_path])
    file_list = directory_file_list[directory_path]
#    print(file_list)
    for file in file_list:
#        pass
        print( "looking at file  ", file, "  in top_directory_file_list  ",
               top_directory_file_list )
        if file in top_directory_file_list:
#error: arg of type int not iterable
#yet it works for the for loops





            print( "file ", file, " found in both directory_path ",
                   directory_path, " and ", main_dir)
            duplicate_files =+ 1
            pass
            break
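
A few notes on the errors asked about above, followed by a sketch of one
way the program could be restructured. This is not the original author's
code; find_duplicates and MEDIA_EXTENSIONS are names invented here, and it
assumes "duplicate" simply means "same filename".

"del files[file]" fails because files is a list, and a list can only be
indexed by an integer (or a slice), not by the string it contains. The
"int not iterable" error is likely because top_directory_file_list starts
out as 0 and is only replaced once the loop over the dictionary reaches
main_dir; if another directory comes up first, "file in 0" is attempted.
Finally, "duplicate_files =+ 1" assigns +1 each time; "+= 1" increments.

```python
import os

# Hypothetical constant; the extensions are the ones from the post.
MEDIA_EXTENSIONS = (".jpg", ".png", ".avi", ".mp4")

def find_duplicates(main_dir):
    """Return (subdirectory, filename) pairs whose name also exists in main_dir."""
    top_files = set()
    per_directory = {}
    for dir_path, _directories, files in os.walk(main_dir):
        # Build a new filtered list instead of deleting from 'files'
        # while looping over it.
        media = [f for f in files if f.lower().endswith(MEDIA_EXTENSIONS)]
        if dir_path == main_dir:
            top_files = set(media)      # always a set, never an int
        else:
            per_directory[dir_path] = media
    return [(directory, name)
            for directory, names in per_directory.items()
            for name in names
            if name in top_files]
```

The number of duplicates is then just len(find_duplicates(main_dir)), so
no counter is needed.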

