On Tuesday, 19 April 2016 23:21:42 UTC+10, Sayth Renshaw  wrote:
> On Tuesday, 19 April 2016 18:17:02 UTC+10, Peter Otten  wrote:
> > Steven D'Aprano wrote:
> > 
> > > On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> > > 
> > >> Hi
> > >> 
> > >> Why would it be that my files are not being found in this script?
> > > 
> > > You are calling the script with:
> > > 
> > > python jqxml.py samples *.xml
> > > 
> > > This does not do what you think it does: under Linux shells, the glob
> > > *.xml will be expanded by the shell. Fortunately, in your case, you have
> > > no files in the current directory matching the glob *.xml, so it is not
> > > expanded and the arguments your script receives are:
> > > 
> > > 
> > > "jqxml.py"  # script name, not used
> > > 
> > > "samples"  # dir
> > > 
> > > "*.xml"  # mask
> > > 
> > > 
> > > You then call:
> > > 
> > > fileResult = filter(lambda x: x.endswith(mask), files)
> > > 
> > > which looks for file names which end with a literal string (asterisk, dot,
> > > x, m, l) in that order. You have no files that match that string.
> > > 
> > > At the shell prompt, enter this:
> > > 
> > > touch samples/junk\*.xml
> > > 
> > > and run the script again, and you should see that it now matches one file.
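
To make the difference concrete, here is a minimal sketch comparing the literal str.endswith test with shell-style matching from the standard fnmatch module. The file names are made up for illustration and are not the real contents of samples/.

```python
from fnmatch import fnmatch

# Hypothetical directory listing, for illustration only.
files = ["1.xml", "2.xml", "notes.txt", "junk*.xml"]
mask = "*.xml"

# str.endswith treats the mask as literal text, so only a name that
# really ends in the five characters '*.xml' matches.
literal_matches = [name for name in files if name.endswith(mask)]

# fnmatch interprets the mask as a shell-style pattern instead.
pattern_matches = [name for name in files if fnmatch(name, mask)]

print(literal_matches)
print(pattern_matches)
```

Only the oddly named junk*.xml passes the endswith test, while fnmatch accepts every name with the .xml suffix.
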
> > > 
> > > Instead, what you should do is:
> > > 
> > > 
> > > (1) Use the glob module:
> > > 
> > > https://docs.python.org/2/library/glob.html
> > > https://docs.python.org/3/library/glob.html
> > > 
> > > https://pymotw.com/2/glob/
> > > https://pymotw.com/3/glob/
> > > 
> > > 
> > > (2) When calling the script, avoid the shell expanding wildcards by
> > > escaping them or quoting them:
> > > 
> > > python jqxml.py samples "*.xml"
> > 
> > (3) *Use* the expansion mechanism provided by the shell instead of fighting 
> > it:
> > 
> > $ python jqxml.py samples/*.xml
> > 
> > This requires that you change your script:
> > 
> > from pyquery import PyQuery as pq
> > import pandas as pd
> > import sys
> > 
> > fileResult = sys.argv[1:]
> > 
> > if not fileResult:
> >      print("no files specified")
> >      sys.exit(1)
> > 
> > for file in fileResult:
> >     print(file)
> > 
> > for items in fileResult:
> >     try:
> >         d = pq(filename=items)
> >     except FileNotFoundError as e:
> >         print(e)
> >         continue
> >     res = d('nomination')
> >     # you could move the attrs definition before the loop
> >     attrs = ('id', 'horse')
> >     # probably a bug: you are overwriting data on every iteration
> >     data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
> > 
> > I think this is the most natural approach if you are willing to accept the 
> > quirk that the script tries to process the file 'samples/*.xml' if the 
> > samples directory doesn't contain any files with the .xml suffix. Common 
> > shell tools work that way:
> > 
> > $ ls samples/*.xml
> > samples/1.xml  samples/2.xml  samples/3.xml
> > $ ls samples/*.XML
> > ls: cannot access samples/*.XML: No such file or directory
> > 
> > Unrelated: instead of working with sys.argv directly you could use argparse 
> > which is part of the standard library. The code to get at least one file is
> > 
> > import argparse
> > 
> > parser = argparse.ArgumentParser()
> > parser.add_argument("files", nargs="+")
> > args = parser.parse_args()
> > 
> > print(args.files)
> > 
> > Note that this doesn't fix the shell expansion oddity.
> 
> Hi
> 
> Thanks for the insight. After doing a little reading I found this post, which 
> uses both argparse and glob and attempts to cover wildcard expansion on both 
> Windows and bash:
> http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
> 
> import argparse
> from glob import glob
> 
> def main(file_names):
>     print(file_names)
> 
> if __name__ == "__main__":
>     parser = argparse.ArgumentParser()
>     # nargs='*' collects all positional arguments into a single list
>     parser.add_argument("file_names", nargs='*')
>     args = parser.parse_args()
>     file_names = list()
> 
>     # Go through all of the arguments and replace the ones containing
>     # wildcards with their expansion. If a string does not contain a
>     # wildcard, glob returns it as is only when that path exists;
>     # otherwise it returns an empty list.
>     for arg in args.file_names:
>         file_names += glob(arg)
> 
>     main(file_names)
> 
> This is way beyond my needs for such a tiny script, but I think this is 
> Click, the Python CLI creation package from the Flask developers, 
> http://click.pocoo.org/5/why/#why-not-argparse, which is based on optparse.
> 
> 
> >     # probably a bug: you are overwriting data on every iteration
> >     data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
> 
> Thanks for picking this up. I will have to append to it on each iteration 
> for each attribute.
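
One way to avoid the overwrite is to create the list once before the loop and extend it for every file. The sketch below is only an illustration: it swaps pyquery for the standard library's xml.etree.ElementTree so that it runs on its own, and the XML strings (including the "meeting" root element) are invented stand-ins for the real sample files.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the contents of two sample files.
documents = [
    '<meeting><nomination id="1" horse="Alpha"/>'
    '<nomination id="2" horse="Bravo"/></meeting>',
    '<meeting><nomination id="3" horse="Charlie"/></meeting>',
]
attrs = ("id", "horse")

data = []  # created once, before the loop, so earlier rows survive
for text in documents:
    root = ET.fromstring(text)
    # extend() adds this document's rows to those already collected
    data.extend(
        [node.get(name) for name in attrs] for node in root.iter("nomination")
    )

print(data)
```

The same pattern should carry over to the pyquery version: keep the list comprehension as it is, but pass its result to data.extend() instead of assigning it to data.
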
> 
> Thank You
> 
> Sayth

Scratch that bit about the code from 
http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
I can't get it to work. Good general direction, though.
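
For reference, a minimal Python 3 sketch of the same argparse plus glob idea. The helper names expand and parse_files are made up here, and the temporary directory only stands in for samples/. Unlike the blog code, an argument that matches nothing is kept as is rather than silently dropped, so a later open() can report the missing file.

```python
import argparse
import os
import tempfile
from glob import glob


def expand(patterns):
    """Expand each pattern with glob; keep it unchanged if nothing matches."""
    file_names = []
    for pattern in patterns:
        matches = sorted(glob(pattern))
        # glob returns an empty list when nothing matches, so fall back
        # to the original argument instead of losing it.
        file_names.extend(matches or [pattern])
    return file_names


def parse_files(argv=None):
    """Parse positional file arguments (sys.argv when argv is None)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("file_names", nargs="*")
    args = parser.parse_args(argv)
    return expand(args.file_names)


# Demonstration with a throwaway directory standing in for samples/.
with tempfile.TemporaryDirectory() as samples:
    for name in ("1.xml", "2.xml"):
        open(os.path.join(samples, name), "w").close()
    found = parse_files([os.path.join(samples, "*.xml")])
    missing = parse_files([os.path.join(samples, "*.json")])

print([os.path.basename(path) for path in found])
print([os.path.basename(path) for path in missing])
```

In a real script, parse_files() would be called without arguments so that argparse reads the command line. This behaves the same whether the shell has already expanded the wildcard or passed it through quoted.
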

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list
